TY - JOUR
T1 - Preliminary Study on Image-Finding Generation and Classification of Lung Nodules in Chest CT Images Using Vision–Language Models
AU - Nagao, Maiko
AU - Teramoto, Atsushi
AU - Urata, Kaito
AU - Imaizumi, Kazuyoshi
AU - Kondo, Masashi
AU - Fujita, Hiroshi
N1 - Publisher Copyright: © 2025 by the authors.
PY - 2025/11
Y1 - 2025/11
AB - In the diagnosis of lung cancer, the imaging findings of lung nodules are essential for classifying nodules as benign or malignant. Although numerous studies have investigated the classification of lung nodules, no method has been proposed for obtaining detailed imaging findings. This study aimed to develop a novel method for generating image findings and classifying lung nodules as benign or malignant in chest computed tomography (CT) images using vision–language models. We collected chest CT images of 77 patients diagnosed with either benign or malignant tumors at Fujita Health University Hospital. From these images, we cropped regions of interest around the nodules, and a pulmonologist provided the corresponding image findings. We then applied two vision–language models for image captioning to generate image findings. The findings generated by both models were grammatically correct and showed no inconsistencies in notation, as expected of image findings. Moreover, the descriptions of benign and malignant characteristics were accurately generated. The bootstrapping language–image pretraining (BLIP) base model achieved an accuracy of 79.2% in classifying nodules, and the bilingual evaluation understudy (BLEU)-4 score for agreement with the physician's findings was 0.561. These results suggest that the proposed method may be effective for classifying lung nodules and generating their imaging findings.
KW - deep learning
KW - image classification
KW - image-to-text
KW - lung nodule
KW - text generation
KW - vision–language models
UR - https://www.scopus.com/pages/publications/105023662008
UR - https://www.scopus.com/pages/publications/105023662008#tab=citedBy
U2 - 10.3390/computers14110489
DO - 10.3390/computers14110489
M3 - Article
AN - SCOPUS:105023662008
SN - 2073-431X
VL - 14
JO - Computers
JF - Computers
IS - 11
M1 - 489
ER -