Preliminary Study on Image-Finding Generation and Classification of Lung Nodules in Chest CT Images Using Vision–Language Models

Research output: Contribution to journal › Article › peer-review

Abstract

In the diagnosis of lung cancer, imaging findings of lung nodules are essential for classifying nodules as benign or malignant. Although numerous studies have investigated the classification of lung nodules, no method has been proposed for obtaining detailed imaging findings. This study aimed to develop a novel method for generating image findings and classifying benign and malignant nodules in chest computed tomography (CT) images using vision–language models. We collected chest CT images of 77 patients diagnosed with either benign or malignant tumors at Fujita Health University Hospital. For these images, we cropped the regions of interest around the nodules, and a pulmonologist provided the corresponding image findings. We used two vision–language models for image captioning to generate image findings. The findings generated by the two models were grammatically correct and showed no deviations from the notation expected of image findings. Moreover, the descriptions of benign and malignant characteristics were accurately obtained. The bootstrapping language–image pretraining (BLIP) base model achieved an accuracy of 79.2% in classifying nodules, and its bilingual evaluation understudy-4 (BLEU-4) score for agreement with physician findings was 0.561. These results suggest that the proposed method may be effective for classifying lung nodules and generating their findings.
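The bilingual evaluation understudy-4 (BLEU-4) score reported in the abstract measures n-gram overlap between a generated finding and a physician-written reference. The following is a minimal sketch of a sentence-level BLEU-4 with uniform weights, clipped n-gram precision, and a brevity penalty; it is not the authors' implementation, which the abstract does not describe. The function names (`ngrams`, `bleu4`), the whitespace tokenization, the absence of smoothing, and the two example sentences are all assumptions made for illustration.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu4(candidate, reference):
    """Sentence-level BLEU-4 against a single reference (no smoothing).

    Assumption: simple lowercase whitespace tokenization; a real
    evaluation would likely use a proper tokenizer and corpus-level counts.
    """
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand:
        return 0.0

    log_precisions = []
    for n in range(1, 5):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if total == 0 or clipped == 0:
            # Without smoothing, any zero precision makes the score zero.
            return 0.0
        log_precisions.append(math.log(clipped / total))

    # Brevity penalty: penalize candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    # Geometric mean of the four precisions, scaled by the brevity penalty.
    return bp * math.exp(sum(log_precisions) / 4)


# Hypothetical example sentences, not taken from the study's data.
reference = "a solid nodule with spiculated margins is seen in the right upper lobe"
candidate = "a solid nodule with smooth margins is seen in the right upper lobe"
score = bleu4(candidate, reference)
print(f"BLEU-4: {score:.3f}")
```

A score of 1.0 means the candidate reproduces the reference exactly, and a score of 0.0 means that no n-gram of at least one order (1 to 4) matches. Because sentence-level BLEU without smoothing is harsh on short texts, published evaluations often aggregate counts over the whole test set instead.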

Original language: English
Article number: 489
Journal: Computers
Volume: 14
Issue number: 11
Publication status: Published - 11-2025

All Science Journal Classification (ASJC) codes

  • Computer Science (miscellaneous)
  • Human-Computer Interaction
  • Computer Networks and Communications
