Integrating text and medical images for segmentation using interpretable graph neural network

  • Shurong Chai
  • Rahul Kumar Jain
  • Shaocong Mo
  • Jiaqing Liu
  • Shiyu Teng
  • Tomoko Tateyama
  • Lanfen Lin
  • Yen Wei Chen

Research output: Contribution to journal › Article › peer-review

Abstract

Medical image segmentation plays an important role in computer-aided diagnosis and treatment planning. Recent approaches rely on Convolutional Neural Network (CNN) and Transformer-based architectures. However, these frameworks often incur high computational costs, and further improvements are required for practical applications. In this study, we explore a Graph Neural Network (GNN)-based segmentation framework that incorporates multimodal information. We propose a novel text-guided framework that integrates the text and image modalities to segment anatomical structures. Unlike existing CNN- or Transformer-based approaches, our method introduces a lightweight graph–hypergraph fusion mechanism that jointly models text–image relationships, offering both interpretability and efficiency. The framework reduces computational cost while improving segmentation performance, and it improves interpretability by highlighting key text–image interactions. On the QaTa-COV19 and MosMedData+ datasets, the framework achieves Dice scores of 90.7% and 77.4%, respectively, with only 5.8 GFLOPs, demonstrating improved accuracy at a significantly reduced computational cost and supporting its practical applicability. The code is released on GitHub: https://github.com/11yxk/MultimodalGNN.
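
For readers unfamiliar with the metric reported above, the following is a minimal sketch of how a Dice score between a predicted and a reference binary mask is commonly computed. It is not taken from the authors' repository: the function name `dice_score`, the flat-list mask representation, and the convention of returning 1.0 when both masks are empty are assumptions made for illustration.

```python
def dice_score(pred, target):
    """Dice = 2 * |P ∩ T| / (|P| + |T|) for two binary masks.

    Masks are flat sequences of 0/1 values (an illustrative representation;
    real pipelines typically use image tensors).
    """
    if len(pred) != len(target):
        raise ValueError("masks must contain the same number of pixels")
    # Pixels labelled foreground in both masks.
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    if total == 0:
        # Both masks are empty: treated as perfect agreement (assumed convention).
        return 1.0
    return 2.0 * intersection / total


# Toy example: 3 predicted and 3 reference foreground pixels, 2 of them shared.
pred = [0, 1, 1, 1, 0, 0]
target = [0, 1, 1, 0, 1, 0]
print(f"Dice: {dice_score(pred, target):.3f}")
```

In this toy case the two masks share two foreground pixels out of three each, so the score is 2·2 / (3 + 3) ≈ 0.667; a reported score such as 90.7% corresponds to 0.907 on the same 0-to-1 scale, typically averaged over a test set.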

Original language: English
Article number: 109404
Journal: Biomedical Signal Processing and Control
Volume: 115
Publication status: Published - 15-04-2026
Externally published: Yes

All Science Journal Classification (ASJC) codes

  • Signal Processing
  • Biomedical Engineering
  • Health Informatics
