TY - JOUR
T1 - Integrating text and medical images for segmentation using interpretable graph neural network
AU - Chai, Shurong
AU - Jain, Rahul Kumar
AU - Mo, Shaocong
AU - Liu, Jiaqing
AU - Teng, Shiyu
AU - Tateyama, Tomoko
AU - Lin, Lanfen
AU - Chen, Yen Wei
N1 - Publisher Copyright:
© 2025 Elsevier Ltd
PY - 2026/4/15
Y1 - 2026/4/15
N2 - Medical image segmentation plays an important role in computer-aided diagnosis and treatment planning. Recent approaches utilize Convolutional Neural Network (CNN) and Transformer-based architectures for decision-making. However, these frameworks often incur high computational costs, and further improvements are required for practical applications. In this study, we explore the use of a Graph Neural Network (GNN)-based segmentation framework that incorporates multimodal information. We propose a novel text-guided framework that integrates text and image modalities to effectively segment anatomical structures. Unlike existing CNN- or Transformer-based approaches, our method introduces a lightweight graph–hypergraph fusion mechanism that jointly models text–image relationships, offering both interpretability and efficiency. The proposed lightweight framework reduces computational cost while improving segmentation performance. The framework also improves interpretability by highlighting key text–image interactions. Experimental results on the QaTa-COV19 and MosMedData+ datasets achieve Dice scores of 90.7% and 77.4%, respectively, with only 5.8 GFLOPs, demonstrating improved accuracy while significantly reducing computational cost and ensuring practical applicability. The code is released on GitHub: https://github.com/11yxk/MultimodalGNN.
AB - Medical image segmentation plays an important role in computer-aided diagnosis and treatment planning. Recent approaches utilize Convolutional Neural Network (CNN) and Transformer-based architectures for decision-making. However, these frameworks often incur high computational costs, and further improvements are required for practical applications. In this study, we explore the use of a Graph Neural Network (GNN)-based segmentation framework that incorporates multimodal information. We propose a novel text-guided framework that integrates text and image modalities to effectively segment anatomical structures. Unlike existing CNN- or Transformer-based approaches, our method introduces a lightweight graph–hypergraph fusion mechanism that jointly models text–image relationships, offering both interpretability and efficiency. The proposed lightweight framework reduces computational cost while improving segmentation performance. The framework also improves interpretability by highlighting key text–image interactions. Experimental results on the QaTa-COV19 and MosMedData+ datasets achieve Dice scores of 90.7% and 77.4%, respectively, with only 5.8 GFLOPs, demonstrating improved accuracy while significantly reducing computational cost and ensuring practical applicability. The code is released on GitHub: https://github.com/11yxk/MultimodalGNN.
KW - Graph neural network
KW - Hypergraph neural network
KW - Medical image segmentation
KW - Vision-language model
UR - https://www.scopus.com/pages/publications/105024541256
UR - https://www.scopus.com/pages/publications/105024541256#tab=citedBy
U2 - 10.1016/j.bspc.2025.109404
DO - 10.1016/j.bspc.2025.109404
M3 - Article
AN - SCOPUS:105024541256
SN - 1746-8094
VL - 115
JO - Biomedical Signal Processing and Control
JF - Biomedical Signal Processing and Control
M1 - 109404
ER -