TY - JOUR
T1 - An Intra- and Inter-Emotion Transformer-Based Fusion Model with Homogeneous and Diverse Constraints Using Multi-Emotional Audiovisual Features for Depression Detection
AU - Teng, Shiyu
AU - Liu, Jiaqing
AU - Huang, Yue
AU - Chai, Shurong
AU - Tateyama, Tomoko
AU - Huang, Xinyin
AU - Lin, Lanfen
AU - Chen, Yen Wei
N1 - Publisher Copyright:
Copyright © 2024 The Institute of Electronics, Information and Communication Engineers.
PY - 2024/3
Y1 - 2024/3
N2 - Depression is a prevalent mental disorder affecting a significant portion of the global population, leading to considerable disability and contributing to the overall burden of disease. Consequently, designing efficient and robust automated methods for depression detection has become imperative. Recently, deep learning methods, especially multimodal fusion methods, have been increasingly used in computer-aided depression detection. Importantly, individuals with depression and those without respond differently to various emotional stimuli, providing valuable information for detecting depression. Building on these observations, we propose an intra- and inter-emotional stimulus transformer-based fusion model to effectively extract depression-related features. The intra-emotional stimulus fusion framework aims to prioritize different modalities, capitalizing on their diversity and complementarity for depression detection. The inter-emotional stimulus model maps each emotional stimulus onto both invariant and specific subspaces using individual invariant and specific encoders. The emotional stimulus-invariant subspace facilitates efficient information sharing and integration across different emotional stimulus categories, while the emotional stimulus specific subspace seeks to enhance diversity and capture the distinct characteristics of individual emotional stimulus categories. Our proposed intra- and inter-emotional stimulus fusion model effectively integrates multimodal data under various emotional stimulus categories, providing a comprehensive representation that allows accurate task predictions in the context of depression detection. We evaluate the proposed model on the Chinese Soochow University students dataset, and the results outperform state-of-the-art models in terms of concordance correlation coefficient (CCC), root mean squared error (RMSE) and accuracy.
AB - Depression is a prevalent mental disorder affecting a significant portion of the global population, leading to considerable disability and contributing to the overall burden of disease. Consequently, designing efficient and robust automated methods for depression detection has become imperative. Recently, deep learning methods, especially multimodal fusion methods, have been increasingly used in computer-aided depression detection. Importantly, individuals with depression and those without respond differently to various emotional stimuli, providing valuable information for detecting depression. Building on these observations, we propose an intra- and inter-emotional stimulus transformer-based fusion model to effectively extract depression-related features. The intra-emotional stimulus fusion framework aims to prioritize different modalities, capitalizing on their diversity and complementarity for depression detection. The inter-emotional stimulus model maps each emotional stimulus onto both invariant and specific subspaces using individual invariant and specific encoders. The emotional stimulus-invariant subspace facilitates efficient information sharing and integration across different emotional stimulus categories, while the emotional stimulus specific subspace seeks to enhance diversity and capture the distinct characteristics of individual emotional stimulus categories. Our proposed intra- and inter-emotional stimulus fusion model effectively integrates multimodal data under various emotional stimulus categories, providing a comprehensive representation that allows accurate task predictions in the context of depression detection. We evaluate the proposed model on the Chinese Soochow University students dataset, and the results outperform state-of-the-art models in terms of concordance correlation coefficient (CCC), root mean squared error (RMSE) and accuracy.
KW - audiovisual
KW - deep learning
KW - depression detection
KW - emotion fusion
KW - stimuli
KW - transformer
UR - http://www.scopus.com/inward/record.url?scp=85187142105&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85187142105&partnerID=8YFLogxK
U2 - 10.1587/transinf.2023HCP0006
DO - 10.1587/transinf.2023HCP0006
M3 - Article
AN - SCOPUS:85187142105
SN - 0916-8532
VL - E107.D
SP - 342
EP - 353
JO - IEICE Transactions on Information and Systems
JF - IEICE Transactions on Information and Systems
IS - 3
ER -