Shah, Siddhant Bikram and Garg, Shubham and Kouis, Nikolaos and Bourazeri, Aikaterini (2026) Decoding emotional nuances: A multimodal approach to detecting depression through audio, video, and text. Information Fusion, 136. p. 104521. DOI https://doi.org/10.1016/j.inffus.2026.104521
Abstract
Early detection of depression is crucial to prevent serious consequences, such as chronic fatigue, substance abuse, and worsening mental health. Traditional diagnostic methods often rely on self-reported questionnaires, which can be influenced by a patient’s willingness to disclose information, or on unimodal approaches that may not capture the full range of depressive symptoms. To address these limitations, we present LUNA (Listen, Understand, Nurture, Advise), a unified multimodal application-based framework designed to emulate real-world mental health assessments by integrating video, audio, and text inputs. LUNA employs individual modules for each modality, combining their results to provide a comprehensive analysis of the user’s mental state. Our findings show that each module can independently and effectively screen for depression, and their combined scores yield a more comprehensive and accurate regression score based on the PHQ-8 scale for each user session. Benchmarking against state-of-the-art depression detection models using the DAIC-WOZ dataset demonstrates that LUNA performs comparably to or better than existing validated models. To ensure privacy, no user data is stored post-assessment. Furthermore, the system features an interactive avatar to enhance user engagement and comfort. LUNA represents a significant advancement in the early detection of depression by providing a robust, privacy-conscious and user-friendly diagnostic tool.
| Item Type: | Article |
|---|---|
| Uncontrolled Keywords: | Deep learning; Depression detection; Mental health assessment; Multimodal analysis; PHQ-8 |
| Subjects: | Z Bibliography. Library Science. Information Resources > ZR Rights Retention |
| Divisions: | Faculty of Science and Health; Faculty of Science and Health > Computer Science and Electronic Engineering, School of |
| SWORD Depositor: | Unnamed user with email elements@essex.ac.uk |
| Depositing User: | Unnamed user with email elements@essex.ac.uk |
| Date Deposited: | 12 Jun 2026 14:35 |
| Last Modified: | 12 Jun 2026 14:37 |
| URI: | http://repository.essex.ac.uk/id/eprint/43385 |
Available files
Filename: Decoding Emotional Nuance - Information Fusion.pdf
Licence: Creative Commons: Attribution 4.0