Multimodal depression detection using Multistream Mood Insight Encoder (MMIE)

  • Neda Firoz, Tomsk State University (Tomsk, Russia)
  • Olga G. Berestneva, Tomsk Polytechnic University (Tomsk, Russia)
  • Sergey V. Aksenov, Tomsk Polytechnic University (Tomsk, Russia)

The global rise in the prevalence of depression, a condition characterized by persistent sadness, loss of interest, and reduced functioning, exposes the shortcomings of prevailing diagnostic and treatment paradigms and underscores the urgent need for better interventions. Recent advances in artificial intelligence have sparked growing interest among affective computing researchers in automated depression diagnostic systems. The emergence of large language models such as BERT and its derivatives for text-based depression detection points to the need for multimodal approaches that integrate text and audio data to achieve more accurate diagnosis. Here, we explore the capabilities of existing large language models and propose a multistream model, the Multistream Mood Insight Encoder (MMIE). MMIE is designed to process integrated text and audio data streams with a Reformer encoder. Linguistic features such as absolutist words and first-person pronouns were also incorporated into the Reformer encoder, which enabled a comprehensive analysis of a person's mood and emotional state. Experiments demonstrated that the ClinicalBERT language model outperformed the proposed binary depression classification model. Subsequently, the sigmoid outputs of the Reformer model were used to diagnose depression. Experiments with the proposed model were conducted on the DAIC-WOZ dataset. The results showed significant improvements over state-of-the-art methods: an F1 score of 0.9538 for classification, and an MAE of 3.42 and an RMSE of 4.64 for regression. These results demonstrate the effectiveness of the proposed model in facilitating the diagnosis of depression.
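
The linguistic features mentioned in the abstract (absolutist words and first-person pronouns) could be computed from an interview transcript roughly as follows. This is only a minimal sketch: the word lists, the function name linguistic_features, and the choice of per-token ratios are illustrative assumptions, not the lexicons or feature definitions used by the authors.

```python
import re

# Illustrative word lists only (assumption): the abstract does not specify
# which lexicons were used.
ABSOLUTIST_WORDS = {"always", "never", "completely", "nothing",
                    "everything", "totally", "entirely"}
FIRST_PERSON_PRONOUNS = {"i", "me", "my", "mine", "myself"}


def linguistic_features(transcript: str) -> dict:
    """Return the share of absolutist words and first-person pronouns.

    Hypothetical helper: ratios are relative to the total token count.
    Contractions such as "I'm" are kept as single tokens and therefore
    not counted as pronouns in this simple version.
    """
    # Lowercase and keep alphabetic tokens (apostrophes allowed).
    tokens = re.findall(r"[a-z']+", transcript.lower())
    total = len(tokens)
    if total == 0:
        return {"absolutist_ratio": 0.0, "first_person_ratio": 0.0}

    absolutist = sum(1 for t in tokens if t in ABSOLUTIST_WORDS)
    first_person = sum(1 for t in tokens if t in FIRST_PERSON_PRONOUNS)
    return {
        "absolutist_ratio": absolutist / total,
        "first_person_ratio": first_person / total,
    }


if __name__ == "__main__":
    print(linguistic_features("I never feel like myself"))
```

Features of this kind would then be supplied to the encoder alongside the text and audio streams; how exactly they are fused in MMIE is not described in the abstract.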

audio, clinical analysis, depression detection, LLMs, Reformer, MMIE

2025-12-01

Copyright (c) 2025 Information and mathematical technologies in science and management