Gutiérrez-Serafín, Benjamín and Andreu-Perez, Javier and Pérez-Espinosa, Humberto and Paulmann, Silke and Ding, Weiping (2024) Toward assessment of human voice biomarkers of brain lesions through explainable deep learning. Biomedical Signal Processing and Control, 87B. p. 105457. DOI https://doi.org/10.1016/j.bspc.2023.105457
Abstract
Lesions in the brain resulting from traumatic injuries or strokes can manifest as speech dysfunction in undiagnosed patients. Employing machine learning (ML) tools to analyze the prosody or articulatory phonetics of human speech could be advantageous for early screening of undetected brain injuries. Additionally, explaining the model’s decision-making process can support its predictions and inform measures to improve patient voice quality. However, traditional ML methods relying on low-level descriptors (LLDs) may sacrifice detailed temporal dynamics and other speech characteristics. Interpreting these descriptors can also be challenging, requiring significant effort to understand feature relationships and suitable value ranges. To address these limitations, this paper introduces xDMFCCs, a method that identifies interpretable discriminatory acoustic biomarkers from a single speech utterance, providing local and global interpretations of deep learning models in speech applications. To validate this approach, it was used to interpret a Convolutional Neural Network (CNN) trained on Mel-frequency Cepstral Coefficients (MFCCs) for the binary task of differentiating patient from control vocalizations. The ConvNet achieved promising results with a 75% F-score (75% recall, 76% precision), comparable to conventional machine learning baselines. What sets xDMFCCs apart is its explanation through a 2D time-frequency representation that preserves the complete speech signal, offering a more transparent account of what differentiates patients from healthy controls. This advancement enables more detailed and compelling studies of the acoustic speech traits of brain lesions. Furthermore, the findings have significant implications for developing low-cost, rapid diagnostics for undetected brain lesions.
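To make the pipeline described in the abstract concrete, the sketch below shows a minimal MFCC-plus-CNN setup of the kind the paper evaluates, followed by a plain input-gradient saliency map as a stand-in for the explanation step. This is not the authors' code: the libraries (librosa, PyTorch), layer sizes, sample rate, and number of coefficients are assumptions for illustration, and the gradient saliency here is a generic technique, not the paper's xDMFCCs method.

```python
# Hedged sketch (assumed stack: librosa + PyTorch; not the paper's code).
import librosa
import torch
import torch.nn as nn

def mfcc_map(wav_path: str, sr: int = 16000, n_mfcc: int = 13) -> torch.Tensor:
    """Load one utterance and return a (1, n_mfcc, frames) MFCC tensor."""
    y, sr = librosa.load(wav_path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return torch.from_numpy(mfcc).float().unsqueeze(0)  # add channel dim

class MFCCConvNet(nn.Module):
    """Tiny ConvNet emitting a single patient-vs-control logit.
    Layer widths are illustrative, not taken from the paper."""
    def __init__(self) -> None:
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # pool away time/frequency dims
        )
        self.classifier = nn.Linear(32, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 1, n_mfcc, frames) -> (batch, 1) logit
        return self.classifier(self.features(x).flatten(1))

def input_gradient_saliency(model: nn.Module, x: torch.Tensor) -> torch.Tensor:
    """Generic gradient saliency over the MFCC map (a stand-in for the
    paper's xDMFCCs explanations): |d logit / d input| per time-frequency bin."""
    x = x.clone().requires_grad_(True)
    model(x).sum().backward()
    return x.grad.abs().squeeze(0).squeeze(0)  # (n_mfcc, frames)

# Usage (hypothetical file name):
# x = mfcc_map("utterance.wav").unsqueeze(0)   # (1, 1, 13, frames)
# model = MFCCConvNet()
# saliency = input_gradient_saliency(model, x)
```

Because the saliency map shares the MFCC's 2D time-frequency layout, each highlighted bin can be traced back to a time span and coefficient of the original utterance, which is the property the abstract credits for the method's transparency.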
| Item Type: | Article |
| ---: | --- |
| Uncontrolled Keywords: | Intelligent audio analysis; acoustic features; traumatic brain injury; explainable machine learning |
| Divisions: | Faculty of Science and Health; Faculty of Science and Health > Computer Science and Electronic Engineering, School of; Faculty of Science and Health > Psychology, Department of |
| SWORD Depositor: | Unnamed user with email elements@essex.ac.uk |
| Depositing User: | Unnamed user with email elements@essex.ac.uk |
| Date Deposited: | 22 Sep 2023 09:45 |
| Last Modified: | 30 Oct 2024 16:14 |
| URI: | http://repository.essex.ac.uk/id/eprint/36371 |
Available files
Filename: BrainInjuryElsevier_Preprint.pdf
Licence: Creative Commons: Attribution-Noncommercial-No Derivative Works 4.0