Shah, Siddhant Bikram and Garg, Shubham and Bourazeri, Aikaterini (2023) Emotion Recognition in Speech by Multimodal Analysis of Audio and Text. In: 2023 13th International Conference on Cloud Computing, Data Science & Engineering (Confluence), 2023-01-19 - 2023-01-20, Noida, India.
Abstract
Emotion recognition remains a challenging research task because of its sensitive and multifaceted nature. It has recently garnered considerable attention owing to its significance in psychology, human-computer interaction, and healthcare, where people's facial expressions, voice qualities, and spoken words are analyzed to better understand their emotional state. While emotion recognition holds the power to help address various health problems, the main challenge such systems face is accurately identifying hidden nuances in expressions and, thus, the underlying emotions they convey. A person's true emotions may remain concealed or be misidentified when only one mode of input is analyzed; therefore, multimodal input streams are used to provide a more holistic view of a person's emotions. In this paper, a novel framework is proposed that fuses the results of two unimodal emotion recognition methods, audio and text, to develop a robust and versatile emotion recognition system. The results show that signal processing and language processing can be used to reliably detect emotion from audio and text, with accuracies of 96% and 94.1% respectively. Furthermore, the approach presented in this paper can be used as a depression detection and monitoring tool, enabling mental healthcare professionals to more accurately detect symptoms of depression.
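The abstract describes fusing the results of two unimodal recognizers (audio and text). As an illustration only, the sketch below shows one common way such decision-level (late) fusion can be done: each unimodal model outputs a probability distribution over emotion classes, and the fused prediction is a weighted average of the two. The class labels, weights, and probability values here are hypothetical and not taken from the paper.

```python
# Hypothetical sketch of decision-level (late) fusion of two unimodal
# emotion classifiers. Class names and weights are illustrative, not
# the paper's actual configuration.

EMOTIONS = ["angry", "happy", "neutral", "sad"]

def fuse_predictions(audio_probs, text_probs, audio_weight=0.5):
    """Weighted average of two per-class probability vectors."""
    if len(audio_probs) != len(text_probs):
        raise ValueError("probability vectors must have equal length")
    text_weight = 1.0 - audio_weight
    return [audio_weight * a + text_weight * t
            for a, t in zip(audio_probs, text_probs)]

def predict_emotion(audio_probs, text_probs, audio_weight=0.5):
    """Return the emotion label with the highest fused probability."""
    fused = fuse_predictions(audio_probs, text_probs, audio_weight)
    return EMOTIONS[fused.index(max(fused))]

# Example: the audio model leans "angry", the text model leans "sad";
# weighting audio slightly higher lets the audio signal win.
audio = [0.7, 0.1, 0.1, 0.1]
text = [0.1, 0.1, 0.1, 0.7]
print(predict_emotion(audio, text, audio_weight=0.6))  # prints "angry"
```

Weighted averaging is only one fusion strategy; the weights could also be learned, or the unimodal feature vectors could be concatenated before classification (feature-level fusion) instead.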
Item Type: Conference or Workshop Item (Paper)
Divisions: Faculty of Science and Health; Faculty of Science and Health > Computer Science and Electronic Engineering, School of
SWORD Depositor: Unnamed user with email elements@essex.ac.uk
Depositing User: Unnamed user with email elements@essex.ac.uk
Date Deposited: 17 Oct 2024 16:59
Last Modified: 17 Oct 2024 17:03
URI: http://repository.essex.ac.uk/id/eprint/35010
Available files
Filename: Emotion_Recognition_in_Speech_by_Multimodal_Analysis_of_Audio_and_Text.pdf