Chen, Song and Kirton-Wingate, Jasper and Doctor, Faiyaz and Arshad, Usama and Dashtipour, Kia and Gogate, Mandar and Halim, Zahid and Al-Dubai, Ahmed and Arslan, Tughrul and Hussain, Amir (2024) Context-Aware Audio-Visual Speech Enhancement Based on Neuro-Fuzzy Modelling and User Preference Learning. IEEE Transactions on Fuzzy Systems, 32 (10). pp. 5400-5412. DOI https://doi.org/10.1109/tfuzz.2024.3435050 (In Press)
Abstract
It is estimated that by 2050 approximately one in ten people globally will experience disabling hearing impairment. In everyday reverberant noise, a substantial proportion of listeners struggle to comprehend speech. This study introduces a novel application of neuro-fuzzy modelling that fuses audio-visual speech enhancement (AV SE) with a user preference learning framework. Specifically, our approach integrates multimodal AV speech data with innovative SE methods and fuzzy inferencing techniques. This integration is further enriched by a user-preference learning model that adapts to environmental and user-specific contexts, including signal-to-noise ratio, sound power, and the quality of visual information. The proposed framework also allows clinical measures such as user cognitive load (or listening effort), together with real-world uncertainty, to steer the system outputs. We employ an adaptive fuzzy neural network to derive the most effective Sugeno fuzzy inference model, using particle swarm optimization to tune it for optimal SE given sound power, ambient noise level, and visual quality. Experimental results on our new benchmark AV multi-talker Challenge dataset demonstrate that our user preference-informed, context-aware AV SE approach improves speech intelligibility and quality in challenging noisy conditions, a significant advance over conventional methods, while reducing energy consumption. We conclude that the approach is ecologically scalable and applicable to real-world settings, setting a new benchmark in AV SE research and paving the way for future assistive hearing and communication technologies.
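To make the core mechanism concrete, the sketch below illustrates a first-order Sugeno (TSK) fuzzy inference step of the kind the abstract describes: context inputs (SNR, sound power, visual quality) pass through Gaussian antecedents, and a firing-strength-weighted average of linear consequents yields a scalar that could steer the AV SE output. All rule centres, widths, and coefficients here are invented for illustration and are not taken from the paper; in the actual system, particle swarm optimization would tune such parameters, which is not shown.

```python
import numpy as np

def gauss_mf(x, c, sigma):
    """Gaussian membership function."""
    return np.exp(-0.5 * ((x - c) / sigma) ** 2)

# Hypothetical rule base: each rule pairs Gaussian antecedents over
# (snr_db, sound_power, visual_quality) with a first-order Sugeno
# consequent y = a . x + b. All numbers are illustrative placeholders,
# not parameters from the paper.
RULES = [
    # ((centres), (widths), (linear coeffs a), bias b)
    ((-5.0, 0.8, 0.2), (5.0, 0.3, 0.3), (-0.02, 0.5, 0.4), 0.9),  # noisy, poor visuals -> strong SE
    ((10.0, 0.4, 0.8), (5.0, 0.3, 0.3), (-0.01, 0.2, 0.1), 0.3),  # clean, good visuals -> light SE
    ((2.0, 0.6, 0.5), (5.0, 0.3, 0.3), (-0.015, 0.3, 0.2), 0.6),  # moderate context
]

def sugeno_infer(x):
    """Weighted-average Sugeno (TSK) inference: returns a scalar gain
    steering how aggressively the SE model processes the current frame."""
    x = np.asarray(x, dtype=float)
    firing, outputs = [], []
    for centres, widths, a, b in RULES:
        # Rule firing strength: product t-norm over the three antecedents.
        w = np.prod([gauss_mf(xi, c, s) for xi, c, s in zip(x, centres, widths)])
        firing.append(w)
        outputs.append(np.dot(a, x) + b)
    firing = np.asarray(firing)
    # Normalized weighted average of the rule consequents.
    return float(np.dot(firing, outputs) / (firing.sum() + 1e-12))

# Example context: 0 dB SNR, high sound power, mediocre lip-stream quality.
print(sugeno_infer([0.0, 0.7, 0.5]))
```

In the paper's framework, the adaptive fuzzy neural network would learn the rule structure and PSO would optimize these membership and consequent parameters against user preference and listening-effort measures, rather than using hand-set values as above.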
| Item Type: | Article |
|---|---|
| Uncontrolled Keywords: | deep neural networks; fuzzy inference; preference learning; speech enhancement |
| Divisions: | Faculty of Science and Health; Faculty of Science and Health > Computer Science and Electronic Engineering, School of |
| SWORD Depositor: | Unnamed user with email elements@essex.ac.uk |
| Depositing User: | Unnamed user with email elements@essex.ac.uk |
| Date Deposited: | 29 Jul 2024 09:06 |
| Last Modified: | 13 Nov 2024 18:25 |
| URI: | http://repository.essex.ac.uk/id/eprint/38853 |
Available files
Filename: Context-Aware2507_Final_5.pdf
Licence: Creative Commons: Attribution 4.0