Sae Jong, Nida and Garcia Seco De Herrera, Alba and Phukpattaranont, Pornchai (2021) Multimodal Data Fusion of Electromyography and Acoustic Signals for Thai Syllable Recognition. IEEE Journal of Biomedical and Health Informatics, Early (6). pp. 1997-2006. DOI https://doi.org/10.1109/JBHI.2020.3034158
Abstract
Speech disorders such as dysarthria are common after a stroke. Speech rehabilitation performed by a speech-language pathologist is needed for patients to improve and recover. However, in Thailand, there is a shortage of speech-language pathologists. In this paper, we present a syllable recognition system that can be deployed in a speech rehabilitation system to support the limited number of speech-language pathologists available. The proposed system is based on a multimodal fusion of the acoustic signal and surface electromyography (sEMG) collected from facial muscles. Multimodal data fusion is studied to improve signal collection in noisy conditions while reducing the number of electrodes needed. The signals are collected simultaneously while subjects articulate 12 Thai syllables designed for rehabilitation exercises. Several features are extracted from the sEMG signals, and five channels are studied. The best combination of features and channels is chosen to be fused with the mel-frequency cepstral coefficients extracted from the acoustic signal. The feature vector from each signal source is projected by a spectral regression extreme learning machine, and the projected vectors are concatenated. Data from seven healthy subjects were collected for evaluation. Results show that the multimodal fusion outperforms the use of a single signal source, achieving up to 98% accuracy. In other words, an accuracy improvement of up to 5% can be achieved with the proposed multimodal fusion. Moreover, its low standard deviations in classification accuracy, compared with those obtained from a single signal source, indicate improved robustness of the syllable recognition.
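The feature-level fusion step described in the abstract (project each modality's feature vector separately, then concatenate) can be illustrated with a minimal sketch. This is not the authors' implementation: the projection below is a plain random sigmoid hidden layer in the style of an extreme learning machine, used as a stand-in for the spectral regression extreme learning machine, and the function names, feature dimensions, and synthetic data are all assumptions made for illustration.

```python
import numpy as np


def elm_hidden_projection(features, n_hidden, rng):
    """Project features through a random sigmoid hidden layer.

    Stand-in for the paper's spectral regression ELM projection;
    the weights here are random, not learned.
    """
    n_features = features.shape[1]
    weights = rng.standard_normal((n_features, n_hidden))
    bias = rng.standard_normal(n_hidden)
    return 1.0 / (1.0 + np.exp(-(features @ weights + bias)))


def fuse_feature_level(semg_features, acoustic_features, n_hidden=8, seed=0):
    """Project each modality separately, then concatenate per sample."""
    rng = np.random.default_rng(seed)
    semg_proj = elm_hidden_projection(semg_features, n_hidden, rng)
    acoustic_proj = elm_hidden_projection(acoustic_features, n_hidden, rng)
    return np.concatenate([semg_proj, acoustic_proj], axis=1)


# Synthetic placeholders: 24 utterances, 20 hypothetical sEMG features
# and 13 hypothetical MFCC features per utterance.
data_rng = np.random.default_rng(42)
semg = data_rng.standard_normal((24, 20))
mfcc = data_rng.standard_normal((24, 13))

fused = fuse_feature_level(semg, mfcc)
print(fused.shape)
```

Each row of `fused` holds the projected sEMG features followed by the projected acoustic features for one utterance, and it is this combined vector that a classifier would receive in a feature-level fusion scheme.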
Item Type: | Article |
---|---|
Uncontrolled Keywords: | Acoustic signal; electromyography; feature-level fusion; multimodal fusion; speech recognition |
Divisions: | Faculty of Science and Health; Faculty of Science and Health > Computer Science and Electronic Engineering, School of |
SWORD Depositor: | Unnamed user with email elements@essex.ac.uk |
Depositing User: | Unnamed user with email elements@essex.ac.uk |
Date Deposited: | 12 Jan 2021 10:11 |
Last Modified: | 30 Oct 2024 19:33 |
URI: | http://repository.essex.ac.uk/id/eprint/29499 |
Available files
Filename: Nida2020.pdf