Mitnala, Vijaya and Reed, Martin and Bicknell, John and Chakraborty, Joyraj (2025) Intelligent speech handover for smart speakers through deep learning: a custom loss function approach. IEEE Transactions on Consumer Electronics. DOI https://doi.org/10.1109/TCE.2025.3549653
Abstract
The consistent growth of the smart speaker market has established far-field speech communication as an alternative to traditional handsets. When multiple smart speakers are in use, a mechanism for seamless handover between them is needed, but none is currently supported. This paper presents two novel contributions that together enable seamless handover: selecting a suitable smart speaker from speech signals through machine learning, and reducing media disruption during handover through local modifications to the Session Initiation Protocol (SIP). The proposed solution bases its prediction on a one-dimensional convolutional neural network (1DCNN) trained with a custom loss function. A comprehensive evaluation on multiple datasets, incorporating different types of audio signals, movement loci, and varying room scenarios, demonstrates the effectiveness of the proposed method in predicting the most appropriate smart speaker. The proposal is shown to be highly effective compared with both a previously proposed predictor and a 1DCNN using a standard loss function, and it operates at a low computational cost, making it suitable for consumer smart speakers.
Item Type: | Article |
---|---|
Uncontrolled Keywords: | convolution neural networks; Machine learning; Session initiation protocol; Smart speakers; speech processing |
Divisions: | Faculty of Science and Health > Computer Science and Electronic Engineering, School of |
SWORD Depositor: | Unnamed user with email elements@essex.ac.uk |
Depositing User: | Unnamed user with email elements@essex.ac.uk |
Date Deposited: | 04 Apr 2025 14:03 |
Last Modified: | 04 Apr 2025 14:04 |
URI: | http://repository.essex.ac.uk/id/eprint/40474 |
Available files
Filename: Seamless_Speech_Handover.pdf