Innovative Method for Unsupervised Voice Activity Detection and Classification of Audio Segments

Ali, Zulfiqar and Talha, Muhammad (2018) Innovative Method for Unsupervised Voice Activity Detection and Classification of Audio Segments. IEEE Access, 6. pp. 15494-15504. DOI https://doi.org/10.1109/access.2018.2805845

Abstract

An accurate and noise-robust voice activity detection (VAD) system is widely applicable to emerging speech technologies in audio forensics, wireless communication, and speech recognition. In real-life applications, however, sufficient data, and in particular human-annotated data, may not be available to train such a system, so a supervised VAD system cannot be used. In this paper, an unsupervised method for VAD is proposed to label the speech-presence and speech-absence segments of an audio recording. To keep the proposed method computationally fast, it is implemented with long-term features computed using the Katz algorithm for fractal dimension estimation. Two databases in different languages are used to evaluate the method: the Texas Instruments Massachusetts Institute of Technology (TIMIT) database, which is in English, and the King Saud University (KSU) speech database, which is in Arabic. TIMIT is recorded in a single environment, whereas the KSU speech database is recorded in distinct environments using various recording systems that contain sound cards of different qualities and models. The evaluation suggests that the proposed method labels voiced and unvoiced segments reliably in both clean and noisy audio.
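
To make the Katz estimate mentioned in the abstract concrete, the sketch below computes a per-frame Katz fractal dimension for a sampled signal. It is a minimal illustration and not the authors' implementation. The following are assumptions that do not come from the abstract: the waveform is treated as a planar curve with unit spacing between samples, and the function names `katz_fd` and `frame_fd`, the frame length, and the hop size are hypothetical. The long-term feature construction and the labelling rule used in the paper are not reproduced here.

```python
import math


def katz_fd(frame):
    """Katz fractal dimension of one frame of samples.

    Assumption: the frame is the planar curve (i, frame[i]) with unit
    spacing between samples. FD = log10(n) / (log10(n) + log10(d / L)),
    where n is the number of steps, L the total curve length, and d the
    largest distance from the first point to any other point.
    """
    n = len(frame) - 1  # number of steps along the curve
    if n < 1:
        raise ValueError("need at least two samples")
    # Total curve length: sum of distances between successive points.
    total_length = sum(
        math.hypot(1.0, frame[i + 1] - frame[i]) for i in range(n)
    )
    # Planar extent: farthest point from the first sample.
    diameter = max(
        math.hypot(i, frame[i] - frame[0]) for i in range(1, n + 1)
    )
    denom = math.log10(n) + math.log10(diameter / total_length)
    if denom <= 0:
        # Degenerate frame (for example only two samples): undefined.
        return math.nan
    return math.log10(n) / denom


def frame_fd(signal, frame_len, hop):
    """Katz FD for each full frame of `signal` (hypothetical framing)."""
    return [
        katz_fd(signal[start:start + frame_len])
        for start in range(0, len(signal) - frame_len + 1, hop)
    ]


if __name__ == "__main__":
    ramp = [0.0, 1.0, 2.0, 3.0, 4.0]
    zigzag = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0]
    print(katz_fd(ramp), katz_fd(zigzag))
```

A straight ramp yields a value of about 1, while a more irregular frame such as the zigzag yields a larger value. Because the result depends on amplitude scale under the unit-spacing assumption, signals would typically need consistent normalization before frames are compared.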

Item Metadata

Item Type:	Article
Uncontrolled Keywords:	Voiced and unvoiced segmentation; fractal dimension; Katz algorithm; TIMIT database; KSU speech database
Divisions:	Faculty of Science and Health; Faculty of Science and Health > Computer Science and Electronic Engineering, School of
SWORD Depositor:	Unnamed user with email elements@essex.ac.uk
Depositing User:	Unnamed user with email elements@essex.ac.uk
Date Deposited:	09 Apr 2020 11:20
Last Modified:	16 Aug 2025 03:51
URI:	http://repository.essex.ac.uk/id/eprint/27213

Available files

Published Version

Filename: Unsupervised Voice Activity Detection.pdf
