Ullah, Rahmat and Asghar, Ikram and Malik, Hassan and Evans, Gareth and Ahmad, Jawad and Roberts, Dorothy Anne (2025) Enhancing Forensic Audio Transcription with Neural Network-Based Speaker Diarization and Gender Classification. In: 2024 International Conference on Engineering and Emerging Technologies (ICEET), 2024-12-27 - 2024-12-28, Dubai, UAE.
Abstract
Forensic audio transcription is often compromised by low-quality recordings, where indistinct speech can hinder the accuracy of conventional Automatic Speech Recognition (ASR) systems. This study addresses this limitation by developing a machine learning-based approach to improve speaker diarization, a process critical for distinguishing between speakers in sensitive audio data. Previous research highlights the inadequacy of traditional ASR in forensic settings, particularly where audio quality is poor and speaker overlap is common. This paper presents a neural network designed specifically for gender classification, using 20 key acoustic features extracted from real forensic audio data. The model architecture comprises input, hidden, and output layers tailored to differentiate male and female voices, with dropout regularization to prevent overfitting and hyperparameter optimization to ensure robust generalization on held-out test data. The neural network achieved an average recall of 86.81%, F1 score of 85.67%, precision of 87.95%, and accuracy of 86.83% across varied audio conditions. This model significantly improves transcription accuracy, reducing errors in legal contexts and supporting judicial processes with more reliable, interpretable evidence from sensitive audio data.
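The abstract does not include implementation details, so the following is only a minimal sketch of the kind of architecture it describes: a feed-forward binary classifier over 20 acoustic features with dropout-regularized hidden layers. The hidden-layer sizes, dropout rate, optimizer, and the Keras framework choice are all assumptions, not the authors' published configuration.

```python
# Minimal sketch (not the authors' code) of a gender classifier over
# 20 acoustic features, as outlined in the abstract. Layer sizes and
# the dropout rate are assumed; the abstract does not specify them.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

def build_gender_classifier(n_features: int = 20) -> tf.keras.Model:
    """Feed-forward classifier over per-segment acoustic features."""
    model = models.Sequential([
        layers.Input(shape=(n_features,)),
        layers.Dense(64, activation="relu"),    # hidden layer (assumed size)
        layers.Dropout(0.3),                    # dropout to curb overfitting
        layers.Dense(32, activation="relu"),    # second hidden layer (assumed)
        layers.Dropout(0.3),
        layers.Dense(1, activation="sigmoid"),  # binary male/female output
    ])
    model.compile(
        optimizer="adam",
        loss="binary_crossentropy",
        metrics=["accuracy",
                 tf.keras.metrics.Precision(),
                 tf.keras.metrics.Recall()],
    )
    return model

# Hypothetical usage with random stand-in data; real inputs would be
# acoustic features extracted from diarized forensic audio segments.
X = np.random.rand(256, 20).astype("float32")
y = np.random.randint(0, 2, size=(256, 1))
model = build_gender_classifier()
model.fit(X, y, epochs=5, batch_size=32, validation_split=0.2, verbose=0)
```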
| Item Type: | Conference or Workshop Item (Paper) |
|---|---|
| Uncontrolled Keywords: | Forensic linguistics; speaker diarization; speech transcription; automatic speech recognition; machine learning (ML) |
| Divisions: | Faculty of Science and Health; Faculty of Science and Health > Computer Science and Electronic Engineering, School of |
| SWORD Depositor: | Unnamed user with email elements@essex.ac.uk |
| Depositing User: | Unnamed user with email elements@essex.ac.uk |
| Date Deposited: | 22 Apr 2025 14:16 |
| Last Modified: | 22 Apr 2025 14:21 |
| URI: | http://repository.essex.ac.uk/id/eprint/40534 |
Available files
Filename: Enhancing_Forensic_Audio_Transcription_with_Neural_Network-Based_Speaker_Diarization_and_Gender_Classification.pdf