0

Deep Learning for Speaker Identification: Architectural Insights from AB-1 Corpus Analysis and Performance Evaluation

The study focuses on Speaker Identification using Mel Spectrogram and Mel Frequency Cepstral Coefficients for feature extraction, evaluating six model architectures for accent and gender accuracy in the AB-1 Corpus dataset.

Year
2024
Venue
arXiv 2024
Authors
1
Hosting
Abstract onlyARXIV-DEFAULT

Cite

Notes

Only stored in your browser.

Attribution

Abstract & full text
arxiv.org/abs/2408.06804ARXIV-DEFAULT
TL;DR
Semantic Scholar
Attribution policy →

Abstract

In the fields of security systems, forensic investigations, and personalized services, the importance of speech as a fundamental human input outweighs text-based interactions. This research delves deeply into the complex field of Speaker Identification (SID), examining its essential components and emphasising Mel Spectrogram and Mel Frequency Cepstral Coefficients (MFCC) for feature extraction. Moreover, this study evaluates six slightly distinct model architectures using extensive analysis to evaluate their performance, with hyperparameter tuning applied to the best-performing model. This work performs a linguistic analysis to verify accent and gender accuracy, in addition to bias evaluation within the AB-1 Corpus dataset.

Authors

1