This paper proposes a method for converting the scores produced by a speaker recognition system into
likelihood ratios (LRs) for evaluating the strength of forensic voice evidence. A robust LR estimation algorithm
based on adaptive within-source-variance control is developed to estimate the suspect model accurately. The
algorithm adaptively combines information from reference speakers with information from the suspect to model the
suspect's within-source variability. Compared with a baseline recognition system, the system using the proposed
algorithm shows better discrimination capability and reliability, and the magnitude of the evidence strength is also
improved.
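The abstract does not give the estimation formulas, so the following is only a minimal sketch of the general idea of score-to-LR conversion with a controlled within-source variance, not the authors' algorithm. It assumes Gaussian score distributions, and it assumes that the suspect's own score variance is blended with a pooled reference-speaker variance using a weight that grows with the number of suspect samples. The function names and the `relevance` parameter are hypothetical.

```python
import math
from statistics import mean, variance


def gaussian_pdf(x, mu, var):
    """Density of a normal distribution with mean mu and variance var."""
    return math.exp(-((x - mu) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def adaptive_within_variance(suspect_scores, reference_variance, relevance=3.0):
    """Blend the suspect's variance with a pooled reference-speaker variance.

    Assumption: weight = n / (n + relevance), so a suspect with few
    samples leans on the reference speakers, and one with many samples
    leans on their own data. `relevance` is a hypothetical tuning factor.
    """
    n = len(suspect_scores)
    suspect_var = variance(suspect_scores) if n > 1 else 0.0
    weight = n / (n + relevance)
    return weight * suspect_var + (1.0 - weight) * reference_variance


def likelihood_ratio(evidence_score, suspect_scores, reference_variance,
                     population_scores, relevance=3.0):
    """LR = p(score | same speaker) / p(score | different speaker)."""
    within_var = adaptive_within_variance(suspect_scores, reference_variance, relevance)
    numerator = gaussian_pdf(evidence_score, mean(suspect_scores), within_var)
    denominator = gaussian_pdf(evidence_score, mean(population_scores),
                               variance(population_scores))
    return numerator / denominator


if __name__ == "__main__":
    # Illustrative, made-up scores; not data from the paper.
    suspect = [2.0, 2.2, 1.8]
    population = [-1.0, 0.0, 1.0, -0.5, 0.5]
    print(likelihood_ratio(2.0, suspect, 0.25, population))
```

An LR above 1 supports the same-speaker hypothesis and an LR below 1 supports the different-speaker hypothesis. Blending the variances keeps the same-speaker model from becoming unrealistically narrow when only a few suspect recordings are available.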
WANG Hua-peng, YANG Jun, WU Ming, XU Yong. Automatic Speaker Recognition for Courtroom Based on Adaptive Within-Source-Variance Control [J]. Journal of Applied Sciences, 2014, 32(6): 582-587.
DOI: 10.3969/j.issn.0255-8297.2014.06.006