Signal and Information Processing

Automatic Speaker Recognition for Courtroom Based on Adaptive Within-Source-Variance Control

  • 1. Key Laboratory of Noise and Vibration Research, Institute of Acoustics, Chinese Academy of Sciences,
    Beijing 100190, China
    2. Department of Forensic Science and Technology, China Criminal Police University,
    Shenyang 110854, China

Received date: 2012-07-24

  Revised date: 2014-09-10

  Online published: 2014-09-10

Abstract

 This paper proposes a method for converting the scores generated by a speaker recognition system into
likelihood ratios (LRs) for evaluating the strength of forensic voice evidence. A robust LR estimation algorithm
using adaptive within-source-variance control is developed to estimate the suspect model accurately. The
algorithm adaptively combines information from reference speakers with that from the suspect to model the
within-source variability of the suspect. Compared with a baseline recognition system, the system using the proposed
algorithm achieves better discrimination and reliability, and the magnitude of the evidence strength is also
improved.
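
The idea of converting a score to an LR with an adaptively controlled within-source variance can be illustrated with a minimal sketch. This is not the algorithm from the paper, whose details the abstract does not give. It assumes Gaussian score distributions, a hypothetical relevance factor that weights the suspect's own variance against a pooled reference-speaker variance, and illustrative function names throughout.

```python
import math
from statistics import mean, variance


def gaussian_pdf(x, mu, var):
    """Density of a normal distribution with mean mu and variance var."""
    return math.exp(-((x - mu) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def adaptive_within_variance(suspect_scores, reference_variance, relevance=4.0):
    """Blend the suspect's own score variance with a pooled reference variance.

    ASSUMPTION: the weight n / (n + relevance) is a hypothetical choice made
    for illustration; it leans on the reference speakers when suspect data
    are scarce and on the suspect as more data become available.
    """
    n = len(suspect_scores)
    suspect_var = variance(suspect_scores) if n > 1 else 0.0
    weight = n / (n + relevance)
    return weight * suspect_var + (1.0 - weight) * reference_variance


def likelihood_ratio(evidence_score, suspect_scores, reference_variance,
                     between_scores, relevance=4.0):
    """LR = p(score | same speaker) / p(score | different speakers).

    ASSUMPTION: both hypotheses are modeled with a single Gaussian.
    """
    within_var = adaptive_within_variance(suspect_scores, reference_variance, relevance)
    numerator = gaussian_pdf(evidence_score, mean(suspect_scores), within_var)
    denominator = gaussian_pdf(evidence_score, mean(between_scores),
                               variance(between_scores))
    return numerator / denominator


if __name__ == "__main__":
    suspect = [2.0, 2.5, 3.0, 2.5]               # suspect-vs-suspect scores (made up)
    between = [-1.0, 0.0, 1.0, 0.0, -0.5, 0.5]   # different-speaker scores (made up)
    lr = likelihood_ratio(2.4, suspect, 0.5, between)
    print(f"LR = {lr:.1f}, log10 LR = {math.log10(lr):.2f}")
```

An LR above 1 supports the same-speaker hypothesis and an LR below 1 supports the different-speaker hypothesis. In this sketch, the blended variance keeps the same-speaker model from becoming unrealistically narrow when only a few suspect recordings are available.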

Cite this article

WANG Hua-peng1,2, YANG Jun1, WU Ming1, XU Yong1. Automatic Speaker Recognition for Courtroom Based on Adaptive Within-Source-Variance Control[J]. Journal of Applied Sciences, 2014, 32(6): 582-587. DOI: 10.3969/j.issn.0255-8297.2014.06.006
