Signal and Information Processing

Automatic Speaker Recognition for Courtroom Based on Adaptive Within-Source-Variance Control

    1. Key Laboratory of Noise and Vibration Research, Institute of Acoustics, Chinese Academy of Sciences,
       Beijing 100190, China
    2. Department of Forensic Science and Technology, China Criminal Police University,
       Shenyang 110854, China
WANG Huapeng, Ph.D., associate professor. Research interests: forensic speaker recognition and evaluation of the strength of forensic evidence. E-mail: huapeng.wang@gmail.com

Received date: 2012-07-24

Revised date: 2014-09-10

Online published: 2014-09-10

Funding

Supported by the National Natural Science Foundation of China (No. 11004217, No. 11074279)

Cite this article

WANG Huapeng1,2, YANG Jun1, WU Ming1, XU Yong1. Automatic Speaker Recognition for Courtroom Based on Adaptive Within-Source-Variance Control [J]. Journal of Applied Sciences (应用科学学报), 2014, 32(6): 582-587. DOI: 10.3969/j.issn.0255-8297.2014.06.006

Abstract

This paper proposes a method for converting the scores produced by an automatic speaker recognition system
into likelihood ratios (LRs), which quantify the strength of forensic voice evidence. To estimate the
statistical model of the suspect more accurately, an adaptive within-source-variance control algorithm is
developed. The algorithm adaptively combines within-source score-model information from a reference
population with that of the suspect to model the suspect's within-source variability, which reduces the
amount of suspect data required. Tests against a baseline recognition system show that the system using the
proposed algorithm has better discrimination capability and reliability, and that it increases the strength
with which the voice evidence supports the conclusion.
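
The idea summarized in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the formulas, so everything specific here is an assumption made for illustration: the same-source (within-source) and different-source (between-source) score distributions are modelled as single Gaussians, and the suspect's variance is blended with a reference-population variance using a count-based weight n / (n + relevance). The function names and the `relevance` parameter are hypothetical and are not taken from the paper.

```python
import math
from statistics import mean, variance


def gaussian_pdf(x, mu, var):
    """Density of a univariate Gaussian with mean mu and variance var."""
    return math.exp(-((x - mu) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def adaptive_within_variance(suspect_scores, reference_var, relevance=10.0):
    """Blend the suspect's own score variance with a reference-population variance.

    ASSUMPTION: the weight alpha = n / (n + relevance) is an illustrative
    choice, not the paper's rule. With few suspect scores the result stays
    close to the reference variance; with many it approaches the suspect's own.
    """
    n = len(suspect_scores)
    # A sample variance needs at least two scores; otherwise fall back.
    suspect_var = variance(suspect_scores) if n > 1 else reference_var
    alpha = n / (n + relevance)
    return alpha * suspect_var + (1.0 - alpha) * reference_var


def likelihood_ratio(evidence_score, suspect_scores, reference_var,
                     between_scores, relevance=10.0):
    """Convert a raw recognition score into a likelihood ratio.

    Numerator: density of the score under the suspect's same-source model.
    Denominator: density under the different-source model, estimated from
    scores between the questioned recording and other speakers.
    Requires at least one suspect score and two between-source scores.
    """
    within_mu = mean(suspect_scores)
    within_var = adaptive_within_variance(suspect_scores, reference_var, relevance)
    between_mu = mean(between_scores)
    between_var = variance(between_scores)
    numerator = gaussian_pdf(evidence_score, within_mu, within_var)
    denominator = gaussian_pdf(evidence_score, between_mu, between_var)
    return numerator / denominator


if __name__ == "__main__":
    # Made-up scores, for demonstration only.
    suspect = [2.0, 2.5, 3.0]
    between = [-2.0, -1.0, 0.0, -1.5, -0.5]
    lr = likelihood_ratio(2.4, suspect, reference_var=0.5, between_scores=between)
    print(f"log10 LR = {math.log10(lr):.2f}")
```

In this sketch an LR above 1 supports the same-speaker hypothesis and an LR below 1 supports the different-speaker hypothesis. Shrinking the variance toward the reference value is what lets the model be estimated from only a few suspect recordings.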
