Confidence Measure Based on Time and Boundary Features for Speech Keyword Spotting System

Li Wenxin, Qu Dan, Li Bicheng, Wang Bingxi

doi:10.3969/j.issn.0255-8297.2012.06.005

Journal of Applied Sciences (应用科学学报), 2012, Vol. 30, Issue 6: 588-594

Signal and Information Processing

Institute of Information Engineering, PLA Information Engineering University, Zhengzhou 450002, China

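Qu Dan, Ph.D., associate professor; research interests: speech signal processing, pattern recognition; E-mail: qudanqudan@sina.com. Li Bicheng, professor, doctoral supervisor; research interests: intelligent information processing, speech signal processing; E-mail: lbclm@163.com. Wang Bingxi, professor, doctoral supervisor; research interests: speech signal processing, pattern recognition, natural language processing; E-mail: bingxiwang@163.com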

Received date: 2011-07-13

Revised date: 2011-09-06

Online published: 2011-09-06

Funding

Supported by the National Natural Science Foundation of China (No. 61175017, No. 60872142)

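Abstract

Posterior-probability-based confidence measures in keyword recognition make insufficient use of pronunciation-variation information in speech. To address this, a confidence measure based on duration and boundary information is proposed. The method introduces a relaxation factor to flexibly select the arcs that carry the same word information when computing the confidence, which is then used for keyword rejection. On this basis, a lattice-based, large-vocabulary speech keyword spotting system is designed and implemented. An improved dynamic programming algorithm first performs keyword detection on the syllable lattice, producing as many keyword candidates as possible, and the confidence measure based on duration and boundary information is then used for keyword verification. Experimental results show that, compared with the mainstream computation method, the proposed method improves the system's equal error rate (EER) by 7%.

Keywords: speech recognition; keyword spotting; confidence measure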
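
The abstract does not give the confidence formula, so the following Python sketch is only one plausible reading of the relaxation-factor idea, not the authors' method. It assumes that each lattice arc already carries a posterior probability, and that an arc with the same word label counts toward a keyword candidate when both of its boundaries fall within a tolerance equal to the relaxation factor times the candidate's duration. The names `Arc`, `relaxed_confidence`, and `relax` are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Arc:
    """A lattice arc (hypothetical layout, not taken from the paper)."""
    word: str         # word or syllable-sequence label on the arc
    start: float      # start time in seconds
    end: float        # end time in seconds
    posterior: float  # arc posterior, assumed to be precomputed


def relaxed_confidence(hyp: Arc, arcs: Iterable[Arc], relax: float = 0.2) -> float:
    """Sum the posteriors of same-word arcs whose boundaries lie near hyp's.

    The boundary tolerance is relax * duration(hyp). This rule is an
    assumption made for illustration; the paper's exact rule is not
    stated in the abstract.
    """
    tol = relax * (hyp.end - hyp.start)
    total = 0.0
    for arc in arcs:
        if arc.word != hyp.word:
            continue  # only arcs carrying the same word contribute
        if abs(arc.start - hyp.start) <= tol and abs(arc.end - hyp.end) <= tol:
            total += arc.posterior
    # Overlapping arcs can sum to more than 1, so clip to a valid confidence.
    return min(total, 1.0)


if __name__ == "__main__":
    # Made-up arcs for illustration only.
    lattice = [
        Arc("beijing", 1.00, 1.50, 0.4),
        Arc("beijing", 1.05, 1.52, 0.3),
        Arc("beijing", 1.30, 1.90, 0.2),
        Arc("nanjing", 1.00, 1.50, 0.1),
    ]
    print(relaxed_confidence(lattice[0], lattice, relax=0.2))
```

Under these assumptions, a larger `relax` admits more same-word arcs with shifted boundaries, while `relax=0` counts only arcs whose boundaries match the candidate exactly. A candidate would then be rejected when its confidence falls below a chosen threshold.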

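Cite this article

Li Wenxin, Qu Dan, Li Bicheng, Wang Bingxi. Confidence Measure Based on Time and Boundary Features for Speech Keyword Spotting System[J]. Journal of Applied Sciences, 2012, 30(6): 588-594. DOI: 10.3969/j.issn.0255-8297.2012.06.005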

Abstract

Because the posterior-probability confidence measure cannot take full advantage of variations in pronunciation, we propose an improved confidence measure based on time and boundary features. A relaxation rate is introduced to flexibly select the segmental arcs that carry the same word for the confidence calculation, by which detections are rejected. On this basis, a lattice-based, large-vocabulary keyword spotting system is designed. An improved dynamic time warping (DTW) algorithm is used for keyword matching through the lattice to generate as many keyword hypotheses as possible. The proposed confidence measure is then applied in keyword verification. The results show a 7% relative improvement in equal error rate (EER) compared with the mainstream calculation method.

Key words: speech recognition; keyword spotting; confidence measure
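
The abstract reports its result as an equal error rate but does not describe how the EER was computed. The Python sketch below shows one common way to estimate it, assuming a simple threshold sweep over confidence scores for true keyword hits and false alarms. It also shows how a relative improvement between two EER values can be expressed. The function names and all scores are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def equal_error_rate(hit_scores: Sequence[float],
                     false_alarm_scores: Sequence[float]) -> float:
    """Estimate the EER by sweeping the rejection threshold.

    A detection is accepted when its score is >= the threshold. The EER is
    taken at the threshold where the false rejection rate (true hits
    rejected) and the false acceptance rate (false alarms accepted) are
    closest, and is reported as their average.
    """
    if not hit_scores or not false_alarm_scores:
        raise ValueError("both score lists must be non-empty")
    best_gap = None
    eer = 0.0
    for t in sorted(set(hit_scores) | set(false_alarm_scores)):
        frr = sum(s < t for s in hit_scores) / len(hit_scores)
        far = sum(s >= t for s in false_alarm_scores) / len(false_alarm_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


def relative_reduction(baseline_eer: float, new_eer: float) -> float:
    """Relative EER reduction of new_eer with respect to baseline_eer."""
    return (baseline_eer - new_eer) / baseline_eer


if __name__ == "__main__":
    # Made-up confidence scores for illustration only.
    hits = [0.9, 0.8, 0.7, 0.3]
    false_alarms = [0.6, 0.4, 0.2, 0.1]
    print(equal_error_rate(hits, false_alarms))
```

With a finite set of scores the two error rates rarely cross exactly, which is why this sketch averages them at the closest point. Interpolating between neighboring thresholds is a common alternative.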

