Posterior-probability-based confidence measures for keyword spotting make insufficient use of pronunciation variation in speech. To address this, we propose a confidence measure based on duration and boundary information. The method introduces a relaxation factor that flexibly selects lattice arcs carrying the same word identity for the confidence calculation, and the resulting score is used to reject false keyword detections. On this basis, we design and implement a lattice-based, large-vocabulary keyword spotting system. An improved dynamic programming algorithm (dynamic time warping, DTW) first searches the syllable lattice to produce as many keyword hypotheses as possible; the proposed confidence measure is then applied for keyword verification. Experiments show a 7% relative improvement in equal error rate (EER) over the mainstream calculation method.
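As an illustration of how a relaxation factor might select same-word arcs, the following minimal sketch sums the posteriors of lattice arcs that carry the hypothesis word and whose start and end times fall within a tolerance of the hypothesis boundaries. The abstract does not give the exact selection rule, so the tolerance (relaxation factor times hypothesis duration), the `Arc` structure, and the names `relaxed_confidence` and `verify` are assumptions made for this example only, not the authors' formulation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Arc:
    """Hypothetical lattice arc; field names are assumptions for this sketch."""
    word: str         # word or keyword label on the arc
    start: float      # start time in seconds
    end: float        # end time in seconds
    posterior: float  # arc posterior, e.g. from forward-backward on the lattice


def relaxed_confidence(hyp: Arc, arcs: list[Arc], relax: float) -> float:
    """Sum posteriors of same-word arcs with boundaries close to the hypothesis.

    Assumed rule: an arc is selected when both its start and end times lie
    within relax * (hypothesis duration) of the hypothesis boundaries.
    """
    tol = relax * (hyp.end - hyp.start)
    total = 0.0
    for arc in arcs:
        if arc.word != hyp.word:
            continue  # only arcs with the same word identity contribute
        if abs(arc.start - hyp.start) <= tol and abs(arc.end - hyp.end) <= tol:
            total += arc.posterior
    # Clip as a safeguard: a large tolerance may select arcs that do not overlap.
    return min(total, 1.0)


def verify(hyp: Arc, arcs: list[Arc], relax: float, threshold: float) -> bool:
    """Accept the keyword hypothesis if its confidence reaches the threshold."""
    return relaxed_confidence(hyp, arcs, relax) >= threshold


# Toy lattice: three competing arcs for the same keyword plus one other word.
lattice = [
    Arc("beijing", 1.00, 1.50, 0.40),
    Arc("beijing", 1.02, 1.53, 0.30),
    Arc("beijing", 1.20, 1.70, 0.10),
    Arc("nanjing", 1.00, 1.50, 0.20),
]
hyp = lattice[0]
for relax in (0.0, 0.1, 0.5):
    print(relax, relaxed_confidence(hyp, lattice, relax))
```

With a relaxation factor of 0, only arcs with exactly matching boundaries contribute, which reduces to the plain arc posterior. Larger values admit arcs with slightly shifted boundaries, so the confidence of a hypothesis supported by several near-identical arcs rises. In practice the factor and the acceptance threshold would be tuned on held-out data.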