Because the posterior-probability confidence measure cannot fully account for variations in pronunciation, we propose an improved confidence measure based on time and boundary features. A relaxation rate is introduced to allow flexible selection of the segmental arcs carrying the same word when computing the confidence score used to reject detections. On this basis, a lattice-based keyword spotting system with a large keyword table is designed. An improved dynamic time warping (DTW) algorithm matches keywords through the lattice to generate as many keyword hypotheses as possible. The proposed confidence measure is applied to keyword verification. The results show a 7% relative improvement in equal error rate (EER) over the mainstream calculation method.
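The abstract does not give the formula for the proposed measure, so the following is only a minimal sketch of the general idea of relaxing time boundaries when pooling lattice arcs. It assumes each arc already carries a posterior, and that an arc supports a keyword hypothesis when it has the same word and its start and end times fall within a tolerance proportional to the hypothesis duration. The names `Arc`, `relaxed_confidence`, `accept`, the `relaxation` parameter, and the threshold value are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Arc:
    """One lattice arc (hypothetical structure, for illustration only)."""
    word: str
    start: float      # start time in seconds
    end: float        # end time in seconds
    posterior: float  # arc posterior, assumed to be precomputed from the lattice


def relaxed_confidence(hyp: Arc, arcs: list[Arc], relaxation: float = 0.2) -> float:
    """Sum the posteriors of same-word arcs whose boundaries lie near the hypothesis.

    The tolerance is `relaxation` times the hypothesis duration; this rule is an
    assumption made for the sketch, not the paper's definition.
    """
    tolerance = relaxation * (hyp.end - hyp.start)
    total = 0.0
    for arc in arcs:
        if arc.word != hyp.word:
            continue
        start_ok = abs(arc.start - hyp.start) <= tolerance
        end_ok = abs(arc.end - hyp.end) <= tolerance
        if start_ok and end_ok:
            total += arc.posterior
    # Clip so the score stays a valid probability-like value.
    return min(total, 1.0)


def accept(hyp: Arc, arcs: list[Arc], threshold: float = 0.5,
           relaxation: float = 0.2) -> bool:
    """Keep a detection only if its relaxed confidence reaches the threshold."""
    return relaxed_confidence(hyp, arcs, relaxation) >= threshold


# Toy lattice: two nearly aligned "news" arcs, one shifted "news" arc, one other word.
hypothesis = Arc("news", 1.00, 1.50, 0.4)
lattice = [
    hypothesis,
    Arc("news", 1.05, 1.52, 0.3),   # boundaries within tolerance
    Arc("news", 1.40, 1.90, 0.2),   # boundaries too far away
    Arc("noise", 1.00, 1.50, 0.1),  # different word
]
score = relaxed_confidence(hypothesis, lattice, relaxation=0.2)
strict_score = relaxed_confidence(hypothesis, lattice, relaxation=0.0)
```

With a relaxation of zero only the exactly aligned arc contributes, while a larger relaxation pools the evidence of arcs that differ slightly in their boundaries. Sweeping the threshold in `accept` over a labeled set of detections is one way to trace the trade-off from which an EER would be read.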
LI Wen-xin, QU Dan, LI Bi-cheng, WANG Bing-xi. Confidence Measure Based on Time and Boundary Features for Speech Keyword Spotting System[J]. Journal of Applied Sciences, 2012, 30(6): 588-594.
DOI: 10.3969/j.issn.0255-8297.2012.06.005