Signal and Information Processing

Keyword Detection Based on Dynamic Match Lattice Spotting

  • Institute of Information Systems Engineering, Information Engineering University, Zhengzhou 450002, China

Received date: 2013-07-15

Revised date: 2013-10-22

Online published: 2013-10-22

Abstract

The growing volume of speech data calls for rapid and accurate search techniques. This paper proposes a keyword spotting method based on dynamic match lattice spotting (DMLS). In the indexing stage, the method generates a more accurate phone lattice using TRAP features and a multilayer perceptron, then performs a modified Viterbi traversal of the lattice to compile a database of fixed-length phone sequences. In the search stage, the minimum edit distance serves as the confidence score for keyword spotting. Tests show that the proposed method outperforms baseline systems using MFCC and PLP features, improving the recall rate by about 5%.
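The search stage described above can be illustrated with a minimal sketch. It scores a keyword's phone sequence against indexed phone sequences by minimum edit distance and keeps the sequences whose distance falls within a threshold. The function names, the uniform unit costs, the toy index and the threshold are all assumptions made for illustration; the abstract does not specify the cost model or the index layout, so this is not the authors' implementation.

```python
def edit_distance(target, observed, sub_cost=1.0, ins_cost=1.0, del_cost=1.0):
    """Minimum edit distance between two phone sequences.

    Uniform costs are an assumption; a real system might use
    phone-dependent substitution, insertion and deletion costs.
    """
    # prev[j] = cost of aligning an empty target prefix with observed[:j]
    prev = [j * ins_cost for j in range(len(observed) + 1)]
    for i, t in enumerate(target, start=1):
        # First column: all i target phones are deleted
        curr = [i * del_cost]
        for j, o in enumerate(observed, start=1):
            curr.append(min(
                prev[j] + del_cost,                              # target phone missing
                curr[j - 1] + ins_cost,                          # extra observed phone
                prev[j - 1] + (0.0 if t == o else sub_cost),     # match or substitution
            ))
        prev = curr
    return prev[-1]


def search(keyword_phones, index, threshold):
    """Return (sequence_id, distance) pairs within the threshold, best first.

    `index` is a hypothetical list of (sequence_id, phone_sequence) pairs
    standing in for the database of fixed-length phone sequences.
    """
    hits = []
    for seq_id, phones in index:
        distance = edit_distance(keyword_phones, phones)
        if distance <= threshold:
            hits.append((seq_id, distance))
    return sorted(hits, key=lambda hit: hit[1])


# Toy index of fixed-length (3-phone) sequences; contents are made up.
toy_index = [
    ("seq1", ["k", "ae", "t"]),
    ("seq2", ["k", "ah", "t"]),
    ("seq3", ["d", "ao", "g"]),
]
results = search(["k", "ae", "t"], toy_index, threshold=1.0)
```

In this sketch a lower distance means a more confident match, so the threshold trades recall against false alarms: raising it admits more approximate matches.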

Cite this article

ZHENG Yong-jun, ZHANG Lian-hai. Keyword Detection Based on Dynamic Match Lattice Spotting[J]. Journal of Applied Sciences, 2014, 32(2): 149-155. DOI: 10.3969/j.issn.0255-8297.2014.02.006


 