Keyword Detection Based on Dynamic Match Lattice Spotting

Journal of Applied Sciences, 2014, Vol. 32, Issue (2): 149-155. doi: 10.3969/j.issn.0255-8297.2014.02.006

ZHENG Yong-jun, ZHANG Lian-hai

Institute of Information Systems Engineering, Information Engineering University, Zhengzhou 450002, China

Received: 2013-07-15 Revised: 2013-10-22 Online: 2014-03-25 Published: 2013-10-22
About the author: ZHANG Lian-hai, Ph.D., associate professor; research interests: speech signal processing and pattern recognition. E-mail: lianhaiz@sina.com
Supported by: National Natural Science Foundation of China (No. 61175017); All-Army Military Science Research Project Fund (No. 2010JY0256-143)

Abstract

Abstract: The large volume of speech data emerging in everyday life calls for fast and accurate search. This paper proposes a keyword spotting method based on dynamic match lattice spotting (DMLS). TRAP features and a multilayer perceptron are used to generate a more accurate phone lattice. In the indexing stage, a modified Viterbi algorithm traverses the lattice to build a database of fixed-length phone sequences. In the search stage, the minimum edit distance serves as the confidence score for detecting keywords. Experiments show that the proposed method has an advantage over baseline systems using MFCC and PLP features, improving the recall rate by about 5%.

Key words: keyword spotting, dynamic match lattice spotting, TRAP feature, minimum edit distance
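
The search stage summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes unit costs for insertion, deletion, and substitution (the paper may weight phone confusions differently), and the names `edit_distance`, `search`, `max_distance`, and the toy phone labels are hypothetical, chosen only for illustration. The database here is simply a list of utterance IDs paired with fixed-length phone sequences, standing in for the index built during lattice traversal.

```python
from typing import List, Sequence, Tuple


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Minimum edit (Levenshtein) distance between two phone sequences.

    Assumption: every insertion, deletion, and substitution costs 1.
    """
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, phone_a in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, phone_b in enumerate(b, start=1):
            cost = 0 if phone_a == phone_b else 1
            curr.append(min(
                prev[j] + 1,         # delete phone_a
                curr[j - 1] + 1,     # insert phone_b
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


def search(
    keyword: Sequence[str],
    database: List[Tuple[str, List[str]]],
    max_distance: int,
) -> List[Tuple[str, int]]:
    """Return (utterance_id, distance) pairs within max_distance, best first."""
    hits = []
    for utt_id, phones in database:
        d = edit_distance(keyword, phones)
        if d <= max_distance:  # a lower distance means higher confidence
            hits.append((utt_id, d))
    return sorted(hits, key=lambda hit: hit[1])


# Toy index: hypothetical utterance IDs and phone labels.
database = [
    ("utt1", ["k", "ae", "t"]),
    ("utt2", ["k", "ah", "t"]),
    ("utt3", ["d", "ao", "g"]),
]
results = search(["k", "ae", "t"], database, max_distance=1)
```

In this sketch the threshold `max_distance` trades recall against false alarms: raising it accepts sequences with more recognition errors, which is the kind of tolerance that lets approximate matching recover keywords the phone recognizer partly misdecoded.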

CLC Number: TP391

ZHENG Yong-jun, ZHANG Lian-hai. Keyword Detection Based on Dynamic Match Lattice Spotting[J]. Journal of Applied Sciences, 2014, 32(2): 149-155.

