Hierarchical Structure of Deep Belief Network for Phoneme Recognition

WANG Yi1,2, YANG Jun-an1,2, LIU Hui1,2, LIU Lin3, LU Gao4

doi:10.3969/j.issn.0255-8297.2014.05.013

Journal of Applied Sciences, 2014, Vol. 32, Issue 5: 515-522


Signal and Information Processing



1. Room 404, Electronic Engineering Institute, Hefei 230037, China
2. Key Laboratory of Electronic Restriction, Anhui Province, Hefei 230037, China
3. Anhui USTC iFlytek Corporation, Hefei 230037, China
4. No.52 Sub Unit, No.77108 Unit, Chengdu 611233, China

Received date: 2013-09-08

Revised date: 2014-03-28

Online published: 2014-03-28


Abstract

To address poor recognition performance and the tendency to become trapped in local
optima, this paper proposes a hierarchical phoneme classification method based on the deep belief network (DBN).
The system consists of two DBN-based parts: a bottleneck feature extractor and a phoneme classifier, which are
combined to form a phoneme recognition system. The system extracts low-dimensional, supervised
features and improves classification accuracy. Experiments on the TIMIT corpus suggest that the
proposed system achieves an 18.5% phoneme error rate in comparisons with existing systems.
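
The two-stage structure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the layer sizes, the position of the bottleneck layer, the random weights (standing in for pre-trained and fine-tuned DBN parameters), and every function name below are assumptions made for illustration only. It shows one acoustic frame passing through a first network up to its narrow layer, and the resulting low-dimensional features being fed to a second network that outputs phoneme posteriors.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """Random weights and zero biases; a stand-in for trained DBN parameters."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def build_network(sizes, rng):
    """Build one layer for each consecutive pair of sizes."""
    return [make_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]


def affine(layer, x):
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def sigmoid_layer(layer, x):
    return [1.0 / (1.0 + math.exp(-z)) for z in affine(layer, x)]


def softmax(z):
    m = max(z)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def bottleneck_features(network, x, bottleneck_index):
    """Run the first network up to and including its narrow layer."""
    h = x
    for layer in network[:bottleneck_index + 1]:
        h = sigmoid_layer(layer, h)
    return h


def classify(network, features):
    """Sigmoid hidden layers followed by a softmax output over phoneme classes."""
    h = features
    for layer in network[:-1]:
        h = sigmoid_layer(layer, h)
    return softmax(affine(network[-1], h))


rng = random.Random(0)

# Hypothetical sizes (assumptions, not taken from the paper).
N_INPUT, N_BOTTLENECK, N_PHONEMES = 39, 13, 39

# Stage 1: network with a narrow middle layer, used as a feature extractor.
extractor = build_network([N_INPUT, 64, N_BOTTLENECK, 64, N_PHONEMES], rng)
# Stage 2: classifier that takes the bottleneck features as input.
classifier = build_network([N_BOTTLENECK, 64, N_PHONEMES], rng)

frame = [rng.uniform(-1.0, 1.0) for _ in range(N_INPUT)]
features = bottleneck_features(extractor, frame, bottleneck_index=1)
posteriors = classify(classifier, features)
predicted = max(range(N_PHONEMES), key=posteriors.__getitem__)

print(len(features), len(posteriors), predicted)
```

In a real system the weights would come from layer-wise pre-training followed by supervised fine-tuning. The sketch covers only the forward pass, to make the data flow between the two parts concrete.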

Key words: phoneme recognition; hierarchical structure; deep belief network; bottleneck feature

Cite this article

WANG Yi1,2, YANG Jun-an1,2, LIU Hui1,2, LIU Lin3, LU Gao4. Hierarchical Structure of Deep Belief Network for Phoneme Recognition[J]. Journal of Applied Sciences, 2014, 32(5): 515-522. DOI: 10.3969/j.issn.0255-8297.2014.05.013

