In this paper, we study speech emotion recognition based on convolutional neural networks (CNNs). We modify the convolution kernel weight update used when training a traditional CNN so that the update depends on the number of iterations. In addition, to increase the differences between emotional speech features, we transform the data matrix of Mel-frequency cepstral coefficients (MFCC) obtained by preprocessing the speech signal, which improves the expressive ability of the network. Experiments show that the recognition error rate of the improved speech emotion recognition algorithm is about 7% lower than that of traditional algorithms.
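The abstract states only that the kernel weight update depends on the number of iterations; it does not give the rule itself. As an illustration, assuming an inverse-time decay of the step size (the functions `step_size` and `update_kernel` and the `decay` parameter are hypothetical and not taken from the paper), such a dependence could be sketched as:

```python
def step_size(base_lr: float, iteration: int, decay: float = 0.01) -> float:
    """Hypothetical schedule: the step shrinks as the iteration count grows."""
    return base_lr / (1.0 + decay * iteration)


def update_kernel(kernel, grad, base_lr, iteration, decay=0.01):
    """Apply one gradient-descent step to a 2-D kernel given as a list of rows.

    The step size is a function of the iteration number, so the same
    gradient moves the weights less late in training than early on.
    """
    lr = step_size(base_lr, iteration, decay)
    return [
        [w - lr * g for w, g in zip(w_row, g_row)]
        for w_row, g_row in zip(kernel, grad)
    ]


# Toy 2x2 kernel and gradient, chosen only for illustration.
kernel = [[0.5, -0.2], [0.1, 0.3]]
grad = [[1.0, 1.0], [1.0, 1.0]]

early = update_kernel(kernel, grad, base_lr=0.1, iteration=0)
late = update_kernel(kernel, grad, base_lr=0.1, iteration=100)
```

With these toy values the effective step is 0.1 at iteration 0 and 0.05 at iteration 100, so the later update changes each weight by half as much. The paper's actual rule may take a different form.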
ZENG Run-hua, ZHANG Shu-qun. Speech and Emotional Recognition Method Based on Improving Convolutional Neural Networks[J]. Journal of Applied Sciences, 2018, 36(5): 837-844.
DOI: 10.3969/j.issn.0255-8297.2018.05.011