Speech and Emotional Recognition Method Based on Improving Convolutional Neural Networks

Journal of Applied Sciences, 2018, Vol. 36, Issue (5): 837-844. doi: 10.3969/j.issn.0255-8297.2018.05.011

ZENG Run-hua, ZHANG Shu-qun

School of Information Science and Technology, Jinan University, Guangzhou 510632, China

Received: 2017-06-24 Revised: 2017-12-22 Online: 2018-09-30 Published: 2018-09-30
Corresponding author: ZHANG Shu-qun, associate professor; research interests: embedded systems and signal processing. E-mail: zhang322@jun.edu.cn

Abstract: This paper studies a speech emotion recognition algorithm based on convolutional neural networks (CNNs). The algorithm that updates the convolution kernel weights during conventional CNN training is modified so that the update depends on the number of iterations. In addition, to increase the feature differences between emotional utterances, the Mel-frequency cepstral coefficient (MFCC) feature matrix obtained by preprocessing the speech signal is transformed, which improves the expressive ability of the CNN. Experiments show that the recognition error rate of the improved algorithm is about 7% lower than that of the conventional algorithm.

Key words: speech emotion recognition, convolutional neural networks, Mel-frequency cepstral coefficients (MFCC), recognition rate
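
The abstract states only that the kernel weight update is made to depend on the iteration count; it does not give the rule itself. As a rough illustration of what an iteration-dependent update can look like, the sketch below runs plain gradient descent with a step size that shrinks as the iteration index grows. The schedule `eta0 / (1 + decay * t)`, the function names, the default values, and the toy quadratic loss are all assumptions made for illustration and are not taken from the paper.

```python
def step_size(eta0, decay, t):
    """Hypothetical schedule: the step size shrinks as iteration t grows."""
    return eta0 / (1.0 + decay * t)


def update_weights(weights, grads, t, eta0=0.4, decay=0.1):
    """One gradient-descent step whose size depends on the iteration index t."""
    eta = step_size(eta0, decay, t)
    return [w - eta * g for w, g in zip(weights, grads)]


def toy_gradient(weights, target):
    """Gradient of the toy loss sum((w - target)^2) for each weight."""
    return [2.0 * (w - tgt) for w, tgt in zip(weights, target)]


def train(weights, target, iterations):
    """Apply the iteration-dependent update to the toy loss."""
    for t in range(iterations):
        grads = toy_gradient(weights, target)
        weights = update_weights(weights, grads, t)
    return weights


if __name__ == "__main__":
    kernel = [0.0, 0.0, 0.0]    # flattened stand-in for a small kernel
    target = [1.0, -2.0, 0.5]   # minimiser of the toy loss
    print(train(kernel, target, 50))
```

In a real CNN the gradients would come from backpropagation through the convolution layers rather than from a closed-form toy loss; only the dependence of the step on `t` is the point here.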

CLC number: TP391

ZENG Run-hua, ZHANG Shu-qun. Speech and Emotional Recognition Method Based on Improving Convolutional Neural Networks[J]. Journal of Applied Sciences, 2018, 36(5): 837-844.

