Convolutional Neural Networks Text Classification Model Based on Attention Mechanism

Journal of Applied Sciences, 2019, Vol. 37, Issue (4): 541-550. doi: 10.3969/j.issn.0255-8297.2019.04.011

ZHAO Yunshan, DUAN Youxiang

College of Computer & Communication Engineering, China University of Petroleum (East China), Qingdao 266580, Shandong Province, China

Received: 2018-09-14 Revised: 2018-10-29 Online: 2019-07-31 Published: 2019-10-11
Corresponding author: DUAN Youxiang, professor; research interests: artificial intelligence, graphics and image processing, theoretical computer science. E-mail: yxduan@upc.edu.cn
Funding: Supported by the National Science and Technology Major Project of China (No. 2017ZX05009001-09)

Abstract

Text classification is an important task in natural language processing, and effectively extracting the global semantics of a text is the key to completing it successfully. To capture the non-local importance of the features extracted by a convolutional neural network (CNN), an Attention mechanism is introduced into the model, and an A-CNN text classification model containing four Attention CNN layers is built. Within each Attention CNN layer, the ordinary convolution layer extracts local features, while the Attention mechanism generates non-local correlation features. The A-CNN model is then evaluated and compared with other models on datasets for sentiment analysis, question classification, and question answer selection. The results show that, relative to the comparison models, the best accuracy of the A-CNN model on these three text classification tasks is higher by 1.9%, 4.3%, and 0.6%, respectively, indicating that the A-CNN model offers high accuracy and good generality in text classification tasks.

Key words: text categorization, convolutional neural network (CNN), Attention mechanism, non-local correlation
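The abstract describes an Attention CNN layer only at a high level: a convolution for local features plus an Attention step for non-local correlation features. The sketch below is one possible reading of that description, not the authors' implementation. It assumes a same-padded 1D convolution with ReLU, plain scaled dot-product self-attention over the convolution outputs, and concatenation of the two feature sets; the function names `conv1d_same`, `self_attention`, and `attention_cnn_layer`, the kernel width, and all dimensions are hypothetical choices made for illustration.

```python
import numpy as np

def conv1d_same(x, w, b):
    """Local features: 1D convolution over the token axis, zero padded.

    x: (seq_len, d_in) token embeddings
    w: (k, d_in, d_out) convolution kernel, b: (d_out,) bias
    """
    k = w.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, k - 1 - pad), (0, 0)))  # pad the token axis only
    out = np.stack([
        np.tensordot(xp[i:i + k], w, axes=([0, 1], [0, 1]))
        for i in range(x.shape[0])
    ])
    return np.maximum(out + b, 0.0)  # ReLU

def self_attention(h):
    """Non-local features: each position attends to every position.

    Assumed form: scaled dot-product attention with h as query, key and value.
    """
    scores = h @ h.T / np.sqrt(h.shape[1])
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)  # row-wise softmax
    return weights @ h, weights

def attention_cnn_layer(x, w, b):
    """Concatenate local (convolution) and non-local (attention) features."""
    local = conv1d_same(x, w, b)
    non_local, weights = self_attention(local)
    return np.concatenate([local, non_local], axis=1), weights

# Toy input: 7 tokens with 16-dimensional embeddings (random placeholders).
rng = np.random.default_rng(0)
x = rng.normal(size=(7, 16))
w = rng.normal(size=(3, 16, 8)) * 0.1
b = np.zeros(8)

features, weights = attention_cnn_layer(x, w, b)
print(features.shape)
```

Stacking four such layers and adding a pooling and softmax classifier on top would roughly match the structure named in the abstract, although the actual attention formula and how the two feature sets are combined may differ in the paper.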

CLC number: TP391.1

Cite this article: ZHAO Yunshan, DUAN Youxiang. Convolutional Neural Networks Text Classification Model Based on Attention Mechanism[J]. Journal of Applied Sciences, 2019, 37(4): 541-550.

