Computer Science and Applications


Funding

Supported by the National Science and Technology Major Project (No. 2017ZX05009001-09)

Convolutional Neural Networks Text Classification Model Based on Attention Mechanism

  • College of Computer & Communication Engineering, China University of Petroleum (East China), Qingdao 266580, Shandong Province, China

Received date: 2018-09-14

  Revised date: 2018-10-29

  Online published: 2019-10-11


Citation

ZHAO Yunshan, DUAN Youxiang. Convolutional neural networks text classification model based on attention mechanism[J]. Journal of Applied Sciences, 2019, 37(4): 541-550. DOI: 10.3969/j.issn.0255-8297.2019.04.011

Abstract

Text classification is an important task in natural language processing, and effective extraction of the global semantics of a text is key to completing classification tasks successfully. To capture the non-local importance of the features extracted by convolutional neural networks, an Attention mechanism is introduced into the model and an A-CNN text classification model containing four Attention CNN layers is established. Within each Attention CNN layer, an ordinary convolutional layer extracts local features, while the Attention mechanism generates non-local correlation features. Finally, the A-CNN model is evaluated and compared on sentiment analysis, question classification, and question-answer selection datasets. The results show that, compared with the baseline models, the A-CNN model improves the best accuracy on these three text classification tasks by 1.9%, 4.3%, and 0.6%, respectively, indicating that it achieves high accuracy and strong generality in text classification.
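The abstract describes an Attention CNN layer that combines a convolution (local features) with an Attention mechanism (non-local correlation features). The sketch below is a minimal NumPy illustration of that idea, not the paper's exact formulation: the use of scaled dot-product self-attention, the concatenation of the two feature streams, and all layer sizes are assumptions made for illustration.

```python
import numpy as np

def conv1d(x, w, b):
    """Valid 1-D convolution with ReLU over a word-embedding sequence.
    x: (seq_len, d_in), w: (k, d_in, d_out), b: (d_out,)."""
    k = w.shape[0]
    out = np.stack([
        np.einsum("kd,kdo->o", x[i:i + k], w) + b
        for i in range(x.shape[0] - k + 1)
    ])
    return np.maximum(out, 0.0)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(h):
    """Scaled dot-product self-attention: every position attends to every
    other position, producing non-local correlation features."""
    d = h.shape[-1]
    scores = h @ h.T / np.sqrt(d)          # (L, L) pairwise relevance
    return softmax(scores, axis=-1) @ h    # attention-weighted mixture

def attention_cnn_layer(x, w, b):
    """Hypothetical Attention CNN layer: concatenate local convolutional
    features with their attention-derived non-local counterparts."""
    local = conv1d(x, w, b)
    non_local = self_attention(local)
    return np.concatenate([local, non_local], axis=-1)

rng = np.random.default_rng(0)
x = rng.normal(size=(10, 50))              # 10 words, 50-dim embeddings
w = rng.normal(size=(3, 50, 64)) * 0.1     # window size 3, 64 filters
b = np.zeros(64)
out = attention_cnn_layer(x, w, b)         # shape (8, 128)
```

Stacking four such layers, as the A-CNN model does, would let each successive layer mix increasingly global context into the local convolutional features.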

References

[1] Mikolov T, Chen K, Dean J. Efficient estimation of word representations in vector space[DB/OL]. CoRR: Computing Research Repository, 2013: 1-12.
[2] Mikolov T, Sutskever I, Chen K, Dean J. Distributed representations of words and phrases and their compositionality[J]. Advances in Neural Information Processing Systems, 2013: 3111-3119.
[3] Mikolov T. Statistical language models based on neural networks[D]. Brno: Brno University of Technology, 2012.
[4] Johnson R, Zhang T. Effective use of word order for text categorization with convolutional neural networks[C]//Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2015: 103-112.
[5] Lee J Y, Dernoncourt F. Sequential short-text classification with recurrent and convolutional neural networks[EB/OL]. arXiv preprint arXiv:1603.03827, 2016.
[6] Kim Y. Convolutional neural networks for sentence classification[C]//Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2014: 1746-1751.
[7] Nguyen H, Nguyen M L. A deep neural architecture for sentence-level sentiment classification in Twitter social networking[C]//International Conference of the Pacific Association for Computational Linguistics. Singapore: Springer, 2017: 15-27.
[8] Kalchbrenner N, Grefenstette E, Blunsom P. A convolutional neural network for modelling sentences[C]//Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics. Maryland: Association for Computational Linguistics, 2014: 655-665.
[9] Zhou C, Sun C, Liu Z, Lau F. A C-LSTM neural network for text classification[J]. Computer Science, 2015, 1(4): 39-44.
[10] Yin W, Schütze H. Multichannel variable-size convolution for sentence classification[C]//Proceedings of the Nineteenth Conference on Computational Natural Language Learning. Beijing: Association for Computational Linguistics, 2015: 204-214.
[11] Zhang X, Zhao J, LeCun Y. Character-level convolutional networks for text classification[C]//Proceedings of the 28th International Conference on Neural Information Processing Systems. USA: MIT Press, 2015: 649-657.
[12] Conneau A, Schwenk H, LeCun Y, Barrault L. Very deep convolutional networks for natural language processing[EB/OL]. arXiv preprint arXiv:1606.01781, 2016.
[13] Johnson R, Zhang T. Convolutional neural networks for text categorization: shallow word-level vs. deep character-level[EB/OL]. arXiv preprint arXiv:1609.00718, 2016.
[14] Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez A N. Attention is all you need[C]//Proceedings of the 31st International Conference on Neural Information Processing Systems. California: Neural Information Processing Systems, 2017: 6000-6010.
[15] Im J, Cho S. Distance-based self-attention network for natural language inference[EB/OL]. arXiv preprint arXiv:1712.02047, 2017.
[16] Yin W, Schütze H. Attentive convolution[EB/OL]. arXiv preprint arXiv:1710.00519, 2017.
[17] Pennington J, Socher R, Manning C. GloVe: global vectors for word representation[C]//Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). Qatar: Association for Computational Linguistics, 2014: 1532-1543.
[18] Srivastava N, Hinton G, Krizhevsky A, Sutskever I, Salakhutdinov R. Dropout: a simple way to prevent neural networks from overfitting[J]. Journal of Machine Learning Research, 2014, 15(1): 1929-1958.
[19] Kingma D P, Ba J L. Adam: a method for stochastic optimization[EB/OL]. arXiv preprint arXiv:1412.6980, 2014.
[20] Ma M, Huang L, Xiang B, Zhou B. Group sparse CNNs for question classification with answer sets[C]//Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, 2017(2): 335-340.