Convolutional Neural Networks Text Classification Model Based on Attention Mechanism

  • College of Computer & Communication Engineering, China University of Petroleum, Qingdao 266580, Shandong Province, China

Received date: 2018-09-14

  Revised date: 2018-10-29

  Online published: 2019-10-11

Abstract

Text categorization is an important task in natural language processing, and effective extraction of global semantics is key to its success. To capture the non-local importance of the features extracted by convolutional neural networks, an A-CNN text classification model consisting of four Attention CNN layers is built using the attention mechanism. In each Attention CNN layer, ordinary convolution extracts local features, while the attention mechanism generates non-local correlations among those features. The A-CNN model is evaluated on data sets for sentiment analysis, question classification, and question-answer selection. Compared with other models, it improves classification precision on the three tasks by 1.9%, 4.3%, and 0.6%, respectively, showing higher accuracy and stronger generality in text classification.
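The abstract describes each Attention CNN layer as a pairing of ordinary convolution (local features) with an attention step (non-local feature correlations). The sketch below illustrates that idea in PyTorch; the framework choice, layer widths, scaled dot-product formulation, residual combination, and four-layer stacking are all illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AttentionCNNLayer(nn.Module):
    """One hypothetical Attention CNN layer: local convolution + self-attention."""

    def __init__(self, in_dim: int, out_dim: int, kernel_size: int = 3):
        super().__init__()
        # Local feature extractor: ordinary 1-D convolution over the word sequence.
        self.conv = nn.Conv1d(in_dim, out_dim, kernel_size, padding=kernel_size // 2)
        # Projections for scaled dot-product self-attention over the conv features.
        self.query = nn.Linear(out_dim, out_dim)
        self.key = nn.Linear(out_dim, out_dim)
        self.value = nn.Linear(out_dim, out_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, in_dim) word embeddings or previous-layer features.
        local = F.relu(self.conv(x.transpose(1, 2))).transpose(1, 2)  # (B, L, out_dim)
        q, k, v = self.query(local), self.key(local), self.value(local)
        scores = torch.matmul(q, k.transpose(1, 2)) / (local.size(-1) ** 0.5)
        attn = torch.softmax(scores, dim=-1)      # position-to-position weights
        non_local = torch.matmul(attn, v)         # attention-weighted global context
        return local + non_local                  # combine local and non-local features


class ACNN(nn.Module):
    """Hypothetical 4-layer A-CNN classifier head, for illustration only."""

    def __init__(self, emb_dim: int, hidden: int, num_classes: int):
        super().__init__()
        dims = [emb_dim] + [hidden] * 4
        self.layers = nn.ModuleList(
            AttentionCNNLayer(dims[i], dims[i + 1]) for i in range(4)
        )
        self.fc = nn.Linear(hidden, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for layer in self.layers:
            x = layer(x)
        return self.fc(x.mean(dim=1))  # mean-pool over the sequence, then classify
```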

Cite this article

ZHAO Yunshan, DUAN Youxiang. Convolutional Neural Networks Text Classification Model Based on Attention Mechanism[J]. Journal of Applied Sciences, 2019, 37(4): 541-550. DOI: 10.3969/j.issn.0255-8297.2019.04.011
