Text categorization is an important task in natural language processing, and effective extraction of global semantics is key to its success. To capture the non-local importance of the features extracted by convolutional neural networks, an attention-based text classification model (A-CNN) comprising four Attention-CNN layers is proposed. In each Attention-CNN layer, ordinary convolution extracts local features, and an attention mechanism models the non-local correlations among those features. The A-CNN model is evaluated on sentiment analysis, question classification, and question-answer selection datasets. Compared with other models, A-CNN improves classification accuracy on these three tasks by 1.9%, 4.3%, and 0.6%, respectively, demonstrating higher accuracy and stronger generality in text classification.
ZHAO Yunshan, DUAN Youxiang. Convolutional Neural Networks Text Classification Model Based on Attention Mechanism[J]. Journal of Applied Sciences, 2019, 37(4): 541-550.
DOI: 10.3969/j.issn.0255-8297.2019.04.011
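The abstract describes the A-CNN architecture only at a high level, so the following is a minimal, illustrative PyTorch sketch of one possible reading: each Attention-CNN layer applies an ordinary 1-D convolution to extract local n-gram features, then a scaled dot-product self-attention step to re-weight them by their non-local, sentence-wide correlations, with four such layers stacked before pooling and classification. All class names, layer sizes, and the specific attention variant are assumptions for illustration, not the authors' published code.

    # Illustrative sketch only; hyperparameters and the attention variant are assumed.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class AttentionCNNLayer(nn.Module):
        def __init__(self, in_dim, out_dim, kernel_size=3):
            super().__init__()
            # Local feature extractor: standard 1-D convolution over the sequence.
            self.conv = nn.Conv1d(in_dim, out_dim, kernel_size,
                                  padding=kernel_size // 2)
            # Projections for scaled dot-product self-attention over positions.
            self.q = nn.Linear(out_dim, out_dim)
            self.k = nn.Linear(out_dim, out_dim)
            self.v = nn.Linear(out_dim, out_dim)

        def forward(self, x):
            # x: (batch, seq_len, in_dim)
            h = F.relu(self.conv(x.transpose(1, 2))).transpose(1, 2)  # local features
            q, k, v = self.q(h), self.k(h), self.v(h)
            scores = q @ k.transpose(1, 2) / (h.size(-1) ** 0.5)  # pairwise correlations
            return torch.softmax(scores, dim=-1) @ v              # non-local mixing

    class ACNN(nn.Module):
        # Four stacked Attention-CNN layers, mean pooling, then a linear classifier.
        def __init__(self, emb_dim, hidden, num_classes, depth=4):
            super().__init__()
            dims = [emb_dim] + [hidden] * depth
            self.layers = nn.ModuleList(
                AttentionCNNLayer(dims[i], dims[i + 1]) for i in range(depth))
            self.out = nn.Linear(hidden, num_classes)

        def forward(self, x):
            for layer in self.layers:
                x = layer(x)
            return self.out(x.mean(dim=1))  # pool over positions, then classify

    # Example: classify a batch of 8 sentences of length 40 with 300-d embeddings.
    model = ACNN(emb_dim=300, hidden=128, num_classes=5)
    logits = model(torch.randn(8, 40, 300))  # -> shape (8, 5)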