Special Issue on Computer Applications

Static Multimodal Sentiment Analysis of Online Reviews

  • 1. School of Software Technology, Dalian University of Technology, Dalian 116620, Liaoning, China;
    2. Key Laboratory for Ubiquitous Network and Service Software of Liaoning Province, Dalian University of Technology, Dalian 116620, Liaoning, China

Received date: 2021-07-25

Online published: 2022-01-28

Abstract

This paper proposes a static multimodal sentiment classification model based on the Pre-LN Transformer. The model first extracts semantic features from review text with the encoder of the Pre-LN Transformer, whose multi-head self-attention mechanism allows the model to learn relevant emotional information in different representation subspaces. It then extracts image features from the reviews with ResNet. Building on feature-level fusion, a visual attention mechanism guides the sentiment classification of the text, realizing static multimodal sentiment analysis of online reviews. Experimental results on the Yelp dataset show that the model improves evaluation accuracy by 1.34% over BiGRU-mVGG and by 1.10% over Trans-mVGG, verifying the effectiveness and feasibility of the proposed model.
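The pipeline described above, a Pre-LN encoder block over the review text followed by image-guided attention pooling of the token states, can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the single attention head, ReLU feed-forward, and all dimensions and weight names are simplifying assumptions (the paper uses multi-head self-attention and ResNet image features).

```python
import numpy as np

rng = np.random.default_rng(0)

def layer_norm(x, eps=1e-5):
    # normalize each token vector over its feature dimension
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, Wq, Wk, Wv):
    # scaled dot-product attention over the token sequence
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    return softmax(scores) @ v

def pre_ln_block(x, Wq, Wk, Wv, W1, W2):
    # Pre-LN ordering: LayerNorm BEFORE each sublayer, then a residual add
    x = x + self_attention(layer_norm(x), Wq, Wk, Wv)
    h = layer_norm(x)
    x = x + np.maximum(h @ W1, 0) @ W2  # ReLU feed-forward sublayer
    return x

def visual_attention_pool(text_states, img_feat, Wt, Wi, w):
    # the image feature scores each text token; the weighted sum of
    # token states is the fused document representation
    scores = np.tanh(text_states @ Wt + img_feat @ Wi) @ w
    alpha = softmax(scores)
    return alpha @ text_states, alpha

# toy example: 5 text tokens of width 8, a 4-dim image feature
d, n, di, a = 8, 5, 4, 6
W = lambda *s: rng.standard_normal(s) * 0.1
tokens = rng.standard_normal((n, d))
encoded = pre_ln_block(tokens, W(d, d), W(d, d), W(d, d), W(d, 4 * d), W(4 * d, d))
doc_vec, alpha = visual_attention_pool(
    encoded, rng.standard_normal(di), W(d, a), W(di, a), W(a))
# alpha is a distribution over the text tokens, guided by the image
```

Placing LayerNorm before each sublayer (rather than after, as in the original Post-LN Transformer) is the point of reference [16]: it yields better-behaved gradients at initialization and more stable training.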

Cite this article

WANG Kaixin, XU Xiujuan, LIU Yu, ZHAO Zhehuan, ZHAO Xiaowei. Static Multimodal Sentiment Analysis of Online Reviews[J]. Journal of Applied Sciences, 2022, 40(1): 25-35. DOI: 10.3969/j.issn.0255-8297.2022.01.003

References

[1] Zhang Y Z, Rong L, Song D W, et al. A survey on multimodal sentiment analysis[J]. Pattern Recognition and Artificial Intelligence, 2020, 33(5):426-438. (in Chinese)
[2] Pan J H, He Z P, Li Z N, et al. A review of multimodal emotion recognition[J]. CAAI Transactions on Intelligent Systems, 2020, 84(4):7-19. (in Chinese)
[3] Zadeh A, Chen M, Poria S, et al. Tensor fusion network for multimodal sentiment analysis[C]//Proceedings of 2017 Conference on Empirical Methods in Natural Language Processing, Copenhagen, Denmark, 2017:1103-1114.
[4] Chen M H, Wang S, Liang P P, et al. Multimodal sentiment analysis with word-level fusion and reinforcement learning[C]//Proceedings of the 19th ACM International Conference on Multimodal Interaction, Glasgow, UK, 2017:163-171.
[5] Cao D, Ji R, Lin D, et al. A cross-media public sentiment analysis system for microblog[J]. Multimedia Systems, 2016, 22(4):479-486.
[6] Yu Y, Lin H, Meng J, et al. Visual and textual sentiment analysis of a microblog using deep convolutional neural networks[J]. Algorithms, 2016, 9(2):41-52.
[7] Li Z H, Fan Y Y, Liu W H, et al. Image sentiment prediction based on textual descriptions with adjective noun pairs[J]. Multimedia Tools and Applications, 2018, 77(1):1115-1132.
[8] Cai G Y, Xia B B. Multimedia sentiment analysis based on convolutional neural network[J]. Journal of Computer Applications, 2016, 36(2):428-431. (in Chinese)
[9] Xu N, Mao W J. MultiSentiNet: a deep semantic network for multimodal sentiment analysis[C]//Proceedings of 2017 ACM on Conference on Information and Knowledge Management, Singapore, 2017:2399-2402.
[10] Truong T Q, Lauw H W. VistaNet: visual aspect attention network for multimodal sentiment analysis[C]//Proceedings of AAAI Conference on Artificial Intelligence, Hawaii, USA, 2019:305-312.
[11] Huang F, Zhang X, Zhao Z, et al. Image-text sentiment analysis via deep multimodal attentive fusion[J]. Knowledge-Based Systems, 2019, 167:26-37.
[12] Lin M H, Meng Z Q. Multimodal sentiment analysis based on attention neural network[J]. Computer Science, 2020, 47(Suppl. 2):518-524, 558. (in Chinese)
[13] Tang G, Müller M, Rios A, et al. Why self-attention? a targeted evaluation of neural machine translation architectures[C]//Proceedings of 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, 2018:4263-4272.
[14] Vaswani A, Bengio S, Brevdo E, et al. Tensor2Tensor for neural machine translation[C]//Proceedings of the 13th Conference of the Association for Machine Translation in the Americas, Boston, USA, 2018:193-199.
[15] Wang Q, Li B, Xiao T, et al. Learning deep transformer models for machine translation[C]//Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, 2019:1810-1822.
[16] Xiong R, Yang Y, He D, et al. On layer normalization in the transformer architecture[C]//Proceedings of the Thirty-Seventh International Conference on Machine Learning, Virtual, 2020:10524-10533.
[17] Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition[C]//Proceedings of the 3rd International Conference on Learning Representations, San Diego, USA, 2015:1-14.
[18] Pennington J, Socher R, Manning C D. GloVe: global vectors for word representation[C]//Proceedings of 2014 Conference on Empirical Methods in Natural Language Processing, Doha, Qatar, 2014:1532-1543.
[19] He K, Zhang X, Ren S, et al. Deep residual learning for image recognition[C]//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, USA, 2016:770-778.