Journal of Applied Sciences, 2023, Vol. 41, Issue (1): 55-70. DOI: 10.3969/j.issn.0255-8297.2023.01.005

• Special Issue on Computer Applications •

Multi-modal Emotion Recognition Using Speech, Text and Motion

JIA Ning, ZHENG Chunjun   

  1. School of Software, Dalian Neusoft University of Information, Dalian 116023, Liaoning, China
  • Received: 2022-06-18  Online: 2023-01-31  Published: 2023-02-03

Abstract: To address the low accuracy and weak generalization ability of human emotion recognition, a multi-modal fusion method based on speech, text, and motion is proposed. In the speech modality, a depth wavefield extrapolation-improved wave physics model (DWE-WPM) is designed to simulate the sequential information mining process of a long short-term memory (LSTM) network. In the text modality, a Transformer model with a multi-head attention mechanism is used to capture the latent semantic expression of emotion. In the motion modality, sequential features of facial expressions and hand actions are combined using a bidirectional three-layer LSTM with an attention mechanism. On this basis, a multi-modal fusion scheme is designed to achieve emotion recognition with high accuracy and strong generalization ability. On the widely used emotion corpus IEMOCAP, the proposed method is compared with existing emotion recognition algorithms. Experimental results show that the proposed method achieves higher recognition accuracy in both single-modality and multi-modality settings, improving average accuracy by 16.4% and 10.5% respectively, and effectively strengthens human emotion recognition in human-computer interaction.
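For illustration, the motion branch described in the abstract can be sketched as a three-layer bidirectional LSTM with additive attention pooling over concatenated per-frame facial and hand features. The following minimal PyTorch sketch uses hypothetical feature dimensions, hidden size, and a four-class label set (none taken from the paper), and closes with a simple logit-averaging stand-in for the multi-modal fusion step; it is an illustrative sketch, not the authors' implementation.

import torch
import torch.nn as nn

class MotionBranch(nn.Module):
    """Bidirectional 3-layer LSTM over facial + hand features, with attention pooling."""
    def __init__(self, face_dim=68, hand_dim=42, hidden=128, num_classes=4):
        super().__init__()
        # Per-frame facial and hand features are concatenated before the LSTM.
        self.lstm = nn.LSTM(face_dim + hand_dim, hidden, num_layers=3,
                            batch_first=True, bidirectional=True)
        # Additive attention: score each time step, then normalize over time.
        self.attn = nn.Sequential(nn.Linear(2 * hidden, hidden),
                                  nn.Tanh(),
                                  nn.Linear(hidden, 1))
        self.classifier = nn.Linear(2 * hidden, num_classes)

    def forward(self, face_seq, hand_seq):
        # face_seq: (B, T, face_dim); hand_seq: (B, T, hand_dim)
        x = torch.cat([face_seq, hand_seq], dim=-1)   # (B, T, face_dim + hand_dim)
        h, _ = self.lstm(x)                           # (B, T, 2 * hidden)
        w = torch.softmax(self.attn(h), dim=1)        # attention weights over time
        context = (w * h).sum(dim=1)                  # weighted sum: (B, 2 * hidden)
        return self.classifier(context)               # per-class logits

# Usage with random stand-in features (8 clips, 50 frames each):
branch = MotionBranch()
motion_logits = branch(torch.randn(8, 50, 68), torch.randn(8, 50, 42))  # (8, 4)

# Hypothetical late fusion: average per-modality logits. In the paper's setting,
# speech_logits and text_logits would come from the DWE-WPM and Transformer
# branches; random placeholders are used here to keep the sketch self-contained.
speech_logits = torch.randn(8, 4)
text_logits = torch.randn(8, 4)
fused = (speech_logits + text_logits + motion_logits) / 3
predictions = fused.argmax(dim=-1)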

Key words: speech emotion recognition, text emotion recognition, motion emotion recognition, Transformer model, attention mechanism
