Because a facial expression evolves continuously over time, dynamic image sequences are better suited than static images as the object of study for facial expression recognition. This paper proposes a sequence frame positioning model based on an embedding network. An Inception ResNet v1 network loaded with pre-trained weights extracts a feature vector from each frame of a facial expression sequence, and the Euclidean distances between these feature vectors are used to locate the complete frame, that is, the frame with the maximum expression intensity, from which the facial expression sequence data are then obtained. To further verify the accuracy of the positioning model, a VGG16 model and a ResNet50 model are each used to perform facial expression recognition on the located complete frames. Experiments were conducted on the CK+ and MMI facial expression databases, where the average positioning accuracy of the proposed model reached 98.31% and 98.08%, respectively. With VGG16 and ResNet50 applied to the located complete frames, the recognition accuracy on the two databases reached 96.32% and 96.5%, and 87.23% and 87.88%, respectively. These results indicate that the proposed model obtains reliable complete frames from facial expression sequences and achieves satisfactory facial expression recognition performance.
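The distance-based selection of the complete frame can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that each frame has already been mapped to an embedding vector (here a toy list of floats standing in for the Inception ResNet v1 output) and that the first frame of the sequence is a neutral reference, which the abstract does not state. The function names `frame_distances` and `locate_complete_frame` are hypothetical.

```python
import math
from typing import List, Sequence


def frame_distances(embeddings: Sequence[Sequence[float]],
                    reference_index: int = 0) -> List[float]:
    """Euclidean distance from every frame embedding to the reference frame.

    Assumption: the reference frame (index 0 by default) shows a neutral face.
    """
    if not embeddings:
        raise ValueError("embeddings must contain at least one frame")
    reference = embeddings[reference_index]
    return [math.dist(reference, emb) for emb in embeddings]


def locate_complete_frame(embeddings: Sequence[Sequence[float]],
                          reference_index: int = 0) -> int:
    """Index of the frame farthest from the reference in embedding space.

    Distance to the reference is used here as a proxy for expression
    intensity; on ties the earliest frame is returned.
    """
    distances = frame_distances(embeddings, reference_index)
    return max(range(len(distances)), key=distances.__getitem__)


# Toy 2-D embeddings for a five-frame sequence (illustrative values only).
toy_embeddings = [
    [0.0, 0.0],    # assumed neutral frame
    [0.1, 0.0],
    [0.6, 0.3],
    [0.9, 0.8],    # farthest from the neutral frame
    [0.85, 0.7],
]

peak_index = locate_complete_frame(toy_embeddings)
# Frames from the reference up to and including the located frame.
expression_sequence = toy_embeddings[: peak_index + 1]
```

In practice the embeddings would be high-dimensional vectors produced by the network, and the choice of reference frame would depend on how the sequences in the database are recorded.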