Journal of Applied Sciences (应用科学学报), 2021, Vol. 39, Issue (3): 357-356. DOI: 10.3969/j.issn.0255-8297.2021.03.002

• CCF NCCA 2020 Special Column •

Localization and Recognition of Fully Expression Frames in Dynamic Face Image Sequences

SIMA Yi1,3, YI Jizheng1,2,3, CHEN Aibin1,2,3, ZHOU Mengna1,3

  1. Institute of Artificial Intelligence Application, Central South University of Forestry and Technology, Changsha 410004, Hunan, China;
    2. Hunan Key Laboratory of Intelligent Logistics Technology, Central South University of Forestry and Technology, Changsha 410004, Hunan, China;
    3. School of Computer and Information Engineering, Central South University of Forestry and Technology, Changsha 410004, Hunan, China
  • Received: 2020-08-20; Published: 2021-06-08
  • Corresponding author: YI Jizheng, associate professor; research interests include image processing, pattern recognition, artificial intelligence, deep learning, and medical image analysis. E-mail: kingkong148@163.com
  • Supported by:
    the Young Scientists Fund of the National Natural Science Foundation of China (No. 61602528), the Youth Fund of the Hunan Provincial Natural Science Foundation (No. 2017JJ3527), and the High-Level Talent Introduction Fund of Central South University of Forestry and Technology (No. 2015YJ013)

Fully Expression Frame Localization and Recognition Based on Dynamic Face Image Sequences

SIMA Yi1,3, YI Jizheng1,2,3, CHEN Aibin1,2,3, ZHOU Mengna1,3   

  1. Institute of Artificial Intelligence Application, Central South University of Forestry and Technology, Changsha 410004, Hunan, China;
    2. Hunan Key Laboratory of Intelligent Logistics Technology, Central South University of Forestry and Technology, Changsha 410004, Hunan, China;
    3. School of Computer and Information Engineering, Central South University of Forestry and Technology, Changsha 410004, Hunan, China
  • Received: 2020-08-20; Published: 2021-06-08

Abstract: Since the evolution of a facial expression is a continuous process, dynamic image sequences are better suited than static images as the object of facial expression recognition research. This paper proposes a sequence-frame localization model based on an embedding network: an Inception ResNet v1 network loaded with pre-trained weights extracts a feature vector for each frame of a facial expression sequence, and the Euclidean distances between these feature vectors are used to locate the fully expression (complete) frame with the maximum expression intensity, from which the facial expression sequence data are obtained. To further verify the accuracy of the localization model, the located fully expression frames are classified with a VGG16 model and a ResNet50 model, respectively. In experiments on the CK+ and MMI facial expression databases, the proposed sequence-frame localization model achieves average localization accuracies of 98.31% and 98.08%, respectively; expression recognition of the located fully expression frames with the VGG16 and ResNet50 models reaches 96.32% and 96.5% on CK+ and 87.23% and 87.88% on MMI. The results show that the proposed model obtains reliable fully expression frames and delivers satisfactory facial expression recognition performance.
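The localization step described above can be illustrated with a minimal sketch. It assumes the facenet-pytorch implementation of Inception ResNet v1 as the pre-trained embedding network and treats the first frame of the sequence as the neutral reference (the abstract does not state which reference frame the authors use); the fully expression frame is then the frame whose embedding lies farthest, in Euclidean distance, from that reference.

```python
# Minimal sketch of the localization step (not the authors' code).
# Assumptions: facenet-pytorch's InceptionResnetV1 stands in for the
# "Inception ResNet v1 with pre-trained weights", and frame 0 is neutral.
import torch
from facenet_pytorch import InceptionResnetV1

embedder = InceptionResnetV1(pretrained='vggface2').eval()  # 512-d face embeddings


def locate_fully_expression_frame(frames: torch.Tensor) -> int:
    """frames: (T, 3, 160, 160) aligned face crops, roughly normalized to [-1, 1]."""
    with torch.no_grad():
        emb = embedder(frames)              # (T, 512): one embedding per frame
    dist = torch.norm(emb - emb[0], dim=1)  # Euclidean distance to the neutral frame
    return int(torch.argmax(dist))          # index of maximum expression intensity


# Usage with a random 10-frame stand-in sequence:
print(locate_fully_expression_frame(torch.randn(10, 3, 160, 160)))
```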

Keywords: facial expression sequence, embedding network, fully expression frame localization, feature vector, facial expression recognition

Abstract: Considering that the evolution of facial expressions is a continuous process, dynamic image sequences are more suitable than static images as research objects for facial expression recognition. This paper proposes a sequence frame localization model based on an embedding network. A pre-trained Inception ResNet v1 network extracts a feature vector for each frame, and the Euclidean distances between these feature vectors are then used to locate the complete frame with the maximum expression intensity, so that standardized facial expression sequence data are obtained. To further verify the accuracy of the localization model, the VGG16 and ResNet50 networks are adopted to perform facial expression recognition on the located complete frames. Experiments were conducted on the CK+ and MMI facial expression databases: the average accuracy of the proposed sequence frame localization model reaches 98.31% and 98.08%, respectively. When the VGG16 and ResNet50 networks perform expression recognition on the located complete frames, the recognition accuracies reach 96.32% and 96.5% on CK+, and 87.23% and 87.88% on MMI, respectively. These experimental results show that the proposed model can accurately pick out the complete frame from a facial expression sequence and achieves good facial expression recognition performance.
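A companion sketch of the recognition step is given below. It is only illustrative: the 7-class label set (the usual CK+ categories) and the head-replacement fine-tuning scheme are assumptions, since the abstract only states that VGG16 and ResNet50 classify the located complete frames; the backbone here is torchvision's VGG16.

```python
# Minimal sketch of the recognition step (not the authors' training code).
# Assumptions: 7 expression classes and a torchvision VGG16 backbone whose
# ImageNet head is swapped for the expression task.
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 7  # e.g., anger, contempt, disgust, fear, happiness, sadness, surprise

model = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
model.classifier[6] = nn.Linear(4096, NUM_CLASSES)  # replace the 1000-way ImageNet head
model.eval()


def recognize_expression(apex_frame: torch.Tensor) -> int:
    """apex_frame: (3, 224, 224) ImageNet-normalized crop of the located frame."""
    with torch.no_grad():
        logits = model(apex_frame.unsqueeze(0))  # (1, NUM_CLASSES)
    return int(logits.argmax(dim=1))


# Usage with a random stand-in frame (the untrained head gives arbitrary output
# until the model is fine-tuned on expression data):
print(recognize_expression(torch.randn(3, 224, 224)))
```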

Key words: facial expression sequence, embedding network, fully expression frame localization, feature vector, facial expression recognition
