Signal and Information Processing

Classroom Expression Classification Model Based on Multitask Learning

  • 1. Key Laboratory of Education Informatization for Nationalities, Ministry of Education, Yunnan Normal University, Kunming 650500, Yunnan, China;
    2. Yunnan Key Laboratory of Smart Education, Yunnan Normal University, Kunming 650500, Yunnan, China

Received date: 2023-03-09

Online published: 2024-11-30

Funding

Supported by the National Natural Science Foundation of China (No. 62107034) and the Basic Research Special Project of the Yunnan Provincial Science and Technology Plan (No. 202101AT070095)



Cite this article

He J B, Zhou J X, Gan J H, Wu D, Wen X Y. Classroom expression classification model based on multitask learning [J]. Journal of Applied Sciences, 2024, 42(6): 947-961. DOI: 10.3969/j.issn.0255-8297.2024.06.005

Abstract

Facial expression recognition and learning sentiment analysis based on classroom video image understanding have become research hotspots in smart education. However, these applications often face great challenges in real-world scenarios with low-quality image and video acquisition, complex environments, and severe multi-target occlusion. To address the limitation that most existing classroom expression classification models consider only the single dimension of discrete expressions, this paper proposes a multitask recognition model for classifying student expressions. First, this study constructs a multitask classroom expression dataset captured in real classroom scenarios and applies data-balancing techniques to alleviate the imbalanced class label distribution in the dataset. Second, a classroom expression classification model based on multitask learning is proposed. By introducing knowledge distillation and designing a dual-channel fusion mechanism, the model effectively integrates three tasks: discrete expression recognition, facial action unit detection, and valence-arousal estimation. This integration exploits the relationships among the tasks to further enhance the performance of discrete expression classification. Finally, the proposed method is compared with existing state-of-the-art methods on multiple datasets. Results show that the proposed model effectively improves expression classification accuracy and demonstrates superior performance in multitask recognition of classroom expressions, providing technical support for multi-dimensional evaluation and analysis of classroom emotions.
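The multitask objective described in the abstract, i.e. jointly optimizing discrete expression classification, facial action unit (AU) detection, and valence-arousal estimation, is commonly realized as a weighted sum of per-task losses over a shared representation. The following is a minimal NumPy sketch of such an objective; the class counts (7 discrete expressions, 12 AUs), the loss choices, and the task weights are illustrative assumptions, not the paper's actual implementation, which additionally employs knowledge distillation and a dual-channel fusion mechanism.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(logits, labels):
    """Discrete expression task: softmax cross-entropy over class logits."""
    p = softmax(logits)
    return -np.mean(np.log(p[np.arange(len(labels)), labels] + 1e-12))

def bce(logits, targets):
    """AU detection task: multi-label binary cross-entropy (one sigmoid per AU)."""
    p = 1.0 / (1.0 + np.exp(-logits))
    return -np.mean(targets * np.log(p + 1e-12)
                    + (1 - targets) * np.log(1 - p + 1e-12))

def mse(pred, target):
    """Valence-arousal task: regression loss on the two continuous dimensions."""
    return np.mean((pred - target) ** 2)

def multitask_loss(expr_logits, expr_y, au_logits, au_y, va_pred, va_y,
                   w_expr=1.0, w_au=0.5, w_va=0.5):
    # Weighted sum of the three task losses; the weights here are assumptions.
    return (w_expr * cross_entropy(expr_logits, expr_y)
            + w_au * bce(au_logits, au_y)
            + w_va * mse(va_pred, va_y))

# Toy batch of 4 faces: 7 expression classes, 12 AUs, valence-arousal in [-1, 1].
rng = np.random.default_rng(0)
loss = multitask_loss(
    rng.normal(size=(4, 7)), np.array([0, 3, 6, 2]),
    rng.normal(size=(4, 12)), rng.integers(0, 2, (4, 12)).astype(float),
    rng.normal(size=(4, 2)), rng.uniform(-1.0, 1.0, (4, 2)),
)
print(float(loss))
```

In practice the three heads would share a backbone feature extractor, so gradients from the AU and valence-arousal losses regularize the features used by the discrete expression head, which is the mechanism by which the auxiliary tasks improve the primary classification task.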
