Journal of Applied Sciences, 2025, Vol. 43, Issue (3): 451-462. doi: 10.3969/j.issn.0255-8297.2025.03.007

• Computer Science and Applications •

Frequency-Domain Multi-feature Fusion for Deepfake Video Detection Based on Key Frames

WANG Jinwei1,2,3, ZHANG Meigui1, ZHANG Jiawei1, LUO Xiangyang3, MA Bin4

  1. School of Computer Science, School of Cyber Science and Engineering, Nanjing University of Information Science and Technology, Nanjing 210044, Jiangsu, China;
  2. Jiangsu Collaborative Innovation Center of Atmospheric Environment and Equipment Technology, Nanjing University of Information Science and Technology, Nanjing 210044, Jiangsu, China;
  3. State Key Laboratory of Mathematical Engineering and Advanced Computing, Zhengzhou 450001, Henan, China;
  4. Shandong Provincial Key Laboratory of Computer Networks, Qilu University of Technology, Jinan 250353, Shandong, China
  • Received: 2022-04-11 Published: 2025-06-23
  • Corresponding author: WANG Jinwei, professor, research interest: information security. E-mail: wjwei_2004@163.com
  • Funding:
    National Natural Science Foundation of China (No.62472229, No.62371145, No.62172435, No.62272255, No.62302248, No.U24B20179, No.U23A20305, No.U23B2022); National Key Research and Development Program of China (No.2021QY0700); Zhongyuan Science and Technology Innovation Leading Talent Project of China (No.214200510019)

Abstract: To save computing resources and avoid data redundancy, most existing Deepfake video detection methods randomly select multiple frames or partial segments of a video as the detection objects. This selection strategy weakens the representation ability of the detection objects and limits detection performance. Moreover, while existing algorithms perform well on a single dataset, their performance degrades severely in cross-dataset detection, so their generalization needs further improvement. To address these problems, we propose a frequency-domain multi-feature fusion algorithm for Deepfake video detection based on key frames. The mean square error in the frequency domain is used to extract key frames as the detection objects. The artifact features of the main frame and the temporal inconsistency features between key frames are then learned in the frequency domain. These features are fused and passed through a fully connected layer to obtain the final detection result. Experimental results show that the proposed algorithm outperforms existing methods in cross-dataset detection, indicating strong generalization.

Key words: Deepfake video detection, key frames, frequency domain, multi-feature fusion
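
The abstract states that key frames are extracted using the mean square error in the frequency domain, but it does not give the transform, the comparison scheme, or the selection rule. The following is a minimal sketch of one plausible reading, not the authors' implementation. It assumes grayscale frames, a 2-D FFT log-magnitude spectrum, comparison of each frame with its predecessor, and a fixed number of key frames; the function names `log_spectrum` and `select_key_frames` and the parameter `num_key_frames` are hypothetical.

```python
import numpy as np


def log_spectrum(frame):
    """Log-magnitude of the centered 2-D FFT of one grayscale frame."""
    return np.log1p(np.abs(np.fft.fftshift(np.fft.fft2(frame))))


def select_key_frames(frames, num_key_frames):
    """Return sorted indices of assumed key frames.

    frames: array of shape (T, H, W) holding grayscale frames.
    Frame 0 is always kept (an assumption); the remaining slots go to
    the frames whose spectrum differs most, by MSE, from the previous
    frame's spectrum.
    """
    spectra = np.stack([log_spectrum(f) for f in frames])
    # MSE between each frame's spectrum and its predecessor's spectrum.
    mse = np.mean((spectra[1:] - spectra[:-1]) ** 2, axis=(1, 2))
    # Keep the k largest changes; +1 maps a difference index to a frame index.
    k = min(num_key_frames - 1, len(mse))
    top = np.argsort(mse)[::-1][:k] + 1
    return sorted([0] + top.tolist())
```

In the pipeline the abstract describes, the selected frames would then feed the frequency-domain artifact and temporal-inconsistency feature extractors, whose fused output goes to a fully connected layer. Those stages are not sketched here because the abstract gives no architectural details.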

CLC number: