Frequency-Domain Multi-feature Fusion for Deepfake Video Detection Based on Key Frames

Journal of Applied Sciences, 2025, Vol. 43, Issue 3: 451-462. doi:10.3969/j.issn.0255-8297.2025.03.007

Computer Science and Applications


WANG Jinwei¹,²,³, ZHANG Meigui¹, ZHANG Jiawei¹, LUO Xiangyang³, MA Bin⁴

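1. School of Computer Science, School of Cyber Science and Engineering, Nanjing University of Information Science and Technology, Nanjing 210044, Jiangsu, China;
2. Jiangsu Collaborative Innovation Center of Atmospheric Environment and Equipment Technology, Nanjing University of Information Science and Technology, Nanjing 210044, Jiangsu, China;
3. State Key Laboratory of Mathematical Engineering and Advanced Computing, Zhengzhou 450001, Henan, China;
4. Shandong Provincial Key Laboratory of Computer Networks, Qilu University of Technology, Jinan 250353, Shandong, China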

Received: 2022-04-11 Published: 2025-06-23
Corresponding author: WANG Jinwei, Professor, research interest: information security. E-mail: wjwei_2004@163.com
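Funding: National Natural Science Foundation of China (No.62472229, No.62371145, No.62172435, No.62272255, No.62302248, No.U24B20179, No.U23A20305, No.U23B2022); National Key Research and Development Program of China (No.2021QY0700); Zhongyuan Science and Technology Innovation Leading Talent Program of China (No.214200510019)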


Abstract

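To save computing resources and avoid data redundancy, most existing Deepfake video detection methods randomly select several frames or segments of a video as detection objects, which weakens the representation ability of those objects and limits detection performance. Moreover, existing algorithms perform well on a single dataset but degrade severely in cross-dataset detection, so their generalization still needs improvement. To address these issues, we propose a Deepfake video detection algorithm based on frequency-domain multi-feature fusion of key frames. The mean square error in the frequency domain is used to extract key frames as the detection objects. The artifact features of the main frame and the temporal inconsistency between key frames, both learned in the frequency domain, are then fused and fed into a fully connected layer to obtain the final detection result. Experimental results show that the proposed algorithm outperforms existing algorithms in cross-dataset detection tasks and generalizes well.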

Key words: Deepfake video detection, key frames, frequency domain, multi-feature fusion
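The abstract states that key frames are chosen using a frequency-domain mean square error but does not give the exact rule. The sketch below is one plausible reading under assumptions that are not taken from the paper: frames are small grayscale 2D arrays, the frequency representation is the log-magnitude of a 2D DFT, and a frame becomes a key frame when its spectral MSE against the most recent key frame exceeds a hypothetical `threshold`. All function names are illustrative. A naive DFT keeps it dependency-free; a practical pipeline would use an FFT such as `numpy.fft.fft2`. It covers only the selection step; the learned artifact and temporal-inconsistency features and their fusion are not modeled.

```python
import cmath
import math
from typing import List, Sequence

# A grayscale frame as rows of pixel intensities (assumed input format).
Frame = List[List[float]]


def dft2_magnitude(frame: Frame) -> List[List[float]]:
    """Magnitude |F(u, v)| of a naive 2D DFT. O((H*W)^2): only for tiny frames."""
    h, w = len(frame), len(frame[0])
    mag = []
    for u in range(h):
        row = []
        for v in range(w):
            acc = 0j
            for x in range(h):
                for y in range(w):
                    angle = -2.0 * math.pi * (u * x / h + v * y / w)
                    acc += frame[x][y] * cmath.exp(1j * angle)
            row.append(abs(acc))
        mag.append(row)
    return mag


def log_spectrum(frame: Frame) -> List[List[float]]:
    """Log-magnitude spectrum; log1p compresses the large DC term (an assumption)."""
    return [[math.log1p(m) for m in row] for row in dft2_magnitude(frame)]


def spectral_mse(a: List[List[float]], b: List[List[float]]) -> float:
    """Mean square error between two equally sized spectra."""
    h, w = len(a), len(a[0])
    total = sum((a[i][j] - b[i][j]) ** 2 for i in range(h) for j in range(w))
    return total / (h * w)


def select_key_frames(frames: Sequence[Frame], threshold: float) -> List[int]:
    """Hypothetical greedy rule: frame 0 is always a key frame, and a later frame
    becomes one when its spectral MSE to the last key frame exceeds `threshold`."""
    if not frames:
        return []
    spectra = [log_spectrum(f) for f in frames]
    keys = [0]
    for i in range(1, len(frames)):
        if spectral_mse(spectra[i], spectra[keys[-1]]) > threshold:
            keys.append(i)
    return keys


if __name__ == "__main__":
    blank = [[0.0] * 4 for _ in range(4)]
    checker = [[float((x + y) % 2) for y in range(4)] for x in range(4)]
    # Two static "shots": the second one starts at index 2.
    print(select_key_frames([blank, blank, checker, checker], threshold=0.1))  # → [0, 2]
```

Comparing each frame with the last selected key frame, rather than only with its immediate predecessor, lets slow gradual changes accumulate until they cross the threshold instead of being skipped one small step at a time.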


CLC number: TP391.4

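Cite this article: WANG Jinwei, ZHANG Meigui, ZHANG Jiawei, LUO Xiangyang, MA Bin. Frequency-Domain Multi-feature Fusion for Deepfake Video Detection Based on Key Frames[J]. Journal of Applied Sciences, 2025, 43(3): 451-462.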


