To reduce data redundancy and save computing resources, most existing Deepfake video detection methods select multiple frames or partial video segments as the detection objects. However, this selection strategy weakens the representational power of the detection objects and limits detection performance. Moreover, although existing algorithms perform well within individual datasets, their performance degrades severely in cross-dataset detection, which highlights the need for better generalization. To address these challenges, we propose a key-frame-based frequency-domain multi-feature fusion algorithm for Deepfake video detection. Key frames are extracted as the detection objects using the mean square error in the frequency domain. The artifact features of the main frame and the temporal inconsistency features between the key frames are then learned in the frequency domain. These features are fused and passed through a fully connected layer to obtain the final detection result. Experimental results show that our algorithm outperforms existing methods in cross-dataset detection, demonstrating strong generalization.
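As an illustration of the key-frame extraction step, the following minimal sketch selects frames by the mean square error between the frequency spectra of consecutive frames. It is not the authors' implementation: the abstract does not specify the transform, the reference frame, or the selection rule, so the use of a log-magnitude 2D FFT, the comparison with the preceding frame, the choice to always keep frame 0, and the names `log_spectrum` and `select_key_frames` are all assumptions made for illustration.

```python
import numpy as np


def log_spectrum(frame):
    """Log-magnitude 2D FFT spectrum of one grayscale frame of shape (H, W).

    Assumption: the source only says "frequency domain"; the FFT is one
    possible choice (a DCT would be another).
    """
    return np.log1p(np.abs(np.fft.fft2(np.asarray(frame, dtype=np.float64))))


def select_key_frames(frames, k):
    """Return the sorted indices of k key frames.

    Assumed rule (not from the source): frame 0 is always kept, and the
    remaining k - 1 key frames are those whose spectrum has the largest
    mean square error with respect to the preceding frame's spectrum.
    """
    if not 1 <= k <= len(frames):
        raise ValueError("k must be between 1 and the number of frames")
    spectra = np.stack([log_spectrum(f) for f in frames])
    # mse[i] compares frame i + 1 with frame i in the frequency domain.
    mse = np.mean((spectra[1:] - spectra[:-1]) ** 2, axis=(1, 2))
    # Indices of the k - 1 largest values, shifted by 1 to index frames.
    top = np.argsort(mse)[::-1][: k - 1] + 1
    return sorted([0] + top.tolist())
```

Under these assumptions, a clip whose content changes abruptly at one frame yields that frame as a key frame, because its spectrum differs most from that of its predecessor; static stretches contribute an error close to zero and are skipped.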