Segmented Backdoor Defense Based on Local and Global Gradient Ascent

doi:10.3969/j.issn.0255-8297.2023.02.003

Abstract

Abstract: Backdoor triggers are becoming increasingly stealthy and difficult to detect. To address this problem, a segmented backdoor defense (SBD) method based on local and global gradient ascent is proposed. In the early stage of training, local gradient ascent is introduced to enlarge the gap between the average training loss of backdoor samples and that of clean samples, so that a small number of backdoor samples can be isolated with high precision for backdoor forgetting in the later stage. In the backdoor forgetting stage, global gradient ascent is introduced to break the correlation between backdoor samples and the target class, thereby achieving defense. Extensive experiments are conducted on three benchmark datasets (GTSRB, CIFAR-10, and MNIST) with the WideResNet-16-1 model against six advanced backdoor attacks. The results show that the proposed method reduces the success rate of most of these attacks to below 5%. The experiments also show that, whether it is trained on a backdoored dataset or on a clean dataset, the method yields a model equivalent to a cleanly trained one.

Key words: backdoor defense, backdoor detection, deep learning, backdoor attack, information security
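
The two stages described in the abstract can be sketched on toy data as follows. This is a minimal sketch, not the authors' implementation: the abstract does not state the loss functions, so the sign-flipped loss with threshold `gamma`, the lowest-loss isolation rule with fraction `ratio`, and the "kept minus isolated" forgetting objective are all assumptions in the style of anti-backdoor learning, and every function name is hypothetical. The code works on plain lists of per-sample losses rather than on a real network.

```python
def sign(x):
    """Return 1.0, -1.0, or 0.0 according to the sign of x."""
    if x > 0:
        return 1.0
    if x < 0:
        return -1.0
    return 0.0


def local_gradient_ascent_loss(per_sample_loss, gamma):
    """Assumed stage-1 loss: flip the sign of any loss below gamma.

    Minimizing the flipped term performs gradient ascent on that sample,
    so samples whose loss has dropped below gamma are pushed back up.
    """
    return [sign(loss - gamma) * loss for loss in per_sample_loss]


def isolate_suspects(per_sample_loss, ratio):
    """Assumed isolation rule: flag the `ratio` fraction with the lowest loss."""
    count = max(1, int(len(per_sample_loss) * ratio))
    ranked = sorted(range(len(per_sample_loss)), key=lambda i: per_sample_loss[i])
    return sorted(ranked[:count])


def global_gradient_ascent_objective(per_sample_loss, suspects):
    """Assumed stage-2 objective: mean loss on kept samples minus mean loss
    on isolated samples. Minimizing it descends on the kept samples and
    ascends on the isolated ones. Both groups must be non-empty.
    """
    suspect_set = set(suspects)
    kept = [l for i, l in enumerate(per_sample_loss) if i not in suspect_set]
    isolated = [per_sample_loss[i] for i in suspect_set]
    return sum(kept) / len(kept) - sum(isolated) / len(isolated)


# Toy per-sample losses; indices 0 and 3 play the role of backdoor samples
# that were fit unusually quickly (an assumption made for illustration).
losses = [0.05, 1.2, 0.9, 0.02, 1.5, 0.7, 1.1, 0.8, 1.3, 1.0]
stage1 = local_gradient_ascent_loss(losses, gamma=0.5)
suspects = isolate_suspects(losses, ratio=0.2)
stage2 = global_gradient_ascent_objective(losses, suspects)
print(stage1, suspects, stage2)
```

In a real training loop the per-sample losses would come from the model's unreduced classification loss, and `gamma` and `ratio` would be tuning parameters; the values above are illustrative only.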

CLC number: TP391.41

XIAO Xiaotong, DING Jianwei, ZHANG Qi. Segmented Backdoor Defense Based on Local Gradient and Global Gradient Ascent[J]. Journal of Applied Sciences, 2023, 41(2): 218-227.
