Research on Different Desensitization Data Based on Federated Ensemble Algorithm

doi: 10.3969/j.issn.0255-8297.2024.01.008

Journal of Applied Sciences, 2024, Vol. 42, Issue (1): 94-102.

LUO Changyin^1,2,3, CHEN Xuebin^2,3, ZHANG Shufen^2,3, YIN Zhiqiang^2, SHI Yi^2, LI Fengjun^1

1. School of Mathematics and Statistics, Ningxia University, Yinchuan 750021, Ningxia, China;
2. College of Science, North China University of Science and Technology, Tangshan 063210, Hebei, China;
3. Hebei Province Key Laboratory of Data Science and Application, North China University of Science and Technology, Tangshan 063210, Hebei, China

Received: 2023-09-22 Online: 2024-01-30 Published: 2024-02-02
Corresponding author: CHEN Xuebin, professor, whose research interests are data security, Internet of Things security, and network security. E-mail: chxb@qq.com
Funding: Supported by the National Natural Science Foundation of China (No. U20A20179) and the Tangshan Science and Technology Project (No. 18120203A)

Abstract

Abstract: To address the risk that gradient updates in federated learning may leak local data, a federated ensemble algorithm based on locally desensitized data is proposed. The algorithm desensitizes the raw data using different values of the mutation rate and the fitness threshold, and trains local models of different types on data desensitized to different degrees, in order to determine suitable parameters for the federated ensemble algorithm. Experimental results show that, compared with the federated averaging algorithm and traditional centralized training, the stacking federated ensemble algorithm and the voting federated ensemble algorithm achieve accuracy above the baseline. In practical applications, different desensitization parameters can be set according to different requirements to protect the data and thereby improve its security.

Key words: federated learning, gradient update, federated ensemble algorithm, ensemble algorithm
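The abstract gives no implementation details, so the following is only a minimal sketch of the general idea, under stated assumptions. The `desensitize` function, its Gaussian perturbation, and its fitness measure are hypothetical stand-ins for the mutation-rate and fitness-threshold desensitization; `train_local` is a toy one-dimensional classifier (the paper uses different model types); and `vote` illustrates hard voting only, not the stacking variant.

```python
import random
from collections import Counter


def desensitize(values, mutation_rate, fitness_threshold, rng, max_tries=100):
    """Hypothetical desensitization step (not the paper's exact procedure)."""
    scale = (max(values) - min(values)) or 1.0
    for _ in range(max_tries):
        # Mutate each value with probability `mutation_rate` by adding Gaussian noise.
        candidate = [
            v + rng.gauss(0.0, 0.1 * scale) if rng.random() < mutation_rate else v
            for v in values
        ]
        # Assumed fitness: 1 minus the mean absolute change, normalised by the value range.
        change = sum(abs(a - b) for a, b in zip(values, candidate))
        fitness = 1.0 - change / (len(values) * scale)
        if fitness >= fitness_threshold:
            return candidate
    raise ValueError("no candidate reached the fitness threshold")


def train_local(xs, ys):
    """Toy local model: threshold at the midpoint between the two class means."""
    zeros = [x for x, y in zip(xs, ys) if y == 0]
    ones = [x for x, y in zip(xs, ys) if y == 1]
    cut = (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2.0
    return lambda x: int(x > cut)  # assumes class 1 has the larger mean


def vote(models, x):
    """Server-side hard voting over the clients' predictions."""
    return Counter(model(x) for model in models).most_common(1)[0][0]


rng = random.Random(0)
labels = [0] * 10 + [1] * 10
models = []
for client in range(3):  # three clients, so a binary vote cannot tie
    raw = [0.1 * i + 0.05 * client for i in range(10)] + [4.0 + 0.1 * i for i in range(10)]
    safe = desensitize(raw, mutation_rate=0.3, fitness_threshold=0.9, rng=rng)
    models.append(train_local(safe, labels))  # only desensitized data is used for training

print(vote(models, 0.0), vote(models, 5.0))
```

In this sketch a higher `mutation_rate` or a lower `fitness_threshold` allows stronger perturbation of the local data, which mirrors the trade-off between protection and accuracy that the abstract describes. The server receives only the predictions of the local models, not gradients.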

CLC number: TP391

LUO Changyin, CHEN Xuebin, ZHANG Shufen, YIN Zhiqiang, SHI Yi, LI Fengjun. Research on Different Desensitization Data Based on Federated Ensemble Algorithm[J]. Journal of Applied Sciences, 2024, 42(1): 94-102.


