Special Issue on Computer Applications

Research on Different Desensitization Data Based on Federated Ensemble Algorithm

  • 1. School of Mathematics and Statistics, Ningxia University, Yinchuan 750021, Ningxia, China;
    2. College of Science, North China University of Science and Technology, Tangshan 063210, Hebei, China;
    3. Hebei Province Key Laboratory of Data Science and Application, North China University of Science and Technology, Tangshan 063210, Hebei, China

Received date: 2023-09-22

Online published: 2024-02-02

Funding

Supported by the National Natural Science Foundation of China (No. U20A20179) and the Tangshan Science and Technology Project (No. 18120203A)

Cite this article

Luo Changyin, Chen Xuebin, Zhang Shufen, Yin Zhiqiang, Shi Yi, Li Fengjun. Research on Different Desensitization Data Based on Federated Ensemble Algorithm[J]. Journal of Applied Sciences, 2024, 42(1): 94-102. DOI: 10.3969/j.issn.0255-8297.2024.01.008

Abstract

To address the risk that gradient updates in federated learning may leak local data, this paper proposes federated ensemble algorithms trained on locally desensitized data. The approach desensitizes the raw data under different values of the mutation rate and the fitness threshold, then trains different types of models locally on data desensitized to different degrees, in order to determine suitable parameters for the federated ensemble algorithm. Experimental results show that, compared with the federated averaging algorithm and traditional centralized training, the stacking federated ensemble algorithm and the voting federated ensemble algorithm achieve accuracy above the baseline. In practical applications, the desensitization parameters can be set according to different requirements to protect the data and improve its security.
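The pipeline summarized in the abstract (desensitize locally, train a local model per client, combine the models by voting) can be illustrated with a small sketch. This is not the authors' implementation. The Gaussian perturbation, the fitness measure `1 / (1 + mean absolute change)`, the toy two-class data, the two model types, and every function name below are assumptions made for illustration only. The stacking variant is not shown.

```python
import random
from collections import Counter


def desensitize(rows, mutation_rate, fitness_threshold, rng, max_tries=20):
    """Perturb each value with probability `mutation_rate`.

    Fitness is 1 / (1 + mean absolute change), an assumed stand-in for the
    paper's fitness function. The first candidate reaching
    `fitness_threshold` is returned; otherwise the fittest one seen.
    """
    best, best_fit = None, -1.0
    for _ in range(max_tries):
        cand = [[v + rng.gauss(0, 1) if rng.random() < mutation_rate else v
                 for v in row] for row in rows]
        change = sum(abs(a - b) for r, c in zip(rows, cand) for a, b in zip(r, c))
        fit = 1.0 / (1.0 + change / (len(rows) * len(rows[0])))
        if fit > best_fit:
            best, best_fit = cand, fit
        if fit >= fitness_threshold:
            break
    return best, best_fit


def train_centroid(rows, labels):
    """Nearest-centroid classifier; returns a predict(row) function."""
    sums, counts = {}, Counter(labels)
    for row, y in zip(rows, labels):
        acc = sums.setdefault(y, [0.0] * len(row))
        for i, v in enumerate(row):
            acc[i] += v
    cents = {y: [s / counts[y] for s in acc] for y, acc in sums.items()}

    def predict(row):
        return min(cents, key=lambda y: sum((a - b) ** 2
                                            for a, b in zip(row, cents[y])))
    return predict


def train_1nn(rows, labels):
    """1-nearest-neighbour classifier; returns a predict(row) function."""
    data = list(zip(rows, labels))

    def predict(row):
        return min(data, key=lambda d: sum((a - b) ** 2
                                           for a, b in zip(row, d[0])))[1]
    return predict


def vote(models, row):
    """Majority vote over the local models' predictions."""
    return Counter(m(row) for m in models).most_common(1)[0][0]


def make_client_data(rng, n_per_class=10):
    """Two synthetic, well-separated classes (toy data, not from the paper)."""
    rows, labels = [], []
    for label, centre in ((0, 0.0), (1, 6.0)):
        for _ in range(n_per_class):
            rows.append([rng.gauss(centre, 0.5), rng.gauss(centre, 0.5)])
            labels.append(label)
    return rows, labels


def run_demo(seed=0, mutation_rate=0.2, fitness_threshold=0.8):
    rng = random.Random(seed)
    trainers = [train_centroid, train_1nn, train_centroid]  # mixed model types
    models = []
    for train in trainers:  # one iteration per simulated client
        rows, labels = make_client_data(rng)
        masked, _ = desensitize(rows, mutation_rate, fitness_threshold, rng)
        models.append(train(masked, labels))  # only the model leaves the client
    return [vote(models, p) for p in ([0.0, 0.0], [6.0, 6.0])]


if __name__ == "__main__":
    print(run_demo())
```

In this sketch a higher `mutation_rate` or a lower `fitness_threshold` allows stronger perturbation, which trades model utility for privacy. That mirrors, in spirit only, the parameter choice the abstract describes.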
