应用科学学报 ›› 2024, Vol. 42 ›› Issue (6): 1052-1063.doi: 10.3969/j.issn.0255-8297.2024.06.013

• 计算机科学与应用 • 上一篇    下一篇

基于差分隐私的非等距直方图发布算法

单丽洋1,2,3, 陈学斌1,2,3, 郭如敏1,2,3   

  1. 1. 华北理工大学 理学院, 河北 唐山 063210;
    2. 华北理工大学 河北省数据科学与应用重点实验室, 河北 唐山 063210;
    3. 华北理工大学 唐山市数据科学重点实验室, 河北 唐山 063210
  • 收稿日期:2024-03-21 出版日期:2024-11-30 发布日期:2024-11-30
  • 通信作者: 陈学斌,教授,研究方向为大数据安全、物联网安全、网络安全。E-mail:chxb@ncst.edu.cn E-mail:chxb@ncst.edu.cn
  • 基金资助:
    国家自然科学基金(No.U20A20179)资助

Non-isometric Histogram Publishing Algorithm Based on Differential Privacy

SHAN Liyang1,2,3, CHEN Xuebin1,2,3, GUO Rumin1,2,3   

  1. 1. College of Science, North China University of Science and Technology, Tangshan 063210, Hebei, China;
    2. Hebei Provincial Key Laboratory of Data Science and Application, North China;University of Science and Technology, Tangshan 063210, Hebei, China;
    3. Tangshan Key Laboratory of Data Science, North China University of Science and Technology, Tangshan 063210, Hebei, China
  • Received:2024-03-21 Online:2024-11-30 Published:2024-11-30

摘要: 针对直方图隐私泄露与分组数难以确定的问题,提出一种基于差分隐私的非等距直方图数据发布算法。首先,提出一种改进的定量化的综合评价指标,将直方图的分组评判标准定量化为特定的计算公式,以确定直方图最优分组数。然后,利用经验分布函数设计隐私预算分配方案,计算得出分组边界,从而构建非等距直方图。最后,根据非等距边界划分的分组,统计组内频数,对频数进行加噪,发布满足差分隐私的非等距直方图。实验结果表明,分组数的最优计算及非等距的实现,保证了直方图发布数据的准确性和隐私性,同时仍能保证直方图的分布特征不受影响,该文所提发布算法的均方误差与同类精确的直方图发布(accurate histogram publication,AHP)算法相比降低了99%。

关键词: 非等距, 直方图分组, 差分隐私, 隐私预算

Abstract: To address the histogram privacy leakage and the challenge of determining the number of groups, a non-equidistant histogram data publishing algorithm based on differential privacy (DP) is proposed. Firstly, an improved quantified comprehensive evaluation index is introduced, which quantifies the criterion of histogram grouping into a specific calculation formula to determine the optimal number of histogram groups. Next, the empirical distribution function is used to design a privacy budget allocation scheme, and the grouping boundaries are calculated to construct the non-equidistant histogram. The dataset is then divided according to the non-equidistant boundaries, and the frequencies are counted, with noise added to satisfy the differential privacy requirements. The non-equidistant histogram is subsequently published. Experimental results show that the optimal calculation of the number of groups and the implementation of non-equidistance can ensure the accuracy and privacy of the published data of the histogram, while preserving the distribution characteristics of the histogram. The mean square error of the proposed algorithm is reduced by 99% compared with similar accurate histogram publication (AHP) algorithms.

Key words: non-isometric, histogram grouping, differential privacy (DP), privacy budget

中图分类号: