Computer Science and Applications

Non-isometric Histogram Publishing Algorithm Based on Differential Privacy

Expand
  • 1. College of Science, North China University of Science and Technology, Tangshan 063210, Hebei, China;
    2. Hebei Provincial Key Laboratory of Data Science and Application, North China;University of Science and Technology, Tangshan 063210, Hebei, China;
    3. Tangshan Key Laboratory of Data Science, North China University of Science and Technology, Tangshan 063210, Hebei, China

Received date: 2024-03-21

  Online published: 2024-11-30

Abstract

To address the histogram privacy leakage and the challenge of determining the number of groups, a non-equidistant histogram data publishing algorithm based on differential privacy (DP) is proposed. Firstly, an improved quantified comprehensive evaluation index is introduced, which quantifies the criterion of histogram grouping into a specific calculation formula to determine the optimal number of histogram groups. Next, the empirical distribution function is used to design a privacy budget allocation scheme, and the grouping boundaries are calculated to construct the non-equidistant histogram. The dataset is then divided according to the non-equidistant boundaries, and the frequencies are counted, with noise added to satisfy the differential privacy requirements. The non-equidistant histogram is subsequently published. Experimental results show that the optimal calculation of the number of groups and the implementation of non-equidistance can ensure the accuracy and privacy of the published data of the histogram, while preserving the distribution characteristics of the histogram. The mean square error of the proposed algorithm is reduced by 99% compared with similar accurate histogram publication (AHP) algorithms.

Cite this article

SHAN Liyang, CHEN Xuebin, GUO Rumin . Non-isometric Histogram Publishing Algorithm Based on Differential Privacy[J]. Journal of Applied Sciences, 2024 , 42(6) : 1052 -1063 . DOI: 10.3969/j.issn.0255-8297.2024.06.013

References

[1] Chen S, Fu A, Yu S, et al. DP-QIC: a differential privacy scheme based on quasi-identifier classification for big data publication [J]. Soft Computing, 2021, 25: 7325-7339.
[2] Zheng X, Yan K, Duan J, et al. Histogram publication over numerical values under local differential privacy [J]. Wireless Communications and Mobile Computing, 2021, 4: 1-11.
[3] Joshuva A, Sugumaran V. Comparative study on tree classifiers for application to condition monitoring of wind turbine blade through histogram features using vibration signals: a datamining approach [J]. Structural Durability & Health Monitoring, 2019, 13(4): 399-416.
[4] Sturges H A. The choice of a class interval [J]. Journal of the American Statistical Association, 1926, 21(153): 65-66.
[5] Scott D W. On optimal and data-based histograms [J]. Biometrika, 1979(3): 605-610.
[6] Rudemo M. Empirical choice of histogram and kernel density estimators [J]. Scandinavian Journal of Statistics, 1982(9): 65-78.
[7] Wang X X, Zhang J F. Histogram-kernel error and its application for bin width selection in histograms [J]. Acta Mathematicae Applicatae Sinica, English Series, 2012, 28(3): 607-624.
[8] 张建方, 王秀祥. 直方图理论与最优直方图制作[J]. 应用概率统计, 2009, 25(2): 201-214. Zhang J F, Wang X X. Histogram theories and optimal histogram construction algorithms [J]. Chinese Journal of Applied Probability and Statistics, 2009, 25(2): 201-214. (in Chinese)
[9] Dwork C, Smith A. Differential privacy for statistics: what we know and what we want to learn [J]. Journal of Privacy and Confidentiality, 2009, 1(2): 135-154.
[10] Zhang X, Chen R, Xu J, et al. Towards accurate histogram publication under differential privacy [C]//Proceedings of the 2014 SIAM International Conference on Data Mining, 2014: 587-595.
[11] 陈婷, 罗景青. 基于直方图最优分组的雷达信号特征参数分析[J]. 微电子学与计算机, 2009, 26(3): 162-165. Chen T, Luo J Q. Analysis of radar signal feature parameters based on optimal grouping of histogram [J]. Microelectronics and Computer, 2009, 26(3): 162-165. (in Chinese)
[12] 杨磊, 郑啸, 赵伟. 基于差分隐私的非等距直方图发布方法[J]. 网络与信息安全学报, 2020, 6(3): 39-49. Yang L, Zheng X, Zhao W. Non-isometric histogram publishing method based on differential privacy [J]. Journal of Network and Information Security, 2020, 6(3): 39-49. (in Chinese)
[13] 徐文涛, 李林森, 钮佳超, 等. 一种基于桶重构的差分隐私直方图发布方法[J]. 通信技术, 2019, 52(2): 409-417. Xu W T, Li L S, Niu J C, et al. A differential privacy histogram publishing method based on barrel reconstruction [J]. Communications Technology, 2019, 52(2): 409-417. (in Chinese)
[14] 张浩铭, 刘田天, 龙士工. 优化结构下的差分隐私直方图发布[J]. 计算机仿真, 2019, 36(3): 220-224. Zhang H M, Liu T T, Long S G. Differential privacy histogram release under the optimized structure [J]. Computer Simulation, 2019, 36(3): 220-224. (in Chinese)
[15] Lemzin A. Streaming data processing [J]. Asian Journal of Research in Computer Science, 2023, 15(1): 11-21.
[16] 王修君, 莫磊, 郑啸, 等. 面向数据流滑动窗口的自适应直方图发布算法[J]. 计算机科学, 2022, 49(10): 344-352. Wang X J, Mo L, Zheng X, et al. Adaptive histogram publishing algorithm for data flow sliding window [J]. Computer Science, 2022, 49(10): 344-352. (in Chinese)
[17] Gao R, Ma X. Dynamic data histogram publishing based on differential privacy [C]//IEEE International Conference on Parallel & Distributed Processing with Applications, Ubiquitous Computing & Communications, Big Data & Cloud Computing, Social Computing & Networking, Sustainable Computing & Communications (ISPA/IUCC/BDCloud/SocialCom/SustainCom), 2018: 737-743.
[18] Li Y L, Li S Y. Research on differential private streaming histogram publication algorithm [C]//Proceedings of the 20185th IEEE International Conference on Cloud Computing and Intelligence Systems (CCIS), 2018: 598-603.
[19] Chen Q, Ni Z, Zhu X, et al. Differential privacy histogram publishing method based on dynamic sliding window [J]. Frontiers of Computer Science, 2022, 17(4): 174809.
[20] Dwork C. Differential privacy [C]//Proceedings of the 2006 International Colloquium on Automata, Languages, and Programming, 2006: 1-12.
[21] 莫磊, 王修君, 郑啸, 等. 有效的基于滑动窗口数据流直方图方法[J]. 计算机应用研究, 2021, 38(7): 2085-2090. Mo L, Wang X J, Zheng X, et al. A valid sliding window-based data flow histogram method [J]. Application Research of Computers, 2021, 38(7): 2085-2090. (in Chinese)
[22] 李昆明, 王超迁, 倪巍伟, 等. 基于差分隐私的高精度直方图发布方法[J]. 计算机应用, 2020, 40(11): 3242-3248. Li K M, Wang C Q, Ni W W, et al. High-precision histogram publishing method based on differential privacy [J]. Journal of Computer Applications, 2020, 40(11): 3242-3248. (in Chinese)
[23] 唐海霞, 杨庚, 白云璐. 自适应差分隐私预算分配策略的直方图发布算法[J]. 计算机应用研究, 2020, 37(7): 1952-1957, 1963. Tang H X, Yang G, Bai Y L. A histogram publishing algorithm for the adaptive differential privacy budget allocation strategy [J]. Application Research of Computers, 2020, 37(7): 1952-1957, 1963. (in Chinese)
Outlines

/