Journal of Applied Sciences ›› 2024, Vol. 42 ›› Issue (6): 1052-1063. doi: 10.3969/j.issn.0255-8297.2024.06.013

• Computer Science and Applications •

Non-isometric Histogram Publishing Algorithm Based on Differential Privacy

SHAN Liyang1,2,3, CHEN Xuebin1,2,3, GUO Rumin1,2,3   

  1. College of Science, North China University of Science and Technology, Tangshan 063210, Hebei, China;
    2. Hebei Provincial Key Laboratory of Data Science and Application, North China University of Science and Technology, Tangshan 063210, Hebei, China;
    3. Tangshan Key Laboratory of Data Science, North China University of Science and Technology, Tangshan 063210, Hebei, China
  • Received:2024-03-21 Online:2024-11-30 Published:2024-11-30

Abstract: To address privacy leakage in published histograms and the difficulty of determining the number of groups, a non-equidistant histogram data publishing algorithm based on differential privacy (DP) is proposed. First, an improved quantitative comprehensive evaluation index is introduced, which turns the criteria for histogram grouping into an explicit formula for determining the optimal number of groups. Next, the empirical distribution function is used to design a privacy budget allocation scheme, and the group boundaries are calculated to construct the non-equidistant histogram. The dataset is then partitioned along these boundaries, the frequency of each group is counted, and noise is added to satisfy differential privacy. Finally, the non-equidistant histogram is published. Experimental results show that computing the optimal number of groups and using non-equidistant boundaries ensure both the accuracy and the privacy of the published histogram while preserving its distribution characteristics. The mean square error of the proposed algorithm is 99% lower than that of similar accurate histogram publication (AHP) algorithms.
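
The pipeline outlined in the abstract (boundaries from the empirical distribution function, per-group counts, added noise) can be illustrated with a minimal sketch. This is not the paper's algorithm: the evaluation index and the budget allocation scheme are not specified here, so the number of groups `k` is simply passed in, the boundaries are assumed to sit at equally spaced empirical-CDF levels, and the noise is assumed to be Laplace with scale 1/epsilon (count sensitivity 1 when one record is added or removed). All function names are hypothetical. Note that the boundaries are computed from the raw data without spending any privacy budget, so the sketch as a whole does not satisfy differential privacy; only the count perturbation step does.

```python
import bisect
import random


def quantile_boundaries(data, k):
    """Return bin edges at equally spaced empirical-CDF levels (assumed rule)."""
    xs = sorted(data)
    n = len(xs)
    edges = [xs[0]]
    for i in range(1, k):
        # Index of the smallest sample whose empirical CDF reaches i/k.
        idx = (i * n + k - 1) // k - 1
        edges.append(xs[idx])
    edges.append(xs[-1])
    # Ties in the data can produce repeated edges; merge them.
    edges = sorted(set(edges))
    if len(edges) < 2:
        raise ValueError("need at least two distinct values")
    return edges


def bin_counts(data, edges):
    """Count records per bin [e_j, e_{j+1}); the last bin is closed on the right."""
    counts = [0] * (len(edges) - 1)
    for x in data:
        j = min(bisect.bisect_right(edges, x) - 1, len(counts) - 1)
        counts[j] += 1
    return counts


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def noisy_histogram(data, k, epsilon, rng=None):
    """Build a non-equidistant histogram and perturb its counts."""
    rng = rng or random.Random()
    edges = quantile_boundaries(data, k)
    counts = bin_counts(data, edges)
    scale = 1.0 / epsilon  # sensitivity 1 for add/remove of one record
    noisy = [c + laplace_noise(scale, rng) for c in counts]
    return edges, counts, noisy


if __name__ == "__main__":
    sample = [random.gauss(50, 15) for _ in range(1000)]
    edges, counts, noisy = noisy_histogram(sample, k=8, epsilon=1.0)
    for lo, hi, c in zip(edges, edges[1:], noisy):
        print(f"[{lo:7.2f}, {hi:7.2f}]  {c:8.2f}")
```

With quantile-style boundaries, dense regions of the data get narrow bins and sparse regions get wide ones, so each count stays large relative to the added noise. That is one plausible reason a non-equidistant layout can reduce error, though the paper's own boundary rule may differ.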

Key words: non-isometric, histogram grouping, differential privacy (DP), privacy budget
