保护隐私的分布式朴素贝叶斯挖掘

叶云, 石聪聪, 余勇, 怀梦迪, 林为民, 高鹏

doi:10.3969/j.issn.0255-8297.2017.01.001

应用科学学报 >

2017 , Vol. 35 >Issue 1: 1 - 10

DOI: https://doi.org/10.3969/j.issn.0255-8297.2017.01.001

智能电网

保护隐私的分布式朴素贝叶斯挖掘

展开

1. 国家电网公司全球能源互联网研究院, 南京 210003;
2. 中国科学技术大学计算机科学与技术学院, 合肥 230026;
3. 中国科学技术大学苏州研究院, 江苏苏州 215123

叶云,博士,高工,研究方向:信息安全、大数据、信息物理系统、量子安全,E-mail:yeyun@geiri.sgcc.com.cn

收稿日期: 2015-10-30

修回日期: 2016-07-06

网络出版日期: 2017-01-30

基金资助

国家电网公司科技项目基金（No.71-14-006，No.71-14-004）；国家电网公司千人计划专项基金（No.tx71-13-047）资助

收起

Privacy-Preserving Distributed Naive Bayes Data Mining

Expand

1. Global Energy Interconnection Research Institute, Smart Grid, Nanjing 210003, China;
2. School of Computer Science and Technology, University of Science and Technology of China, Hefei 230026, China;
3. Suzhou Institute for Advanced Study, University of Science and Technology of China, Suzhou 215123, Jiangsu Province, China

Received date: 2015-10-30

Revised date: 2016-07-06

Online published: 2017-01-30

Fold

摘要

目前，分布式隐私保护朴素贝叶斯挖掘算法仅考虑分布式参与方的局部数据隐私而忽略全局的数据隐私，故难以有效抵抗合谋攻击.为此，基于差分隐私、秘密共享、安全多方计算等技术，提出一种分布式隐私保护朴素贝叶斯新算法.该算法采用安全求和协议构建保护隐私的朴素贝叶斯协议，对参与方的局部数据进行隐私保护.利用差分隐私保护机制对全局学习得到的朴素贝叶斯分类模型进行隐私保护.针对可能存在的合谋攻击，基于秘密共享设计了随机选择协议，将添加Laplace噪声的参与者随机化，有效防御安全多方计算中的相邻节点合谋及多数节点合谋攻击，并在此基础上优化保护隐私的朴素贝叶斯挖掘算法.实验表明，该隐私保护算法具有良好的分类性能和扩展性.

关键词： 分布式朴素贝叶斯; 差分隐私; 隐私保护; 安全多方计算

本文引用格式

叶云, 石聪聪, 余勇, 怀梦迪, 林为民, 高鹏 . 保护隐私的分布式朴素贝叶斯挖掘[J]. 应用科学学报, 2017 , 35(1) : 1 -10 . DOI: 10.3969/j.issn.0255-8297.2017.01.001

Abstract

Current works involving distributed privacy-preserving Naive Bayes data mining only consider the privacy of each party but ignore the fact that the learned Naive Bayes classifer can also potentially disclose the global privacy.Additionally, these works cannot deal with collusion attacks.Based on secure multi-party computation and differential privacy, we propose a horizontally distributed privacy-preserving Naive Bayes protocol.In this protocol, we construct the privacy-preserving Naive Bayes protocol based on the secure multiparty computation theories to protect each party's privacy.We then make the learned Naive Bayes achieve differential privacy to prevent the global privacy from the learned classifer.To resolve two kinds of collusion attacks, we construct a random selection algorithm based on the secret sharing theories.To achieve this, we randomize the Laplace noise provider.In this way, collusions among massive parties and adjacent parties are prevented.Using these steps, we construct a privacy-preserving Naive Bayes algorithm.Experimental results reveal that the proposed distributed protocol has good classifcation performance regardless of the number of participating parties.In other words, it has high scalability.

Key words： privacy preserving; distributed Naive Bayes; differential privacy; secure multi-party computation

参考文献

[1] Agrawal R, Srikant R. Privacy-preserving data mining[J]. ACM Sigmod Record, 2009, 29(2):439-450.
[2] Vaidya, J, Basit S, Anirban B, Yuan H. Differentially private Naive Bayes classifcation[C]//2013 IEEE/WIC/ACM International Joint Conferences on Web Intelligence (WI) and Intelligent Agent Technologies (IAT), 2013, 1:571-576.
[3] Peng Z, Tong Y H, Tang S W, Yang D Q. Privacy preserving Naive Bayes classifcation[M]//Advanced Data Mining and Applications. Berlin Heidelberg:Springer, 2005:744-752.
[4] Dwork C. Differential privacy[C]//Proceedings of the 33rd International Colloquium on Automata, Languages and Programming, part Ⅱ (ICALP2006), 2006:1-12.
[5] Kantarc?oglu M, Vaidya J, Clifton C. Privacy preserving Naive Bayes classifer for horizontally partitioned data[C]//IEEE ICDM Workshop on Privacy Preserving Data Mining, 2003:3-9.
[6] Yi X, Zhang Y. Privacy-preserving Naive Bayes classifcation on distributed data via semitrusted mixers[J]. Information systems, 2009, 34(3):371-380.
[7] Vaidya J, Kantarc?olu M, Clifton C. Privacy-preserving Naive Bayes classifcation[J]. The VLDB Journal-The International Journal on Very Large Data Bases, 2008, 17(4):879-898.
[8] Malik M B, Ghazi M A, Ali R. Privacy preserving data mining techniques:current scenario and future prospects[C]//2012 Third International Conference on Computer and Communication Technology (ICCCT), IEEE, 2012:26-32.
[9] Sathiyapriya K, Sadasivam G S. A survey on privacy preserving association rule mining[J]. International Journal of Data Mining & Knowledge Management Process, 2013, 3(2):119-131
[10] Dwork C, Mcsherry F, Nissim K. Calibrating noise to sensitivity in private data analysis[C]//Proceedings of the 3rd Int Confon Theory of Cryptography. Berlin:Springer, 2006:265- 284.
[11] Yao A C. Protocols for computations[C]//Proceedings of the 23rd Annual Symposium on Foundations of Computer Science. 1982.
[12] Goldwasser S. Multi party computation:past and present[C]//Proceedings of the Sixteenth Annual ACM Symposium on Principles of Distributed Computing, 1997:1-6.
[13] Beaver D, Goldwasser S. Multiparty computation with faulty majority[C]//Proceedings of the Advances in Cryptology-CRYPTO'89. 1990:589-590.
[14] Lindell Y, Pinks B. Privacy preserving data mining[C]//Advances in Cryptology-CRYPTO 2000. Berlin Heidelberg Springer, 2000:36-54.
[15] Kiltz E. Unconditionally secure constant round multi-party computation for equality, comparison, bits and exponentiation[J]. IACR Cryptology ePrint Archive, 2005, 66:285-304.
[16] Emekçi F, Sahin O D, Agrawal D, Abbadi A E. Privacy preserving decision tree learning over multiple parties[J]. Data & Knowledge Engineering, 2007, 63(2):348-361.
[17] Blake C, Merz C. UCI repository of machine learning databases[http://www.ics.uci.edu/~mlearn/MLRepository.html]. Irvine, CA:University of California. Department of Information and Computer Science, 1998:55.

Options

文章导航

模态框（Modal）标题

摘要

本文引用格式

Abstract

参考文献