Selected Papers Presented at 2016 Congress of Computer Applications, China

Data Stream Classifcation with Data Uncertainty and Concept Drift

Expand
  • 1. College of Computer Science and Engineering, Northeastern University, Shenyang 110819, China;
    2. School of Computer and Communication Engineering, Northeastern University at Qinhuangdao, Northeastern University, Qinhuangdao 066004, Hebei Province, China

Received date: 2016-10-05

  Revised date: 2017-02-27

  Online published: 2017-09-30

Abstract

Data in the Web have much uncertainty because of privacy protection, data loss, network errors, etc. In a data stream system, data arrive continuously and therefore one cannot obtain all data in any time. In addition, the concept drift often occurs in the data stream. This paper constructs an incremental classifcation model to deal with data stream classifcation with data uncertainty and concept drift. In this model, a fast decision tree algorithm is used. It can analyze uncertain information quickly and effectively both in the learning stage and the classifcation stage. In the learning stage, it uses the Hoeffding bound theory to quickly construct a decision tree model for the data stream with data uncertainty. In the classifcation stage, it uses a weighted Bayes classifer in the tree leaves to improve precision of the classifcation. The use of a sliding window to replace the tree ensures that the algorithm can deal with concept drift. Experimental results show that the algorithm has good classifcation accuracy and execution efciency both on artifcial and real data.

Cite this article

LÜ Yan-xia, WANG Cui-rong, WANG Cong, YUAN Ying . Data Stream Classifcation with Data Uncertainty and Concept Drift[J]. Journal of Applied Sciences, 2017 , 35(5) : 559 -569 . DOI: 10.3969/j.issn.0255-8297.2017.05.003

References

[1] Tsang S, Kao B, Yip K Y, Ho W S, Lee S D. Decision trees for uncertain data[J]. IEEE Transactions on Knowledge & Data Engineering, 2009, 231:64-78.
[2] Hulten G, Spencer L, Domingos P. Mining time-changing data streams[J]. ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2001:220-224.
[3] Charu C A, Philip S Y. A survey of uncertain data algorithms and applications[J]. IEEE Transactions on Knowledge and Data Engineering, 2009, 21:609-623.
[4] Ding X, Lian X, Chen L, Jin H. Continuous monitoring of skylines over uncertain data streams[J]. Information Sciences, 2012:196-214.
[5] Gao C, Wang J. Direct mining of discriminative patterns for classifying uncertain data[C]//Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2010:861-870.
[6] Cao K Y, Wang G, Han D. An algorithm for classifcation over uncertain data based on extreme learning machine[J]. Neurocomputing, 2016, 174:194-202.
[7] Qin B, Xia Y, Wang S, Du X. A novel Bayesian classifcation for uncertain data[J]. Knowledge-Based System, 2011, 24:1151-1158.
[8] Liu J, Li X, Zhong W. Ambiguous decision trees for mining concept-drifting data streams[J]. Pattern Recognition Letters, 2009:1347-1355.
[9] Ahmadi Z, Kramer S. Prototype-based learning on concept-drifting data streams[J]. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2014:412-421.
[10] Sidhu P, Bhatia M P S. A novel online ensemble approach to handle concept drifting data streams-diversifed dynamic weighted majority[J]. International Journal of Machine Learning & Cybernetics, 2015:1-25.
[11] Liu S, Sun Z, Liu T. Research of incremental data stream classifcation based on sample uncertainty[J]. Journal of Chinese Computer Systems, 2015:193-196.
[12] Liang C, Zhang Y, Shi P, Hu Z. Learning accurate very fast decision trees from uncertain data streams[J]. International Journal of Systems Science, 2014:1-19.
[13] Lü Y, Wang C R, Wang C, Yuan Y. Online classifcation algorithm for uncertain data stream in big data[J]. Journal of Northeastern University (Natural Science), 2016, 37(9):1245-1249.
[14] Hoeffding W. Probability inequalities for sums of bounded random variables[J]. Journal of the American Statistical Association, 1962:13-30.
[15] Qin B, Xia Y, Li F. Dtu:a decision tree for uncertain data[J]. Lecture Notes in Computer Science, 2009:4-15.
[16] He J, Zhang Y, Shi X L P. Learning naive Bayes classifers from positive and unlabelled examples with uncertainty[J]. International Journal of Systems Science, 2012:1805-1825.
[17] West D H D. Updating mean and variance estimates:an improved method[J]. Communication of ACM, 1979:532-535.

Outlines

/