Journal of Applied Sciences ›› 2017, Vol. 35 ›› Issue (5): 559-569. doi: 10.3969/j.issn.0255-8297.2017.05.003

• Selected Papers Presented at 2016 Congress of Computer Applications, China •

Data Stream Classification with Data Uncertainty and Concept Drift

LÜ Yan-xia1,2, WANG Cui-rong1,2, WANG Cong2, YUAN Ying2   

  1. College of Computer Science and Engineering, Northeastern University, Shenyang 110819, China;
    2. School of Computer and Communication Engineering, Northeastern University at Qinhuangdao, Northeastern University, Qinhuangdao 066004, Hebei Province, China
  • Received:2016-10-05 Revised:2017-02-27 Online:2017-09-30 Published:2017-09-30

Abstract:

Data on the Web carry considerable uncertainty because of privacy protection, data loss, network errors, and similar factors. In a data stream system, data arrive continuously, so the complete data set is never available at any one time. In addition, concept drift often occurs in data streams. This paper constructs an incremental classification model for data stream classification under data uncertainty and concept drift. The model uses a fast decision tree algorithm that can analyze uncertain information quickly and effectively in both the learning stage and the classification stage. In the learning stage, it uses the Hoeffding bound to quickly construct a decision tree model for a data stream with data uncertainty. In the classification stage, it uses a weighted Bayes classifier in the tree leaves to improve classification precision. Using a sliding window to replace the tree allows the algorithm to handle concept drift. Experimental results show that the algorithm achieves good classification accuracy and execution efficiency on both artificial and real data.
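
As an illustration of the Hoeffding bound mentioned in the abstract, the following minimal sketch shows the standard form of the bound and how a streaming decision tree typically uses it to decide when a leaf has seen enough examples to split. It is not the paper's algorithm: the function names, the parameters (`value_range`, `delta`, `n`), and the split rule are assumptions chosen for illustration, and the handling of uncertain attribute values described in the paper is not modeled here.

```python
import math


def hoeffding_bound(value_range, delta, n):
    """Standard Hoeffding bound: epsilon = sqrt(R^2 * ln(1/delta) / (2n)).

    value_range (R): range of the observed quantity, e.g. 1.0 for
                     information gain on a two-class problem (assumption).
    delta:           allowed probability that the bound does not hold.
    n:               number of examples observed at the leaf.
    """
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain, second_gain, value_range, delta, n):
    """Hypothetical split rule: split when the gap between the two best
    attributes exceeds the Hoeffding bound for the examples seen so far."""
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)


if __name__ == "__main__":
    # The bound shrinks as more examples arrive at the leaf.
    for n in (100, 1000, 10000):
        print(n, round(hoeffding_bound(1.0, 1e-7, n), 4))
```

Because the bound shrinks as `n` grows, a leaf can commit to a split after a bounded number of examples without storing the whole stream, which is what makes this family of decision trees suitable for incremental learning.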

Key words: concept drift, classification, data stream, data uncertainty, decision tree
