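Funding
Supported by the National Natural Science Foundation of China (No. 61300170, No. 71371012); the Key Project Fund for Outstanding Talents in Anhui Provincial Universities (No. 2013SQRL034ZD); the Anhui Provincial Natural Science Foundation (No. 1608085MF147); and the General Project Fund of the Anhui Provincial Department of Education (No. TSKJ2014B10, No. TSKJ2016B05)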
Data Stream Ensemble Classification Based on Classifier Confidence
Received date: 2016-05-16
Revised date: 2016-09-20
Online published: 2017-03-30
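Liu Sanmin, Liu Tao, Wang Zhongqun, Xiu Yu, Liu Yuxia, Meng Chao. Data stream ensemble classification based on classifier confidence[J]. Journal of Applied Sciences, 2017, 35(2): 226-232. DOI: 10.3969/j.issn.0255-8297.2017.02.009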
To address the problem of weighting sub-classifiers in dynamic data stream ensemble classification, a confidence-based weight computation policy is presented. The policy takes into account the influence of each sample on a sub-classifier's weight. The uncertainty of a prediction is described by information entropy, which establishes a relationship between classifier confidence and the samples and yields a method for computing classifier confidence. Based on the requirements of dynamic data stream classification and the characteristics of concept drift, a dynamically weighted ensemble model is built using batch learning and a time policy. Theoretical analysis and experimental results show that the proposed scheme is feasible and performs better than traditional methods.
Key words: ensemble learning; confidence; data stream classification; concept drift
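The idea of deriving a per-sample weight from prediction uncertainty can be illustrated with a minimal sketch. The abstract does not give the paper's actual formula, so everything here is an assumption made for illustration: confidence is taken as one minus the normalized Shannon entropy of a sub-classifier's predicted class distribution, and the function names `prediction_entropy`, `confidence`, and `weighted_vote` are hypothetical. The batch learning and time policy mentioned in the abstract are not modeled.

```python
import math


def prediction_entropy(probs):
    """Shannon entropy (natural log) of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def confidence(probs):
    """Assumed confidence measure: 1 - normalized entropy, in [0, 1].

    This is an illustrative choice, not the formula from the paper.
    """
    k = len(probs)
    if k < 2:
        return 1.0
    return 1.0 - prediction_entropy(probs) / math.log(k)


def weighted_vote(distributions):
    """Combine sub-classifier outputs for one sample.

    `distributions` holds one class-probability list per sub-classifier.
    Each sub-classifier is weighted by its confidence on this sample.
    Returns the winning class index and the combined distribution.
    """
    weights = [confidence(p) for p in distributions]
    total = sum(weights)
    if total == 0:
        # Every sub-classifier is maximally uncertain: fall back to equal weights.
        weights = [1.0] * len(distributions)
        total = float(len(distributions))
    n_classes = len(distributions[0])
    combined = [
        sum(w * p[c] for w, p in zip(weights, distributions)) / total
        for c in range(n_classes)
    ]
    return combined.index(max(combined)), combined


if __name__ == "__main__":
    # Three hypothetical sub-classifiers scoring one two-class sample.
    outputs = [[0.9, 0.1], [0.5, 0.5], [0.4, 0.6]]
    label, combined = weighted_vote(outputs)
    print(label, combined)
```

In this sketch a sub-classifier that outputs a uniform distribution receives zero weight on that sample, so a confident prediction dominates the vote even when less certain sub-classifiers disagree with it.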