Data Stream Ensemble Classification Based on Classifier Confidence

LIU Sanmin, LIU Tao, WANG Zhongqun, XIU Yu, LIU Yuxia, MENG Chao

doi:10.3969/j.issn.0255-8297.2017.02.009

Journal of Applied Sciences (应用科学学报), 2017, Vol. 35, Issue 2: 226-232


Computer Science and Applications


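1. College of Computer and Information, Anhui Polytechnic University, Wuhu 241000, Anhui Province, China;
2. Modern Education Technology Center, Anhui Polytechnic University, Wuhu 241000, Anhui Province, China;
3. College of the Internet of Things, Nanjing University of Posts and Telecommunications, Nanjing 210003, China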

LIU Sanmin, associate professor; research interests: machine learning and data mining. E-mail: aqlsm@163.com

Received: 2016-05-16

Revised: 2016-09-20

Published online: 2017-03-30

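Funding

Supported by the National Natural Science Foundation of China (No. 61300170, No. 71371012); the Key Project for Outstanding Talents in Universities of Anhui Province (No. 2013SQRL034ZD); the Natural Science Foundation of Anhui Province (No. 1608085MF147); and the General Project of the Anhui Provincial Department of Education (No. TSKJ2014B10, No. TSKJ2016B05).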



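Abstract

A weight computation strategy based on classifier confidence is proposed to address the problem of assigning weights to sub-classifiers in ensemble classification of dynamic data streams. The method takes into account how samples at different positions affect the weight computation, uses information entropy to describe the uncertainty of a classifier's predictions, and establishes the relationship between classifier confidence and samples, from which a quantitative measure of classifier confidence is derived. Finally, in view of the requirements of dynamic data stream classification and the characteristics of concept drift, a dynamically weighted ensemble classification model based on classifier confidence is built using batch learning and a time-based forgetting strategy. Theoretical analysis and experimental results show that the scheme is feasible and has certain advantages over traditional methods.

Keywords: data stream classification; confidence; concept drift; ensemble learning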
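The abstract gives no formulas, so the following is only a minimal Python sketch of the general idea it describes: entropy-based confidence, batch-level weights, and time-based forgetting. Every specific here is an assumption made for illustration and not the authors' definition: the normalization `1 - H(p) / log(K)`, the exponential forgetting factor, the `decay` parameter, and all function names.

```python
import math
from typing import List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (natural log) of a discrete class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def sample_confidence(probs: Sequence[float]) -> float:
    """Assumed confidence measure: 1 minus normalized entropy.

    Close to 1 when the classifier is certain about a sample,
    close to 0 when its predicted distribution is uniform.
    """
    k = len(probs)
    if k < 2:
        return 1.0
    return 1.0 - entropy(probs) / math.log(k)


def classifier_weight(batch_probs: List[Sequence[float]],
                      age: int,
                      decay: float = 0.1) -> float:
    """Assumed weight: mean confidence over a batch, discounted by age.

    `age` counts how many batches ago the classifier was trained;
    `decay` is a hypothetical forgetting rate.
    """
    if not batch_probs:
        raise ValueError("batch_probs must not be empty")
    mean_conf = sum(sample_confidence(p) for p in batch_probs) / len(batch_probs)
    return mean_conf * math.exp(-decay * age)


def weighted_vote(member_probs: List[Sequence[float]],
                  weights: Sequence[float]) -> int:
    """Return the class index with the largest weighted probability sum."""
    k = len(member_probs[0])
    scores = [0.0] * k
    for probs, w in zip(member_probs, weights):
        for c in range(k):
            scores[c] += w * probs[c]
    return max(range(k), key=scores.__getitem__)
```

In this sketch, each sub-classifier would be re-weighted with `classifier_weight` whenever a new batch arrives, so older or less certain members contribute less to `weighted_vote`. Whether the paper combines confidence and forgetting multiplicatively is not stated in the abstract.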

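Cite this article

LIU Sanmin, LIU Tao, WANG Zhongqun, XIU Yu, LIU Yuxia, MENG Chao. Data Stream Ensemble Classification Based on Classifier Confidence[J]. Journal of Applied Sciences, 2017, 35(2): 226-232. DOI: 10.3969/j.issn.0255-8297.2017.02.009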


