Journal of Applied Sciences (应用科学学报) ›› 2017, Vol. 35 ›› Issue (2): 226-232. doi: 10.3969/j.issn.0255-8297.2017.02.009

• Computer Science and Applications •

Data Stream Ensemble Classification Based on Classifier Confidence

LIU San-min (刘三民)1, LIU Tao (刘涛)1, WANG Zhong-qun (王忠群)1, XIU Yu (修宇)1, LIU Yu-xia (刘余霞)2, MENG Chao (孟超)3

  1. College of Computer and Information, Anhui Polytechnic University, Wuhu 241000, Anhui Province, China;
  2. Modern Education Technology Center, Anhui Polytechnic University, Wuhu 241000, Anhui Province, China;
  3. College of the Internet of Things, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
  • Received: 2016-05-16 Revised: 2016-09-20 Online: 2017-03-30 Published: 2017-03-30
  • About the author: LIU San-min, associate professor; research interests: machine learning and data mining. E-mail: aqlsm@163.com
  • Funding:

    Supported by the National Natural Science Foundation of China (No. 61300170, No. 71371012); the Key Project Fund for Outstanding Talents in Anhui Provincial Universities (No. 2013SQRL034ZD); the Anhui Provincial Natural Science Foundation (No. 1608085MF147); and the General Project Fund of the Anhui Provincial Department of Education (No. TSKJ2014B10, No. TSKJ2016B05)

Abstract:

A weight computation strategy based on classifier confidence is proposed to address the difficulty of assigning weights to sub-classifiers in ensemble classification of dynamic data streams. The method takes full account of how samples at different locations affect the weight computation. Information entropy is used to describe a classifier's uncertainty about its prediction, a relationship between classifier confidence and the samples is established, and a quantitative method for computing classifier confidence is then given. Finally, in light of the requirements of dynamic data stream classification and the characteristics of concept drift, a dynamically weighted ensemble classification model based on classifier confidence is built using batch learning and a time-based forgetting strategy. Theoretical analysis and experimental results show that the proposed scheme is feasible and has certain advantages over traditional methods.
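The abstract does not give the confidence formula, so the following is only a minimal sketch of the general idea, under stated assumptions: confidence is taken to be one minus the normalized Shannon entropy of a sub-classifier's predicted class distribution, and time forgetting is modeled as an exponential decay in the age of the batch a sub-classifier was trained on. The function names, the `decay` parameter, and both formulas are illustrative choices and are not taken from the paper.

```python
import math
from typing import List, Sequence


def entropy_confidence(probs: Sequence[float]) -> float:
    """Assumed confidence: 1 - H(p) / log(K).

    Gives 1.0 for a one-hot prediction and 0.0 for a uniform one.
    """
    k = len(probs)
    if k < 2:
        return 1.0
    h = -sum(p * math.log(p) for p in probs if p > 0.0)
    return 1.0 - h / math.log(k)


def forgetting_factor(age: int, decay: float = 0.5) -> float:
    """Assumed time forgetting: sub-classifiers from older batches count less."""
    return math.exp(-decay * age)


def ensemble_predict(
    member_probs: List[Sequence[float]],
    ages: Sequence[int],
    decay: float = 0.5,
) -> int:
    """Combine the members' class probabilities for one sample.

    Each member's vote is scaled by its per-sample confidence and by the
    forgetting factor of the batch it was trained on. Returns the index
    of the class with the highest weighted score.
    """
    n_classes = len(member_probs[0])
    scores = [0.0] * n_classes
    for probs, age in zip(member_probs, ages):
        weight = entropy_confidence(probs) * forgetting_factor(age, decay)
        for c, p in enumerate(probs):
            scores[c] += weight * p
    return max(range(n_classes), key=scores.__getitem__)


# A hesitant recent member (age 0) and a confident older member (age 3).
votes = [[0.55, 0.45], [0.1, 0.9]]
ages = [0, 3]
mild = ensemble_predict(votes, ages, decay=0.5)
harsh = ensemble_predict(votes, ages, decay=5.0)
```

In this toy case the confident older member decides the outcome under mild forgetting, while aggressive forgetting hands the decision back to the recent member. That is the kind of trade-off a time-forgetting strategy is meant to control under concept drift.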

Key words: ensemble learning, confidence, data stream classification, concept drift

CLC number: