Journal of Applied Sciences, 2013, Vol. 31, Issue (6): 628-632. doi: 10.3969/j.issn.0255-8297.2013.06.012

• Computer Science and Applications •

Ensemble Pruning Based on Frequent Patterns

ZHOU Hong-fang¹, WANG Xiao¹, ZHAO Xue-han¹, RAO Yuan²

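  1. School of Computer Science and Engineering, Xi’an University of Technology, Xi’an 710048, China
  2. School of Software Engineering, Xi’an Jiaotong University, Xi’an 710049, China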
  • Received: 2013-04-21  Revised: 2013-06-06  Published: 2013-11-29  Online: 2013-06-06
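  • About the author: ZHOU Hong-fang, Ph.D., associate professor. Research interests: data warehousing and data mining, knowledge discovery, rough sets. E-mail: zhouhf@xaut.edu.cn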
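  • Funding:

    Supported by the National Natural Science Foundation of China (No. 61172124); the Scientific Research Program of the Shaanxi Provincial Department of Education (No. 12JK0739); the Xi’an Science Program (No. CXY1339(5)); the Science and Technology Program of Beilin District, Xi’an (No. GX1308); and the Featured Research Program of Xi’an University of Technology (No. 116-211302).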

Abstract: When applied to large-scale data sets, most ensemble learning methods suffer from high computational complexity, an excessive number of base classifiers, and unsatisfactory classification accuracy. This paper proposes an ensemble pruning algorithm based on frequent patterns. Following the principles of frequent pattern mining, the method maps the unpruned ensemble and its sample space to a transactional database and stores the classification results in a Boolean matrix. It then mines frequent base classifiers from the Boolean matrix and combines them into the final pruned ensemble. Experimental results show that, compared with the ensemble algorithms Bagging, AdaBoost, WAVE and RFW, the proposed algorithm reduces the number of base classifiers and improves both classification accuracy and classification efficiency.
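
The abstract summarizes the pruning procedure without giving its details. The Python sketch below shows one plausible reading of it, under assumptions not stated in the paper: each sample is a transaction, a base classifier belongs to that transaction when it classifies the sample correctly (a guess at what the Boolean matrix records), frequent classifier sets are mined with an Apriori-style level-wise search, and the largest frequent set is kept as the pruned ensemble. The function names, the `min_support` threshold and this selection rule are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def boolean_matrix(predictions, labels):
    """Rows are samples (transactions), columns are base classifiers (items).

    Entry [i][j] is True when classifier j predicts sample i correctly.
    predictions[j][i] is classifier j's prediction for sample i.
    """
    return [[clf[i] == y for clf in predictions] for i, y in enumerate(labels)]


def column_bitsets(matrix):
    """Pack each classifier's column into an int whose bit i marks sample i."""
    bits = [0] * len(matrix[0])
    for i, row in enumerate(matrix):
        for j, correct in enumerate(row):
            if correct:
                bits[j] |= 1 << i
    return bits


def support(itemset, bits, n_samples):
    """Fraction of transactions that contain every classifier in itemset."""
    common = (1 << n_samples) - 1
    for j in itemset:
        common &= bits[j]  # AND of the Boolean columns
    return bin(common).count("1") / n_samples


def frequent_itemsets(bits, n_samples, min_support):
    """Apriori-style level-wise search for frequent classifier sets."""
    level = [frozenset([j]) for j in range(len(bits))
             if support([j], bits, n_samples) >= min_support]
    found = list(level)
    k = 2
    while level:
        # Join step: unions of two frequent (k-1)-sets with exactly k members.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Apriori prune step: every (k-1)-subset must already be frequent.
        previous = set(level)
        candidates = [c for c in candidates
                      if all(frozenset(s) in previous for s in combinations(c, k - 1))]
        level = [c for c in candidates if support(c, bits, n_samples) >= min_support]
        found.extend(level)
        k += 1
    return found


def prune_ensemble(predictions, labels, min_support):
    """Hypothetical rule: keep the largest frequent set (ties broken by support)."""
    bits = column_bitsets(boolean_matrix(predictions, labels))
    n = len(labels)
    itemsets = frequent_itemsets(bits, n, min_support)
    if not itemsets:
        return []
    best = max(itemsets, key=lambda s: (len(s), support(s, bits, n)))
    return sorted(best)


def majority_vote(predictions, members, i):
    """Predict sample i by plurality vote over the selected classifiers."""
    votes = Counter(predictions[j][i] for j in members)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    labels = [0, 1, 1, 0, 1, 0]
    predictions = [            # one row per base classifier
        [0, 1, 1, 0, 1, 1],
        [0, 1, 0, 0, 1, 0],
        [1, 0, 1, 1, 0, 0],    # weak classifier, correct on only 2 of 6 samples
        [0, 1, 1, 0, 0, 0],
    ]
    print(prune_ensemble(predictions, labels, min_support=0.5))  # prints [0, 1, 3]
```

Packing each classifier's column into an integer bitset turns the support of any classifier set into one bitwise AND plus a bit count, which illustrates why a Boolean matrix is a convenient store for this kind of mining. For brevity the sketch scores classifiers on the same samples it prunes on; a held-out validation set would be the more usual choice, and the paper's own support threshold and selection rule may differ.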

Key words: large-scale data set, frequent pattern, ensemble pruning, transactional database, Boolean matrix
