Journal of Applied Sciences ›› 2021, Vol. 39 ›› Issue (4): 545-558. doi: 10.3969/j.issn.0255-8297.2021.04.003

• CCF NCCA 2020 Special Issue •

Hybrid Feature Selection Algorithm Based on Mutual Information

JIANG Wenxuan, DUAN Youxiang, SUN Qifeng

  1. School of Computer Science and Technology, China University of Petroleum (East China), Qingdao 266580, Shandong, China
  • Received: 2020-08-28 Published: 2021-08-04
  • Corresponding author: DUAN Youxiang, professor; research interests: network and service computing, and applications of computer technology in the oil and gas field. E-mail: yxduan@upc.edu.cn
  • Funding:
    National Science and Technology Major Project of China (No. 2017ZX05009-001, No. 2016ZX05011-002); Fundamental Research Funds for the Central Universities (No. 18CX02020A)

Abstract: Traditional feature selection algorithms consider only the correlation and redundancy of features and ignore the interaction between features. To address this problem, this paper proposes a hybrid feature selection algorithm based on mutual information (MIHFS). The algorithm uses the classification accuracy of the K-nearest neighbor (KNN) algorithm as the index for evaluating the classification performance of the selected features; it effectively removes redundant and irrelevant features while retaining features that interact. To assess its performance, MIHFS is compared with seven other feature selection algorithms, including minimal redundancy maximal relevance (mRMR) and joint mutual information (JMI), on eight datasets in terms of classification accuracy, number of selected features, and algorithm stability. Experimental results show that MIHFS is highly stable, effectively reduces the dimension of the feature space, and selects features whose classification performance is clearly better than that of the other algorithms. Finally, MIHFS is combined with the GRA-TOPSIS method, which pairs grey relational analysis (GRA) with the technique for order preference by similarity to ideal solution (TOPSIS), and applied to the geological evaluation of the first member of the Dainan Formation in the Yong’an area of the Gaoyou Sag. The evaluation accuracy is 80%, which is basically consistent with actual drilling results; this indicates high reliability and shows that the approach can effectively guide oil and gas geological evaluation.

Key words: feature selection, interactive information, hybrid feature selection, K-nearest neighbor (KNN), grey relational analysis (GRA) method, technique for order preference by similarity to ideal solution (TOPSIS)
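
The motivation stated in the abstract, that per-feature relevance scores can miss features which matter only in combination, can be illustrated with a toy case. The sketch below is not the MIHFS algorithm, whose details the abstract does not give; it is a minimal illustration under stated assumptions. The function names `mutual_information` and `knn_loo_accuracy`, the XOR dataset, the Hamming distance, and the leave-one-out evaluation are all hypothetical choices made for this example.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information I(X; Y) in bits for two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def knn_loo_accuracy(rows, labels, features, k=1):
    """Leave-one-out accuracy of a KNN classifier that sees only `features`.

    Assumption: Hamming distance on discrete values; distance ties are
    broken by row index.
    """
    correct = 0
    for i, row in enumerate(rows):
        neighbours = sorted(
            (sum(row[f] != other[f] for f in features), j)
            for j, other in enumerate(rows)
            if j != i
        )[:k]
        votes = Counter(labels[j] for _, j in neighbours)
        correct += votes.most_common(1)[0][0] == labels[i]
    return correct / len(rows)


# Toy data (hypothetical): label = x1 XOR x2, third column is pure noise.
rows = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
labels = [a ^ b for a, b, _ in rows]
x1 = [r[0] for r in rows]
x2 = [r[1] for r in rows]

# Each feature alone carries no information about the label ...
alone = mutual_information(x1, labels) + mutual_information(x2, labels)
# ... but the pair taken together determines it completely.
joint = mutual_information(list(zip(x1, x2)), labels)
print(alone, joint)  # 0.0 1.0

# Wrapper-style check: KNN accuracy on the interacting pair.
print(knn_loo_accuracy(rows, labels, [0, 1]))  # 1.0
```

A selector that ranks features one at a time by mutual information with the label would score `x1` and `x2` no higher than the noise column here, even though the pair classifies perfectly. Scoring candidate subsets with a classifier, as the abstract describes doing with KNN accuracy, is one way such interacting features can be recognized.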
