Journal of Applied Sciences, 2021, Vol. 39, Issue (4): 545-558. doi: 10.3969/j.issn.0255-8297.2021.04.003

Special Issue on CCF NCCA 2020

Hybrid Feature Selection Algorithm Based on Mutual Information

JIANG Wenxuan, DUAN Youxiang, SUN Qifeng   

  1. School of Computer Science and Technology, China University of Petroleum (East China), Qingdao 266580, Shandong, China
  Received: 2020-08-28; Published: 2021-08-04

Abstract: Traditional feature selection algorithms focus only on feature relevance and feature redundancy, without considering the interaction between features. This paper proposes a hybrid feature selection algorithm based on mutual information (MIHFS). The algorithm uses the classification accuracy of the K-nearest neighbor (KNN) algorithm as the evaluation index for the selected features; it effectively removes redundant and irrelevant features while retaining features that interact with one another. To evaluate the proposed algorithm, its classification accuracy, number of selected features, and stability are compared with those of seven other feature selection algorithms, including minimal redundancy maximal relevance (mRMR) and joint mutual information (JMI), on eight datasets. Experimental results show that MIHFS is highly stable: it not only effectively reduces the dimension of the feature space but also achieves better classification performance than the other feature selection algorithms. Finally, combined with the grey relational analysis (GRA) and technique for order preference by similarity to ideal solution (TOPSIS) method (GRA-TOPSIS), MIHFS is applied to the geological evaluation of the first member of the Dainan Formation in the Yong’an Area, Gaoyou Sag. The results show that MIHFS achieves an evaluation accuracy of 80% with high reliability, which is broadly consistent with actual drilling results and demonstrates the effectiveness of MIHFS in oil and gas geological evaluation.
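The abstract describes MIHFS only at a high level, so the following is a hypothetical, minimal sketch of the general idea it names (a mutual-information filter that accounts for feature interaction, followed by a KNN-accuracy wrapper), not the authors' algorithm. It assumes discrete features and uses one common sign convention for interaction information, I(X;Y;C) = I(X,Y;C) - I(X;C) - I(Y;C), where a positive value means the two features are more informative together than apart. The function names (`entropy`, `mutual_info`, `interaction_info`, `loo_knn_accuracy`, `hybrid_select`), the ranking rule (relevance plus best positive pairwise interaction), the Hamming distance, and k = 3 are all illustrative assumptions.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of hashable values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def mutual_info(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for discrete sequences."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def interaction_info(xs, ys, cs):
    """I(X;Y;C) = I(X,Y;C) - I(X;C) - I(Y;C); positive means synergy."""
    joint = list(zip(xs, ys))
    return mutual_info(joint, cs) - mutual_info(xs, cs) - mutual_info(ys, cs)


def loo_knn_accuracy(rows, labels, subset, k=3):
    """Leave-one-out accuracy of k-NN (Hamming distance) on `subset` features."""
    correct = 0
    for i, row in enumerate(rows):
        # Distance to every other sample; ties are broken by sample index.
        dists = sorted(
            (sum(row[f] != other[f] for f in subset), j)
            for j, other in enumerate(rows) if j != i
        )
        votes = Counter(labels[j] for _, j in dists[:k])
        correct += votes.most_common(1)[0][0] == labels[i]
    return correct / len(rows)


def hybrid_select(rows, labels, k=3):
    """Hypothetical filter + wrapper sketch (NOT the paper's MIHFS).

    Filter: rank each feature by I(f;C) plus its best positive pairwise
    interaction with another feature. Wrapper: walk the ranking and keep a
    feature only if it raises leave-one-out k-NN accuracy.
    """
    n_feat = len(rows[0])
    cols = [[r[f] for r in rows] for f in range(n_feat)]

    def score(f):
        best_inter = max((interaction_info(cols[f], cols[g], labels)
                          for g in range(n_feat) if g != f), default=0.0)
        return mutual_info(cols[f], labels) + max(best_inter, 0.0)

    ranked = sorted(range(n_feat), key=score, reverse=True)
    selected, best_acc = [], 0.0
    for f in ranked:
        acc = loo_knn_accuracy(rows, labels, selected + [f], k)
        if acc > best_acc:
            selected, best_acc = selected + [f], acc
    return selected, best_acc


# Toy data: class = f0 XOR f1, while f2 is pure noise.
rows = [(f0, f1, f2) for _ in range(2)
        for f0 in (0, 1) for f1 in (0, 1) for f2 in (0, 1)]
labels = [f0 ^ f1 for f0, f1, _ in rows]
print(hybrid_select(rows, labels))  # ([0, 1], 1.0)
```

On this toy XOR data, f0 and f1 each have zero mutual information with the class, so a relevance-only filter would rank them no higher than the noise feature f2. Their interaction information of 1 bit lifts them to the top of the ranking, and the KNN wrapper then keeps exactly features 0 and 1 while rejecting f2, which does not improve accuracy.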

Key words: feature selection, interaction information, hybrid feature selection, K-nearest neighbor (KNN), grey relational analysis (GRA), technique for order preference by similarity to ideal solution (TOPSIS)
