To address the low recognition rate of minority-class samples in imbalanced data sets, a classification algorithm based on a cost-sensitive convolutional neural network and AdaBoost (AdaBoost-CSCNN) is proposed. A cost-sensitive convolutional neural network (CSCNN) is constructed by combining a specific cost-sensitive index with the cross-entropy loss function of a convolutional neural network (CNN). During training, a cost-weighting mechanism reduces the loss on the key feature attributes of minority-class samples, so that each CSCNN can serve effectively as a base classifier within AdaBoost. To verify the effectiveness of the algorithm, experiments were carried out on 9 data sets with different imbalance ratios using four evaluation metrics: Accuracy, Recall, F1-score, and AUC. The results show that the AdaBoost-CSCNN algorithm performs well on imbalanced data set classification.
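The abstract does not give the form of the cost-sensitive index, so the following is only a minimal sketch of one common way to weight a cross-entropy loss by class cost. The inverse-class-frequency costs, the function names `class_costs` and `cost_sensitive_cross_entropy`, and the toy data are assumptions made for illustration, not the authors' formulation.

```python
import math
from collections import Counter

def class_costs(labels):
    """Hypothetical cost index: inverse class frequency, scaled so that
    the per-sample costs sum to the number of samples."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}

def cost_sensitive_cross_entropy(probs, labels, costs):
    """Mean cross-entropy in which each sample's term is scaled by the
    cost of its true class. probs[i][c] is the predicted probability
    of class c for sample i."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for p, y in zip(probs, labels):
        total += -costs[y] * math.log(max(p[y], eps))
    return total / len(labels)

# Toy imbalanced set (assumed data): 8 majority samples, 2 minority samples.
labels = [0] * 8 + [1] * 2
# The classifier is confident on the majority class and wrong on the minority.
probs = [[0.9, 0.1]] * 8 + [[0.6, 0.4]] * 2

costs = class_costs(labels)
plain = cost_sensitive_cross_entropy(probs, labels, {0: 1.0, 1: 1.0})
weighted = cost_sensitive_cross_entropy(probs, labels, costs)
print(costs, plain, weighted)
```

With these assumed costs, errors on the minority class contribute more to the loss than they do under plain cross-entropy, which is the general effect a cost-weighting mechanism aims for. In a boosting setup, a loss of this kind would be the training objective of each base classifier.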