Communication Engineering

Gaussian Mixture Model Convolution Neural Network Based on Imbalanced Problem

1. School of Information, Shanghai Ocean University, Shanghai 201306, China;
2. School of Information, Shanghai Jianqiao University, Shanghai 201306, China;
3. Shanghai Film Academy, Shanghai University, Shanghai 200072, China

Received date: 2021-09-25

Published online: 2023-08-02

Abstract

Imbalanced data classification is a challenging task in big data mining. A skewed class distribution seriously degrades the classification performance of models, especially on minority classes. In this paper, an expectation-maximization weighted resampling (EMWRS) algorithm and a weighted cross-entropy loss (WCELoss) function are proposed to improve classification performance on imbalanced data. The proposed approach uses a Gaussian mixture model to preprocess the data, then combines weighted sampling with cost-sensitive learning to construct a convolutional neural network model. The network is evaluated with F1 and G-mean as indicators and compared with classic algorithms such as SMOTE (synthetic minority over-sampling technique) and ADASYN (adaptive synthetic sampling) on the UCI (University of California, Irvine) Adult dataset. The experimental results show that the proposed model outperforms ADASYN and the other classic algorithms in terms of F1 and G-mean on the UCI Adult dataset, which indicates that it effectively improves classification accuracy on the minority class.
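The abstract does not give formulas for WCELoss or for the evaluation indicators, so the following is only a minimal sketch under stated assumptions. It shows the standard binary definitions of F1 and G-mean, together with one common form of class-weighted cross-entropy. The inverse-frequency weights and all function names here are hypothetical illustrations, not the authors' definitions.

```python
import math


def inverse_frequency_weights(labels):
    """Assumed weighting scheme: w_c = N / (K * n_c), so rarer classes weigh more."""
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}


def weighted_cross_entropy(probs, labels, weights):
    """Weighted mean of -w_y * log p(y).

    probs[i] maps each class to its predicted probability for sample i.
    The sum is normalized by the total weight of the samples.
    """
    eps = 1e-12  # guard against log(0)
    total = sum(weights[y] * -math.log(max(p[y], eps))
                for p, y in zip(probs, labels))
    return total / sum(weights[y] for y in labels)


def f1_and_gmean(y_true, y_pred, positive=1):
    """Binary F1 and G-mean, treating `positive` as the minority class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0          # minority-class recall
    specificity = tn / (tn + fp) if tn + fp else 0.0     # majority-class recall
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    gmean = math.sqrt(recall * specificity)
    return f1, gmean


# Toy data with an 8:2 class ratio (illustrative only, not from the paper).
y_true = [0] * 8 + [1] * 2
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]
class_weights = inverse_frequency_weights(y_true)
f1, gmean = f1_and_gmean(y_true, y_pred)
```

G-mean is the geometric mean of the recall on each class, so a model that ignores the minority class scores zero regardless of its overall accuracy. This is why it is commonly reported alongside F1 for imbalanced problems.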

Cite this article

XU Hong, JIAO Guie, ZHANG Wenjun. Gaussian Mixture Model Convolution Neural Network Based on Imbalanced Problem[J]. Journal of Applied Sciences, 2023, 41(4): 657-668. DOI: 10.3969/j.issn.0255-8297.2023.04.010
