Journal of Applied Sciences, 2023, Vol. 41, Issue (4): 657-668. doi: 10.3969/j.issn.0255-8297.2023.04.010

• Communication Engineering •

Gaussian Mixture Model Convolution Neural Network Based on Imbalanced Problem

XU Hong1, JIAO Guie2,3, ZHANG Wenjun3   

  1. School of Information, Shanghai Ocean University, Shanghai 201306, China;
    2. School of Information, Shanghai Jianqiao University, Shanghai 201306, China;
    3. Shanghai Film Academy, Shanghai University, Shanghai 200072, China
  • Received: 2021-09-25 Published: 2023-08-02
  • Corresponding author: JIAO Guie, associate professor; research interests: digital media and applications, big data analysis and visualization. E-mail: jiaoguie@gench.edu.cn
  • Funding:
    Supported by the University-level Key Research Project (No. sjq17007) and the Jiangsu Province Postgraduate Research and Practice Innovation Program (No. SJCX20_1352)

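Abstract: To improve the performance of classification models on imbalanced data, an expectation-maximization weighted resampling (EMWRS) algorithm and a weighted cross-entropy loss function (WCELoss) are proposed. In the data preprocessing stage, a Gaussian mixture model is used to learn the characteristics of the data distribution. Based on the clustering results, the sample weights within each cluster are analyzed, and the data are resampled according to the sample distribution and the corresponding weights, which reduces the degree of imbalance in the dataset. Different cost losses are then assigned to the minority and majority classes according to their sample-ratio weights, and a convolutional neural network model is built to improve classification accuracy on imbalanced datasets. The network is evaluated with F1 and G-mean as metrics and compared with several classic algorithms, including SMOTE (synthetic minority over-sampling technique) and ADASYN (adaptive synthetic sampling), on the adult dataset from the UCI (University of California, Irvine) repository. The proposed model ranks first on both metrics, indicating that the improved convolutional neural network effectively raises the classification accuracy of the minority class.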
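The abstract does not give the exact EMWRS weighting rule, so the following is only a minimal sketch of GMM-based, cluster-weighted oversampling under stated assumptions: the data are one-dimensional, the minority class is labelled 1, the mixture is fitted with plain EM, and a minority sample's sampling weight is taken (hypothetically) as (majority count + 1) / minority count within its cluster, so minority points lying in majority-dominated clusters are duplicated more often. The function names `fit_gmm_1d`, `assign_clusters` and `emwrs_like_oversample` are illustrative and are not the authors' code.

```python
import math
import random


def _log_component(x, w, m, v):
    """log(w * N(x | m, v)) for one 1-D Gaussian component."""
    return math.log(w) - (x - m) ** 2 / (2 * v) - 0.5 * math.log(2 * math.pi * v)


def fit_gmm_1d(xs, k, iters=50, seed=0):
    """Fit a 1-D Gaussian mixture with k components by EM.

    Returns (weights, means, variances)."""
    rng = random.Random(seed)
    n = len(xs)
    mu = sum(xs) / n
    var0 = max(sum((x - mu) ** 2 for x in xs) / n, 1e-6)
    means = rng.sample(xs, k)          # initialise means at k data points
    variances = [var0] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibilities, normalised in log space to avoid underflow
        resp = []
        for x in xs:
            logs = [_log_component(x, w, m, v)
                    for w, m, v in zip(weights, means, variances)]
            top = max(logs)
            p = [math.exp(lp - top) for lp in logs]
            s = sum(p)
            resp.append([pj / s for pj in p])
        # M-step: re-estimate mixing weights, means and variances
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            variances[j] = max(
                sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj, 1e-6)
    return weights, means, variances


def assign_clusters(xs, params):
    """Hard-assign each point to its most likely mixture component."""
    weights, means, variances = params
    return [max(range(len(means)),
                key=lambda j: _log_component(x, weights[j], means[j], variances[j]))
            for x in xs]


def emwrs_like_oversample(xs, ys, k=2, seed=0):
    """Hypothetical cluster-weighted oversampling of the minority class (label 1)
    until both classes have the same size."""
    clusters = assign_clusters(xs, fit_gmm_1d(xs, k, seed=seed))
    minority = [i for i, y in enumerate(ys) if y == 1]
    n_extra = (len(ys) - len(minority)) - len(minority)
    if n_extra <= 0 or not minority:
        return list(xs), list(ys)
    # Count majority and minority samples in every cluster.
    maj, mino = [0] * k, [0] * k
    for c, y in zip(clusters, ys):
        if y == 1:
            mino[c] += 1
        else:
            maj[c] += 1
    # Assumed weight rule: minority points that are rarer in their cluster weigh more.
    w = [(maj[clusters[i]] + 1) / mino[clusters[i]] for i in minority]
    picks = random.Random(seed).choices(minority, weights=w, k=n_extra)
    return list(xs) + [xs[i] for i in picks], list(ys) + [1] * n_extra


xs = [0.1, 0.2, 0.3, 0.25, 0.15, 5.0, 5.1, 4.9, 5.2, 3.0]
ys = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
new_xs, new_ys = emwrs_like_oversample(xs, ys)
print(sum(new_ys), len(new_ys) - sum(new_ys))  # 8 8
```

Duplicating existing minority points keeps the sketch short; SMOTE-style interpolation inside each cluster would be a natural variant. On real multi-dimensional data such as the adult dataset, a library mixture model (for example scikit-learn's `GaussianMixture`) would normally replace the hand-written EM.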


Abstract: Imbalanced data classification is a challenging task in big data mining. The distribution of imbalanced data seriously degrades the classification performance of models, especially for minority classes. In this paper, an expectation-maximization weighted resampling (EMWRS) algorithm and a weighted cross-entropy loss function (WCELoss) are proposed to improve classification performance on imbalanced data. The proposed approach uses a Gaussian mixture model to preprocess the data and combines weighted sampling with cost-sensitive learning to construct a convolutional neural network model. The network is evaluated with F1 and G-mean as metrics and compared with several classic algorithms, such as SMOTE (synthetic minority over-sampling technique) and ADASYN (adaptive synthetic sampling), on the adult dataset from the UCI (University of California, Irvine) repository. The experimental results show that the proposed model outperforms ADASYN and the other classic algorithms in both F1 and G-mean on this dataset, indicating that it effectively improves classification accuracy for the minority class.
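As a hedged illustration of a cost-sensitive loss and the two evaluation metrics, the sketch below assumes binary labels with the minority class labelled 1 and inverse-frequency class weights w_c = N / (K · N_c). The abstract only says that the weights follow the sample ratio, so this particular formula and the helper names `class_weights`, `weighted_cross_entropy` and `f1_and_gmean` are assumptions. The loss is normalised by the sum of the per-sample weights. F1 is computed for the minority class, and G-mean is the square root of sensitivity times specificity.

```python
import math


def class_weights(labels, num_classes=2):
    """Hypothetical inverse-frequency weights: w_c = N / (K * N_c)."""
    n = len(labels)
    counts = [labels.count(c) for c in range(num_classes)]
    return [n / (num_classes * cnt) if cnt else 0.0 for cnt in counts]


def weighted_cross_entropy(probs, labels, weights):
    """Sum of w[y_i] * -log p_i[y_i], divided by the sum of w[y_i]."""
    eps = 1e-12  # guards against log(0)
    num = sum(weights[y] * -math.log(max(p[y], eps)) for p, y in zip(probs, labels))
    den = sum(weights[y] for y in labels)
    return num / den


def f1_and_gmean(y_true, y_pred, positive=1):
    """F1 for the positive (minority) class and G-mean = sqrt(sensitivity * specificity)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0        # sensitivity
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return f1, math.sqrt(recall * specificity)


labels = [0, 0, 0, 0, 0, 0, 1, 1]
w = class_weights(labels)                 # majority 2/3, minority 2.0
preds = [0, 0, 0, 0, 0, 1, 1, 0]
print(f1_and_gmean(labels, preds))        # F1 = 0.5, G-mean approx. 0.645
```

With these weights, a confident mistake on a minority sample raises the loss more than the same mistake on a majority sample. This is the intended effect of assigning different cost losses to the two classes.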

Key words: imbalanced data, Gaussian mixture model, sample weighting, cost loss, convolutional neural network
