Customer Churn Prediction Method Based on Stacking Ensemble Learning

Journal of Applied Sciences, 2020, Vol. 38, Issue 6: 944-954. doi: 10.3969/j.issn.0255-8297.2020.06.011

• Signal and Information Processing •

ZHENG Hong¹, YE Cheng¹, JIN Yonghong¹,², CHENG Yunhui¹

1. School of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237, China;
2. School of Finance and Business, Shanghai Normal University, Shanghai 200234, China

Received: 2019-06-21 Published: 2020-12-08
Corresponding author: ZHENG Hong, Ph.D., associate professor; research interests include formal methods and machine learning. E-mail: zhenghong@ecust.edu.cn
Funding: Supported by the National Natural Science Foundation of China (No. 61103115, No. 61103172) and the Shanghai Science and Technology Commission Science and Technology Innovation Action Plan, High-Tech Field Project (No. 16511101000)

Abstract: Machine learning is used to predict customer churn, a problem common to commercial activities. Borrowing the bootstrap sampling idea from Bagging, we propose a Stacking ensemble method based on bootstrap sampling. The data set is first sampled multiple times with attribute perturbation, and the resulting data subsets are used to train copies of each base classifier; the decision of each base classifier is then determined by a vote among its copies. Churn prediction experiments on a real data set show that the proposed method outperforms all of its base classifiers and the classical Stacking ensemble of the same structure on accuracy, precision, and F1-score.

Key words: Stacking ensemble learning, customer churn prediction, bootstrap sampling, machine learning
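The procedure summarized in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the class and function names (`VotedBase`, `TableMeta`, `fit_stacking`, `predict_stacking`), the two toy base learners, the lookup-table meta-learner, the number of copies, and the synthetic data are all assumptions made for illustration, and a simple half/half hold-out split stands in for whatever scheme the paper uses to build the meta-learner's training set. What it does show is the core idea: each base classifier is trained as several copies on bootstrap samples with a random feature subset, and the copies vote to produce that base classifier's output, which is then passed to the meta-learner.

```python
import random
from collections import Counter, defaultdict

def sq_dist(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))

class NearestCentroid:
    """Toy base learner: predict the class with the closest feature mean."""
    def fit(self, X, y):
        groups = defaultdict(list)
        for row, label in zip(X, y):
            groups[label].append(row)
        self.centroids = {c: [sum(col) / len(rows) for col in zip(*rows)]
                          for c, rows in groups.items()}
        return self
    def predict(self, X):
        return [min(self.centroids, key=lambda c: sq_dist(row, self.centroids[c]))
                for row in X]

class OneNN:
    """Toy base learner: copy the label of the nearest training row."""
    def fit(self, X, y):
        self.X, self.y = X, y
        return self
    def predict(self, X):
        return [self.y[min(range(len(self.X)), key=lambda i: sq_dist(row, self.X[i]))]
                for row in X]

class VotedBase:
    """One base classifier realised as several copies that vote (hypothetical name)."""
    def __init__(self, make_model, n_copies=5, n_features=2, seed=0):
        self.make_model, self.n_copies = make_model, n_copies
        self.n_features, self.seed = n_features, seed
    def fit(self, X, y):
        rng = random.Random(self.seed)
        self.copies = []
        for _ in range(self.n_copies):
            rows = [rng.randrange(len(X)) for _ in X]             # bootstrap: rows with replacement
            cols = rng.sample(range(len(X[0])), self.n_features)  # attribute perturbation
            Xs = [[X[i][j] for j in cols] for i in rows]
            self.copies.append((self.make_model().fit(Xs, [y[i] for i in rows]), cols))
        return self
    def predict(self, X):
        per_copy = [m.predict([[row[j] for j in cols] for row in X])
                    for m, cols in self.copies]
        # Majority vote of the copies gives this base classifier's decision.
        return [Counter(votes).most_common(1)[0][0] for votes in zip(*per_copy)]

class TableMeta:
    """Toy meta-learner: most frequent true label per pattern of base outputs."""
    def fit(self, Z, y):
        table = defaultdict(Counter)
        for z, label in zip(Z, y):
            table[tuple(z)][label] += 1
        self.table = {k: c.most_common(1)[0][0] for k, c in table.items()}
        self.default = Counter(y).most_common(1)[0][0]
        return self
    def predict(self, Z):
        return [self.table.get(tuple(z), self.default) for z in Z]

def fit_stacking(X, y, bases, meta, seed=0):
    """Fit bases on one half, then fit the meta-learner on their outputs for the other half."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    a, b = idx[:len(idx) // 2], idx[len(idx) // 2:]
    for base in bases:
        base.fit([X[i] for i in a], [y[i] for i in a])
    Z = list(zip(*(base.predict([X[i] for i in b]) for base in bases)))
    meta.fit(Z, [y[i] for i in b])

def predict_stacking(X, bases, meta):
    return meta.predict(list(zip(*(base.predict(X) for base in bases))))

# Synthetic two-class data (not the paper's data set): 4 features, class means 0 and 3.
rng = random.Random(1)
y = [i % 2 for i in range(200)]
X = [[rng.gauss(3.0 * label, 1.0) for _ in range(4)] for label in y]
bases = [VotedBase(NearestCentroid, seed=1), VotedBase(OneNN, seed=2)]
meta = TableMeta()
fit_stacking(X, y, bases, meta)
preds = predict_stacking(X, bases, meta)
accuracy = sum(p == t for p, t in zip(preds, y)) / len(y)
print(f"training-set accuracy of the sketch: {accuracy:.2f}")
```

Using an odd number of copies avoids tied votes in the binary case. In practice the toy learners would be replaced by real classifiers, and the hold-out split by cross-validated predictions.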

CLC number: TP391.4

ZHENG Hong, YE Cheng, JIN Yonghong, CHENG Yunhui. Customer Churn Prediction Method Based on Stacking Ensemble Learning[J]. Journal of Applied Sciences, 2020, 38(6): 944-954.
