Special Issue on CCF NCCA 2020

Accurately Identify Zombie Enterprises Based on Decision Tree-Logistic Regression Model

1. School of Computer and Information, Hohai University, Nanjing 211100, Jiangsu, China
2. School of Business, Hohai University, Nanjing 211100, Jiangsu, China

Received date: 2020-08-26

Published online: 2021-08-04

Abstract

To address the problem of accurately identifying zombie enterprises, this paper proposes an identification method based on a decision tree-logistic regression model, using the enterprise information data set published by Hunan Kechuang Information Co., Ltd. The method fills missing values and outliers with the median, derives additional features from an analysis of the data set, and then screens features using multiple linear regression and the chi-square test. To verify its effectiveness, the method is compared with the over-borrowing method, the continuous loss method, the random forest algorithm, the BP neural network algorithm, and the XGBoost algorithm, in both the Alibaba Cloud environment and a local environment. Each model is trained 50 times, each run uses data randomly sampled according to a given proportion, and the average of each metric over the runs is reported as the final result. The experimental results show that the proposed decision tree-logistic regression model achieves the highest accuracy in identifying zombie enterprises, 99.98%, and runs faster than the other ensemble models, with an average execution time of about 1.5 s. Its results differ only slightly across all scenarios, which supports the effectiveness and stability of the model.
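Two steps named in the abstract, median filling and averaging over repeated random splits, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the 1.5 × IQR fence is an assumed outlier rule (the abstract does not say how outliers are detected), the 0.3 test fraction stands in for the unspecified sampling proportion, and `fit_predict` is a placeholder for any of the compared models.

```python
import random
import statistics
from typing import Callable, List, Optional, Sequence


def fill_with_median(values: Sequence[Optional[float]], k: float = 1.5) -> List[float]:
    """Replace missing entries (None) and outliers with the median.

    Assumption: a value is an outlier if it lies outside
    [Q1 - k*IQR, Q3 + k*IQR]; the source does not specify the rule.
    """
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    q1, _, q3 = statistics.quantiles(observed, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [median if v is None or not (low <= v <= high) else v for v in values]


def repeated_holdout(
    rows: Sequence[Sequence[float]],
    labels: Sequence[int],
    fit_predict: Callable[[list, list, list], List[int]],
    runs: int = 50,              # the abstract trains each model 50 times
    test_fraction: float = 0.3,  # assumed; the proportion is not given
    seed: int = 0,
) -> float:
    """Return the mean accuracy over `runs` random train/test splits."""
    rng = random.Random(seed)
    indices = list(range(len(rows)))
    n_test = max(1, int(len(rows) * test_fraction))
    scores = []
    for _ in range(runs):
        rng.shuffle(indices)  # new random split for every run
        test_idx, train_idx = indices[:n_test], indices[n_test:]
        predictions = fit_predict(
            [rows[i] for i in train_idx],
            [labels[i] for i in train_idx],
            [rows[i] for i in test_idx],
        )
        hits = sum(p == labels[i] for p, i in zip(predictions, test_idx))
        scores.append(hits / n_test)
    return statistics.mean(scores)
```

Any model can be plugged in by wrapping its training and prediction in a single `fit_predict(train_rows, train_labels, test_rows)` callable. Averaging over many random splits reduces the influence of any one favorable or unfavorable split on the reported metric.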

Cite this article

WU Dongpeng, WANG Zheng, TONG Wei, YE Feng, SONG Chuqiao. Accurately Identify Zombie Enterprises Based on Decision Tree-Logistic Regression Model[J]. Journal of Applied Sciences, 2021, 39(4): 569-580. DOI: 10.3969/j.issn.0255-8297.2021.04.005
