To address the limitations of existing smart contract vulnerability detection techniques, including low detection efficiency, inadequate automation, and the inability to scale to large sets of smart contract samples, this study proposes a machine-learning-based method for smart contract vulnerability detection. The method first preprocessed the smart contract dataset, converted the smart contract source code into opcode sequences, and simplified those sequences using a set of opcode abstraction rules formulated for this purpose. On this basis, 2025-dimensional bigram features were extracted from the simplified opcode sequences using the N-gram model, and three feature representations were constructed using an embedded feature-selection method and principal component analysis (PCA) for dimensionality reduction. Then, Borderline-SMOTE, an improved variant of SMOTE, was used to balance the positive and negative samples in the imbalanced dataset. Finally, four algorithms (decision tree, support vector machine, random forest, and XGBoost) were each used to build a vulnerability detection model. The experimental results show that the random forest model achieves an average accuracy of 93.60% and a Macro-F1 of 93.91%, and can efficiently detect multiple types of vulnerabilities.
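The bigram feature extraction step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `bigram_features`, the three-symbol vocabulary, and the choice of relative frequencies (rather than raw counts) are all assumptions made for illustration. Note that 2025 = 45 × 45, which would be consistent with a vocabulary of 45 simplified opcodes, although the abstract does not state the vocabulary size.

```python
from collections import Counter
from itertools import product


def bigram_features(opcodes, vocabulary):
    """Return a fixed-length bigram frequency vector for one opcode sequence.

    Hypothetical helper: the vector has len(vocabulary) ** 2 entries,
    one per ordered pair of vocabulary symbols.
    """
    # Assign one feature slot to every ordered pair of vocabulary symbols.
    index = {pair: i for i, pair in enumerate(product(vocabulary, repeat=2))}
    # Count adjacent opcode pairs in the sequence.
    counts = Counter(zip(opcodes, opcodes[1:]))
    total = sum(counts.values())
    vector = [0.0] * len(index)
    for pair, n in counts.items():
        if pair in index:  # pairs containing out-of-vocabulary symbols are skipped
            vector[index[pair]] = n / total  # relative frequency (an assumption)
    return vector


# Hypothetical simplified opcode vocabulary and sequence, for illustration only.
VOCAB = ["PUSH", "ADD", "SSTORE"]
sequence = ["PUSH", "PUSH", "ADD", "PUSH", "SSTORE"]
features = bigram_features(sequence, VOCAB)
print(len(features))  # one slot per ordered opcode pair
```

With a 45-symbol vocabulary the same function yields a 2025-dimensional vector per contract, which could then be passed to feature selection, PCA, oversampling, and a classifier.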
LIU Lili, SHI Yijie, QIN Sujuan. Smart Contract Vulnerability Detection Technology Based on Machine Learning[J]. Journal of Applied Sciences, 2025, 43(4): 541-558.
DOI: 10.3969/j.issn.0255-8297.2025.04.001