Journal of Applied Sciences (应用科学学报) ›› 2025, Vol. 43 ›› Issue (4): 541-558. doi: 10.3969/j.issn.0255-8297.2025.04.001

• Blockchain •

Smart Contract Vulnerability Detection Technology Based on Machine Learning

LIU Lili (刘丽丽), SHI Yijie (时忆杰), QIN Sujuan (秦素娟)

  1. School of Cyberspace Security, Beijing University of Posts and Telecommunications, Beijing 100876, China
  • Received: 2025-01-02 Published: 2025-07-31
  • Corresponding author: QIN Sujuan, professor and doctoral supervisor; research interests include network security, blockchain security, and cryptographic theory. E-mail: qsujuan@bupt.edu.cn
  • Funding:
    National Key Research and Development Program of China (No. 2021YFB2700400)

Abstract: Existing smart contract vulnerability detection techniques suffer from low detection efficiency and limited automation, and they cannot scale to large sets of smart contract samples. To address these limitations, this study proposes a machine-learning-based method for smart contract vulnerability detection. The method first preprocesses the smart contract dataset: the Solidity source code of each contract is converted into an opcode sequence, which is then reduced using a set of opcode abstraction and simplification rules. An N-gram model is then used to extract 2025-dimensional bigram features from the simplified opcode sequences. An embedded method is applied for feature selection and principal component analysis (PCA) for dimensionality reduction, and three feature representations are constructed. Next, Borderline SMOTE, an improved variant of SMOTE, is used to balance the positive and negative samples in the imbalanced dataset. Finally, four algorithms (decision tree, support vector machine, random forest, and XGBoost) are used to build vulnerability detection models. Experimental results show that the random forest model achieves an average accuracy of 93.60% and a Macro-F1 of 93.91%, and that it can efficiently detect multiple types of vulnerabilities.

Keywords: blockchain, machine learning, smart contract, vulnerability detection
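
The bigram feature extraction step described in the abstract can be illustrated with a minimal sketch. The opcode category names below (`PUSH`, `ARITH`, `STORAGE`, `CALL`) are hypothetical placeholders, because the abstract does not list the paper's abstraction rules. Using relative frequency as the feature value is also an assumption. With a vocabulary of V abstracted symbols, ordered bigrams give V² feature dimensions; 2025 = 45² would be consistent with a 45-symbol vocabulary, although this is an inference and not something the abstract states.

```python
from collections import Counter
from itertools import product

# Hypothetical abstracted opcode categories (an assumption for illustration;
# the paper's actual abstraction rules are not given in the abstract).
VOCAB = ["PUSH", "ARITH", "STORAGE", "CALL"]

# One feature dimension per ordered pair of symbols: len(VOCAB) ** 2 in total.
BIGRAMS = list(product(VOCAB, repeat=2))


def bigram_features(opcodes):
    """Return relative bigram frequencies in the fixed order of BIGRAMS."""
    # Pair each opcode with its successor to form the bigrams of the sequence.
    counts = Counter(zip(opcodes, opcodes[1:]))
    total = sum(counts.values())
    if total == 0:
        # Sequences shorter than two opcodes contain no bigrams.
        return [0.0] * len(BIGRAMS)
    return [counts[pair] / total for pair in BIGRAMS]


sequence = ["PUSH", "PUSH", "ARITH", "STORAGE", "PUSH", "CALL"]
features = bigram_features(sequence)
print(len(features))
```

Fixing the bigram order in advance means every contract maps to a vector of the same length, which is what lets the later feature selection, PCA, oversampling, and classifier stages treat the dataset as a plain matrix.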

CLC number: