Construction of Malware Knowledge Graph for Threat Intelligence Analysis

doi:10.3969/j.issn.0255-8297.2026.01.005

Abstract

Abstract: Threat intelligence analysis is an important means of strengthening proactive defense, and research on building malware knowledge graphs matters for improving malware detection. In current malware knowledge graph construction, the accuracy and completeness of entity and relation extraction still need improvement. This paper proposes a method for constructing malware knowledge graphs based on a joint extraction model. First, a malware ontology model oriented to threat intelligence analysis is proposed, and 12 relation types are defined so that key malware knowledge is expressed in a standardized way. Then, a joint extraction model based on RoBERTa with whole word masking (RoBERTa-Wwm) and pointer annotation is proposed to extract malware entities and their relations, from which the graph is constructed. Experiments show that the joint extraction model achieves an F1 score of up to 0.841. The work supports the automatic analysis of malware threat intelligence and lays a foundation for improving proactive defense capabilities.

Key words: malware, knowledge graph, threat intelligence, entity extraction, relation extraction
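As a rough illustration of the two technical ideas named in the abstract, the sketch below shows how pointer annotation can be decoded into entity spans and how an F1 score over extracted triples is typically computed. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `decode_spans` and `triple_f1`, the 0.5 threshold, the nearest-end pairing rule, the toy probabilities, and the relation labels `drops` and `uses` are all hypothetical. In the paper's setting the probabilities would come from classifier heads on top of a RoBERTa-Wwm encoder, which is omitted here.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int]            # inclusive (start, end) token indices
Triple = Tuple[str, str, str]     # (head entity, relation, tail entity)


def decode_spans(start_probs: List[float], end_probs: List[float],
                 threshold: float = 0.5) -> List[Span]:
    """Decode pointer annotations: pair each predicted start position
    with the nearest predicted end position at or after it.
    The threshold and pairing rule are assumptions for illustration."""
    starts = [i for i, p in enumerate(start_probs) if p >= threshold]
    ends = [i for i, p in enumerate(end_probs) if p >= threshold]
    spans = []
    for s in starts:
        following = [e for e in ends if e >= s]
        if following:
            spans.append((s, following[0]))
    return spans


def triple_f1(predicted: Set[Triple], gold: Set[Triple]) -> float:
    """F1 over exact-match triples (harmonic mean of precision and recall)."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy sentence with made-up entity names and made-up pointer probabilities.
tokens = ["FooBot", "drops", "Bar", "Loader"]
start_probs = [0.9, 0.1, 0.8, 0.2]
end_probs = [0.9, 0.1, 0.1, 0.7]

spans = decode_spans(start_probs, end_probs)       # [(0, 0), (2, 3)]
entities = [" ".join(tokens[s:e + 1]) for s, e in spans]

# Hypothetical predicted and gold triples; one of two predictions is correct.
predicted = {("FooBot", "drops", "Bar Loader"), ("FooBot", "uses", "Bar Loader")}
gold = {("FooBot", "drops", "Bar Loader"), ("Bar Loader", "uses", "FooBot")}
score = triple_f1(predicted, gold)                 # 0.5
```

Marking start and end positions separately, rather than assigning one label per token, is what lets a pointer scheme represent multi-token and overlapping entities; a joint model would typically decode subject spans first and then relation-specific object spans in the same way.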

CLC number: TP391

XIANG Ga, HU Yan, ZHANG Yangsen, SUN Lu, QI Rui, TAN Zicheng. Construction of Malware Knowledge Graph for Threat Intelligence Analysis[J]. Journal of Applied Sciences, 2026, 44(1): 67-82. (in Chinese)
