Journal of Applied Sciences ›› 2026, Vol. 44 ›› Issue (1): 67-82. doi: 10.3969/j.issn.0255-8297.2026.01.005

• Special Issue on Computer Applications •

Construction of Malware Knowledge Graph for Threat Intelligence Analysis

XIANG Ga1,2, HU Yan1, ZHANG Yangsen1,2, SUN Lu1, QI Rui1, TAN Zicheng1

  1. College of Computer Science, Beijing Information Science & Technology University, Beijing 102206, China;
  2. Institute of Intelligent Information Processing, Beijing Information Science & Technology University, Beijing 102206, China
  • Received: 2025-08-07 Published: 2026-02-03
  • Corresponding author: XIANG Ga, associate professor; research interests include information security, natural language processing, and artificial intelligence. E-mail: xiangga@bistu.edu.cn
  • Funding:
    Beijing Natural Science Foundation-Xiaomi Innovation Joint Fund (No. L233008); Scientific Research Program of the Beijing Municipal Education Commission (No. KM202311232014)

Abstract: Threat intelligence analysis is an important means of strengthening proactive defense, and constructing malware knowledge graphs is important for improving malware detection. In malware knowledge graph construction, the accuracy and completeness of entity and relation extraction still need improvement. This paper proposes a method for constructing malware knowledge graphs based on a joint extraction model. First, a malware ontology model for threat intelligence analysis is proposed, defining 12 relation types to standardize the expression of key malware knowledge. Then, a joint extraction model based on RoBERTa with whole word masking (RoBERTa-Wwm) and pointer annotation is proposed to extract malware entities and relations, from which the knowledge graph is constructed. Experiments show that the joint extraction model achieves an F1 value of up to 0.841. This work supports the automatic analysis of malware threat intelligence and lays a foundation for improving proactive defense capabilities.

Key words: malware, knowledge graph, threat intelligence, entity extraction, relation extraction

CLC Number:
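
The abstract names pointer annotation as the tagging scheme for joint entity and relation extraction but does not spell it out. The sketch below shows one common form of the idea, not necessarily the authors' exact scheme: the model emits binary start and end pointers over the tokens, subject spans are decoded first, and object spans are then decoded per (subject, relation) pair. The function names, the toy sentence, and the relation labels `drops` and `uses` are hypothetical; the paper's 12 relation types are not listed in this section.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]  # inclusive (start, end) token indices


def decode_spans(starts: List[int], ends: List[int]) -> List[Span]:
    """Pair each start pointer with the nearest end pointer at or after it."""
    spans = []
    for i, flag in enumerate(starts):
        if flag != 1:
            continue
        for j in range(i, len(ends)):
            if ends[j] == 1:
                spans.append((i, j))
                break
    return spans


def decode_triples(
    tokens: List[str],
    subj_starts: List[int],
    subj_ends: List[int],
    obj_pointers: Dict[Tuple[Span, str], Tuple[List[int], List[int]]],
) -> List[Tuple[str, str, str]]:
    """Turn pointer tags into (subject, relation, object) triples.

    obj_pointers maps (subject_span, relation) to the object start/end
    pointer sequences predicted for that subject and relation.
    """
    triples = []
    for subj in decode_spans(subj_starts, subj_ends):
        subj_text = " ".join(tokens[subj[0]:subj[1] + 1])
        for (span, relation), (o_starts, o_ends) in obj_pointers.items():
            if span != subj:
                continue
            for a, b in decode_spans(o_starts, o_ends):
                triples.append((subj_text, relation, " ".join(tokens[a:b + 1])))
    return triples


# Toy example with hand-written pointer tags (hypothetical, not model output).
tokens = ["Emotet", "drops", "TrickBot", "via", "macro", "documents"]
subj_starts = [1, 0, 0, 0, 0, 0]
subj_ends = [1, 0, 0, 0, 0, 0]
obj_pointers = {
    ((0, 0), "drops"): ([0, 0, 1, 0, 0, 0], [0, 0, 1, 0, 0, 0]),
    ((0, 0), "uses"): ([0, 0, 0, 0, 1, 0], [0, 0, 0, 0, 0, 1]),
}
triples = decode_triples(tokens, subj_starts, subj_ends, obj_pointers)
print(triples)
```

In a real system the pointer sequences would come from thresholded classifier outputs on top of an encoder such as RoBERTa-Wwm, and the decoded triples would become nodes and edges of the knowledge graph.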