Journal of Applied Sciences (应用科学学报), 2025, Vol. 43, Issue (2): 334-347. doi: 10.3969/j.issn.0255-8297.2025.02.011

• Computer Science and Applications •

Joint Extraction of Curriculum Entity Relationships Based on Parallel Decoding and Clustering

SUN Lijun, XU Xingjian, MENG Fanjun

  1. College of Computer Science and Technology, Inner Mongolia Normal University, Hohhot 010022, Inner Mongolia, China
  • Received: 2024-07-10 Published: 2025-04-03
  • Corresponding author: MENG Fanjun, professor; research interests include educational big data analytics and network storage systems. E-mail: ciecmfj@imnu.edu.cn
  • Funding:
    Supported by the Natural Science Foundation of Inner Mongolia Autonomous Region (No.2023LHMSS06011, No.2023MS06016) and the Inner Mongolia Normal University Undergraduate Innovation and Entrepreneurship Training Program (No.202310153007)

Abstract: Joint entity-relation extraction, a core step in knowledge graph construction, aims to extract entity-relation triples from unstructured text. Existing joint extraction methods do not effectively handle the interactions between entities and relations during decoding, which leads to insufficient understanding of context and to redundant information. To address these problems, we propose a joint entity-relation extraction model based on parallel decoding and clustering. First, the bidirectional encoder representations from transformers (BERT) model encodes the text to obtain character vectors rich in semantic information. Next, a non-autoregressive parallel decoder strengthens the interaction between entities and relations, and hierarchical agglomerative clustering combined with a majority voting mechanism further refines the decoding results to capture contextual information and reduce redundancy. Finally, a high-quality set of triples is generated and used to construct a curriculum knowledge graph. To evaluate the method, experiments are conducted on the public datasets NYT and WebNLG and on a self-constructed C language dataset. The results show that the method outperforms the compared models in precision and F1 score.

Key words: joint extraction, parallel decoding, hierarchical agglomerative clustering, majority voting mechanism, curriculum knowledge graph
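
The post-processing step named in the abstract (hierarchical agglomerative clustering followed by majority voting) can be illustrated with a minimal sketch. The abstract does not state the features, linkage criterion, or stopping threshold used by the authors, so everything below is an assumption made for illustration: each candidate triple is paired with a made-up 2-D feature vector, clusters are merged with average linkage over Euclidean distance, merging stops at a hypothetical `threshold`, and the function names `agglomerative_clusters` and `vote_triples` are invented here. The example triples are toy data, not drawn from the paper's C language dataset.

```python
import math
from collections import Counter
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def agglomerative_clusters(vectors, threshold):
    """Bottom-up average-linkage clustering (assumed linkage, not the paper's).

    Starts with one cluster per vector and repeatedly merges the two closest
    clusters until the closest pair is farther apart than `threshold`.
    Returns a list of clusters, each a list of indices into `vectors`.
    """
    clusters = [[i] for i in range(len(vectors))]

    def linkage(c1, c2):
        total = sum(euclidean(vectors[i], vectors[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        (a, b), dist = min(
            (((a, b), linkage(clusters[a], clusters[b]))
             for a, b in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if dist > threshold:
            break
        clusters[a] = clusters[a] + clusters[b]  # a < b, so index a stays valid
        del clusters[b]
    return clusters


def vote_triples(candidates, vectors, threshold):
    """Keep one triple per cluster: the most frequent candidate in it."""
    kept = []
    for cluster in agglomerative_clusters(vectors, threshold):
        counts = Counter(candidates[i] for i in cluster)
        kept.append(counts.most_common(1)[0][0])
    return kept


# Toy candidates, as if several decoder outputs described the same facts.
candidates = [
    ("pointer", "stores", "address"),
    ("pointer", "stores", "address"),
    ("pointer", "contains", "address"),
    ("array", "consists_of", "element"),
    ("array", "consists_of", "element"),
]
# Hypothetical feature vectors; near-duplicates sit close together.
vectors = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10)]

result = vote_triples(candidates, vectors, threshold=3.0)
print(result)
```

In this toy run the three pointer candidates fall into one cluster and the two array candidates into another, so one triple survives per cluster. A production implementation would more likely rely on a library routine for the clustering step and on learned representations rather than hand-written vectors.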
