Special Issue on Computer Application

Entity Relationship Extraction Framework Based on Pre-trained Large Language Model and Its Application

1. School of Management, Zhengzhou University, Zhengzhou 450001, Henan, China;
2. User Intelligent Operation Department, JD Retail, Beijing 100176, China;
3. Business School, Zhengzhou University, Zhengzhou 450001, Henan, China

Received date: 2024-07-09

Online published: 2025-01-24

Abstract

Entity relationship extraction is a crucial foundation for building large-scale knowledge graphs and domain-specific datasets. This paper proposes an entity relationship extraction framework based on pre-trained large language models (PLLM-RE) and applies it to relation extraction from circular economy policy texts. Within this framework, entities in circular economy policy texts are first recognized using RoBERTa. Bidirectional Encoder Representations from Transformers (BERT) is then used to extract the relations between these entities, which facilitates the construction of a knowledge graph in the field of circular economy policy. Experimental results show that the framework outperforms the baseline models BiLSTM-ATT, PCNN, BERT, and ALBERT on the task of entity relationship extraction from circular economy policies. These findings validate the adaptability and superiority of the proposed framework and offer new ideas for future information mining and policy analysis in the field of circular economy resources.
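The two-stage pipeline described in the abstract (entity recognition first, relation classification over entity pairs second, then assembly into a graph) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the BIO tag scheme, the entity types ORG and POLICY, the [E1]/[E2] marker tokens, and the `classify` callable (a hypothetical stand-in for a fine-tuned BERT relation classifier) are all choices made only for this example.

```python
from itertools import permutations
from typing import Callable, Dict, List, Optional, Tuple

# (surface text, entity type, start token index, end token index exclusive)
Entity = Tuple[str, str, int, int]
Triple = Tuple[str, str, str]  # (head, relation, tail)


def decode_bio(tokens: List[str], tags: List[str], sep: str = " ") -> List[Entity]:
    """Stage 1 glue: turn per-token BIO tags (e.g. from a RoBERTa tagger) into spans.

    Use sep="" for character-level Chinese tokens, sep=" " for word tokens.
    """
    entities: List[Entity] = []
    start: Optional[int] = None
    etype: Optional[str] = None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing span
        ends_span = (tag == "O" or tag.startswith("B-")
                     or (tag.startswith("I-") and tag[2:] != etype))
        if ends_span:
            if start is not None:
                entities.append((sep.join(tokens[start:i]), etype, start, i))
                start, etype = None, None
            if tag.startswith("B-"):
                start, etype = i, tag[2:]
    return entities


def mark_pair(tokens: List[str], head: Entity, tail: Entity, sep: str = " ") -> str:
    """Wrap the head in hypothetical [E1]...[/E1] markers and the tail in [E2]...[/E2]."""
    out: List[str] = []
    for i, tok in enumerate(tokens):
        if i == head[2]:
            out.append("[E1]")
        if i == tail[2]:
            out.append("[E2]")
        out.append(tok)
        if i == head[3] - 1:
            out.append("[/E1]")
        if i == tail[3] - 1:
            out.append("[/E2]")
    return sep.join(out)


def extract_triples(tokens: List[str], tags: List[str],
                    classify: Callable[[str], Optional[str]],
                    sep: str = " ") -> List[Triple]:
    """Stage 2: classify every ordered entity pair; None means "no relation"."""
    entities = decode_bio(tokens, tags, sep)
    triples: List[Triple] = []
    for head, tail in permutations(entities, 2):
        relation = classify(mark_pair(tokens, head, tail, sep))
        if relation is not None:
            triples.append((head[0], relation, tail[0]))
    return triples


def build_graph(triples: List[Triple]) -> Dict[str, List[Tuple[str, str]]]:
    """Assemble triples into an adjacency list: head -> [(relation, tail), ...]."""
    graph: Dict[str, List[Tuple[str, str]]] = {}
    for head, relation, tail in triples:
        graph.setdefault(head, []).append((relation, tail))
    return graph


if __name__ == "__main__":
    # Hypothetical toy sentence and tags; a stub replaces the BERT classifier.
    tokens = ["Ministry", "issued", "Recycling", "Plan"]
    tags = ["B-ORG", "O", "B-POLICY", "I-POLICY"]
    stub = lambda s: "issues" if s.startswith("[E1] Ministry") else None
    print(build_graph(extract_triples(tokens, tags, stub)))
```

In a real system, `tags` would come from the entity-recognition model and `classify` from the relation classifier. Enumerating ordered pairs with `permutations` lets the classifier decide the direction of each relation, at the cost of a number of candidate pairs that grows quadratically with the entities in a sentence.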

Cite this article

WEI Wei, JIN Chenggong, YANG Long, ZHOU Mo, MENG Xiangzhu, FENG Hui. Entity Relationship Extraction Framework Based on Pre-trained Large Language Model and Its Application[J]. Journal of Applied Sciences, 2025, 43(1): 20-34. DOI: 10.3969/j.issn.0255-8297.2025.01.002
