Journal of Applied Sciences (应用科学学报), 2023, Vol. 41, Issue (1): 95-106. doi: 10.3969/j.issn.0255-8297.2023.01.008

• Special Issue on Computer Applications •

Chinese Event Trigger Extraction Based on Span Regression

ZHAO Yuhao1,2, CHEN Yanping1,2, HUANG Ruizhang1,2, QIN Yongbin1,2

  1. State Key Laboratory of Public Big Data, Guizhou University, Guiyang 550025, Guizhou, China;
  2. College of Computer Science and Technology, Guizhou University, Guiyang 550025, Guizhou, China
  • Received: 2022-06-20 Online: 2023-01-31 Published: 2023-02-03
  • Corresponding author: CHEN Yanping, professor; research interests: artificial intelligence and natural language processing. E-mail: ypench@gmail.com
  • Funding: Supported by the National Natural Science Foundation of China (No. 62166007)

Abstract: In Chinese event trigger extraction, word-based models suffer from errors introduced by word segmentation, while character-based models have difficulty capturing the structural and contextual semantic information of trigger words. To address this problem, a trigger extraction method based on span regression is proposed. Observing that a character subsequence of a given length (a span) in a sentence may constitute an event trigger, the method obtains the feature representation of the sentence from a pre-trained language model, bidirectional encoder representations from Transformers (BERT), and generates candidate trigger spans from that representation. A classifier then filters out low-confidence candidate spans, and the boundaries of the remaining candidates are adjusted by regression to locate triggers accurately. Finally, the adjusted candidate spans are classified to obtain the extraction results. Experimental results on the ACE2005 Chinese dataset show that the span regression method achieves an F1 score of 73.20% on trigger identification and 71.60% on trigger classification, outperforming existing models. A comparison with a span-based method without regression further verifies that regression adjustment of span boundaries improves the accuracy of event trigger detection.

Key words: event extraction, event trigger word, bidirectional encoder representations from Transformers (BERT), feature representation, span representation, regression adjustment
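
The four stages named in the abstract (candidate span generation, confidence filtering, boundary regression, and span classification) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the 0.5 threshold, the rounding and clamping of offsets, and the toy scoring functions are all assumptions made for illustration, and the BERT encoder is replaced by hand-written stand-ins so that the example runs on its own.

```python
from typing import Callable, List, Optional, Tuple

Span = Tuple[int, int]  # (start, end) character positions, end exclusive


def enumerate_spans(n_chars: int, max_len: int) -> List[Span]:
    """List every character span of length 1..max_len in a sentence."""
    return [
        (start, start + length)
        for start in range(n_chars)
        for length in range(1, max_len + 1)
        if start + length <= n_chars
    ]


def adjust_span(span: Span, offsets: Tuple[float, float], n_chars: int) -> Span:
    """Shift both boundaries by rounded offsets, then clamp to the sentence."""
    start = min(max(round(span[0] + offsets[0]), 0), n_chars - 1)
    end = min(max(round(span[1] + offsets[1]), start + 1), n_chars)
    return (start, end)


def extract_triggers(
    sentence: str,
    max_len: int,
    score_fn: Callable[[str, Span], float],
    offset_fn: Callable[[str, Span], Tuple[float, float]],
    label_fn: Callable[[str, Span], Optional[str]],
    threshold: float = 0.5,  # assumed value, not taken from the paper
) -> List[Tuple[Span, str]]:
    """Filter candidate spans, regress their boundaries, then classify them."""
    n_chars = len(sentence)
    results = {}
    for span in enumerate_spans(n_chars, max_len):
        if score_fn(sentence, span) < threshold:
            continue  # drop low-confidence candidates
        adjusted = adjust_span(span, offset_fn(sentence, span), n_chars)
        label = label_fn(sentence, adjusted)
        if label is not None:
            results[adjusted] = label  # identical adjusted spans collapse
    return sorted(results.items())


# Toy stand-ins for the learned components (hypothetical, for illustration).
SENTENCE = "警方逮捕了嫌疑人"
GOLD: Span = (2, 4)  # the characters "逮捕"


def toy_score(sentence: str, span: Span) -> float:
    # Confident only when the candidate overlaps the gold trigger.
    return 1.0 if span[0] < GOLD[1] and GOLD[0] < span[1] else 0.0


def toy_offsets(sentence: str, span: Span) -> Tuple[float, float]:
    # A perfect regressor: the offsets move the candidate onto the gold span.
    return (GOLD[0] - span[0], GOLD[1] - span[1])


def toy_label(sentence: str, span: Span) -> Optional[str]:
    return "Arrest-Jail" if sentence[span[0]:span[1]] == "逮捕" else None


triggers = extract_triggers(SENTENCE, 3, toy_score, toy_offsets, toy_label)
```

In this toy setting every candidate that overlaps the trigger is moved onto the same boundaries, so the overlapping candidates collapse into a single prediction. A trained regressor would only approximate such offsets, and some other form of de-duplication may be needed in practice.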

CLC number: