Journal of Applied Sciences ›› 2023, Vol. 41 ›› Issue (1): 95-106. doi: 10.3969/j.issn.0255-8297.2023.01.008

• Special Issue on Computer Applications •

Chinese Event Trigger Extraction Based on Span Regression

ZHAO Yuhao1,2, CHEN Yanping1,2, HUANG Ruizhang1,2, QING Yongbin1,2   

  1. State Key Laboratory of Public Big Data, Guizhou University, Guiyang 550025, Guizhou, China;
    2. College of Computer Science and Technology, Guizhou University, Guiyang 550025, Guizhou, China
  • Received: 2022-06-20 Online: 2023-01-31 Published: 2023-02-03

Abstract: In Chinese event trigger word extraction, word-based models suffer from errors introduced by word segmentation, while character-based models have difficulty capturing the structural and contextual semantic information of trigger words. To address this problem, a trigger word extraction method based on span regression is proposed. Since a character subsequence (span) of a specific length in a sentence may constitute an event trigger word, the method first obtains a feature representation of the sentence from the pre-trained bidirectional encoder representation from Transformer (BERT) model and generates candidate trigger word spans on that representation. A classifier then filters out the candidate spans with low confidence, and the boundaries of the remaining candidate spans are adjusted by regression to locate the trigger word accurately. Finally, the adjusted candidate spans are classified to obtain the extraction results. Experimental results on the ACE2005 Chinese dataset show that the span regression method achieves an F1 score of 73.20% on the trigger word recognition task and 71.60% on the trigger word classification task, outperforming existing models. In addition, an experimental comparison with a span-based method without regression verifies that regression adjustment of span boundaries improves the accuracy of event trigger word detection.
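
The four stages named in the abstract (candidate span generation, confidence filtering, boundary regression, classification) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the half-open `(start, end)` span convention, the `threshold` value, and the three callables `score_fn`, `offset_fn`, and `type_fn` are all hypothetical stand-ins for the BERT-based components, which the abstract does not specify in detail.

```python
from typing import Callable, List, Optional, Tuple

# A span is (start, end) in character offsets, end exclusive (assumed convention).
Span = Tuple[int, int]


def enumerate_spans(sentence: str, max_len: int) -> List[Span]:
    """Generate every character subsequence of length 1..max_len as a candidate."""
    n = len(sentence)
    return [(i, j) for i in range(n) for j in range(i + 1, min(i + max_len, n) + 1)]


def adjust_span(span: Span, offsets: Tuple[float, float], n: int) -> Span:
    """Shift both boundaries by regressed offsets, round, and clamp to the sentence."""
    start = max(0, min(n - 1, round(span[0] + offsets[0])))
    end = max(start + 1, min(n, round(span[1] + offsets[1])))
    return (start, end)


def extract_triggers(
    sentence: str,
    max_len: int,
    score_fn: Callable[[str, Span], float],                  # hypothetical confidence classifier
    offset_fn: Callable[[str, Span], Tuple[float, float]],   # hypothetical boundary regressor
    type_fn: Callable[[str, Span], Optional[str]],           # hypothetical event-type classifier
    threshold: float = 0.5,                                  # assumed cut-off, not from the paper
) -> List[Tuple[Span, str]]:
    """Filter candidates, adjust their boundaries, then classify the adjusted spans."""
    n = len(sentence)
    results = set()
    for span in enumerate_spans(sentence, max_len):
        if score_fn(sentence, span) < threshold:
            continue  # drop low-confidence candidates
        adjusted = adjust_span(span, offset_fn(sentence, span), n)
        label = type_fn(sentence, adjusted)
        if label is not None:
            results.add((adjusted, label))  # the set merges candidates that converge
    return sorted(results)
```

Because several overlapping candidates may be regressed onto the same boundaries, the sketch deduplicates the adjusted spans before returning them; how the paper itself resolves such duplicates is not stated in the abstract.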

Key words: event extraction, event trigger word, bidirectional encoder representation from Transformer (BERT), feature representation, span representation, regression adjustment
