Journal of Applied Sciences ›› 2023, Vol. 41 ›› Issue (1): 1-9. doi: 10.3969/j.issn.0255-8297.2023.01.001

• Special Issue on Computer Applications •

Named Entity Recognition Algorithm Enhanced with Entity Category Information

LIU Minghui¹, TANG Wangjing¹, XU Bin¹, TONG Meihan¹, WANG Liming², ZHONG Qi², XU Jianjun³

    1. Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China;
    2. China Research Institute for Science Popularization, Beijing 100081, China;
    3. Beijing Caizhi Technology Co., Ltd., Beijing 100081, China
  • Received: 2022-06-30 Online: 2023-01-31 Published: 2023-02-03

Abstract: Character-level models for Chinese named entity recognition (NER) may ignore the word information in sentences. To address this problem, a Chinese NER method enhanced with entity category information from a knowledge graph is proposed. First, the training set is segmented with a word segmentation tool, and all possible words are collected to construct a vocabulary. Second, the category information of the entities in the vocabulary is retrieved from a generic knowledge graph; a word set related to each character is constructed in a simple and effective way, and an entity category information set is generated from the categories of the entities in that word set. Finally, a word embedding method converts the category information set into embeddings, which are concatenated with the character embeddings to enrich the features of the embedding layer. The proposed method can either be used as a module to expand the feature diversity of the embedding layer or be combined with a variety of encoder-decoder models. Experiments on the Chinese NER dataset released by Microsoft Research Asia (MSRA) show the superiority of the proposed model. Compared with the bi-directional long short-term memory (Bi-LSTM) model and the Bi-LSTM model with a conditional random field (CRF), the proposed method increases the F1 score by 11.00% and 3.09%, respectively, verifying that the category information of entities in the knowledge graph is highly effective in enhancing Chinese NER.
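
The embedding-layer steps described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the toy vocabulary `VOCAB`, the dictionary `KG_CATEGORY` standing in for a knowledge-graph lookup, the category labels, the embedding sizes, the randomly initialised embedding tables, and the choice of mean pooling to turn a category set into a single vector.

```python
import random

# Hypothetical toy data (not from the paper): a vocabulary built from a
# segmented training set, and a dict standing in for a knowledge-graph lookup.
VOCAB = {"北京", "北京大学", "大学"}
KG_CATEGORY = {"北京": "LOC", "北京大学": "ORG"}
CATEGORIES = ["LOC", "ORG", "PER"]
DIM_CHAR, DIM_CAT = 4, 3  # illustrative embedding sizes

def words_covering(sentence, i, vocab, max_len=4):
    """Return every vocabulary word in `sentence` that contains position i."""
    found = set()
    for start in range(max(0, i - max_len + 1), i + 1):
        for end in range(i + 1, min(len(sentence), start + max_len) + 1):
            if sentence[start:end] in vocab:
                found.add(sentence[start:end])
    return found

def category_set(words, kg):
    """Map a word set to the set of entity categories found in the lookup."""
    return {kg[w] for w in words if w in kg}

rng = random.Random(0)

def make_table(keys, dim):
    """Randomly initialised embedding table (a trained table in practice)."""
    return {k: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for k in keys}

def embed_sentence(sentence, vocab, kg, char_table, cat_table):
    """Concatenate each character embedding with a pooled category embedding."""
    features = []
    for i, ch in enumerate(sentence):
        cats = category_set(words_covering(sentence, i, vocab), kg)
        if cats:
            vecs = [cat_table[c] for c in sorted(cats)]
            # Mean pooling over the category set is an assumption of this sketch.
            cat_vec = [sum(col) / len(vecs) for col in zip(*vecs)]
        else:
            cat_vec = [0.0] * DIM_CAT  # no matching entity category
        features.append(char_table[ch] + cat_vec)
    return features

sentence = "我在北京大学"
char_table = make_table(sorted(set(sentence)), DIM_CHAR)
cat_table = make_table(CATEGORIES, DIM_CAT)
features = embed_sentence(sentence, VOCAB, KG_CATEGORY, char_table, cat_table)
print(len(features), len(features[0]))
```

In this sketch each character receives a vector of size `DIM_CHAR + DIM_CAT`, which could then be passed to an encoder such as a Bi-LSTM; characters not covered by any known entity receive a zero category vector.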

Key words: named entity recognition (NER), knowledge graph, entity category information, knowledge enhancement
