Journal of Applied Sciences ›› 1999, Vol. 17 ›› Issue (2): 148-155.


Context-Sensitive Automatic Chinese Word Segmentation and Lexical Preprocessing

HUANG HEYAN, LI YUSHENG   

  1. Research Center of Computer & Language Information Engineering, Academia Sinica, Beijing 100083
  • Received: 1998-02-19 Revised: 1998-06-08 Online: 1999-06-30 Published: 1999-06-30

Abstract: This paper proposes a context-sensitive algorithm for automatic Chinese word segmentation and lexical preprocessing in a Chinese-English machine translation (MT) system. The algorithm combines improved maximum matching (MM) with rule-based, context-sensitive ambiguity resolution, drawing on the large amount of syntactic, semantic, and common-sense knowledge in the MT system's lexicon. Its accuracy reaches up to 99%. At the same time, the algorithm also handles lexical phenomena such as reduplicated words and function words, which reduces the number of entries in the lexicon and facilitates the parsing of Chinese sentences.

Key words: lexical preprocessing, automatic Chinese word segmentation, machine translation
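
For readers unfamiliar with the MM (maximum matching) step named in the abstract, the following is a minimal sketch of plain forward maximum matching. It is not the authors' improved variant, and it omits their rule-based ambiguity resolution entirely. The function name, the `max_word_len` parameter, and the toy lexicon are all assumptions made for illustration.

```python
def forward_maximum_match(text, lexicon, max_word_len=4):
    """Segment text greedily, taking the longest lexicon entry at each position."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a lexicon entry matches.
        for size in range(min(max_word_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted as a fallback.
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                i += size
                break
    return words


# Hypothetical toy lexicon, for illustration only.
TOY_LEXICON = {"研究", "研究生", "生命", "起源"}

print(forward_maximum_match("研究生命起源", TOY_LEXICON))
# → ['研究生', '命', '起源']
```

The greedy result here differs from the intended reading 研究 / 生命 / 起源 ("study the origin of life"). This kind of overlapping ambiguity is what a context-sensitive resolution stage, such as the rule-based one the abstract describes, would need to handle.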