Journal of Applied Sciences, 2013, Vol. 31, Issue 2: 197-203. doi: 10.3969/j.issn.0255-8297.2013.02.015

Computer Science and Applications

Text Categorization Based on Concept Knowledge

DING Ze-ya (1, 2), ZHANG Quan (1)

1. Institute of Acoustics, Chinese Academy of Sciences, Beijing 100190, China
2. Graduate University of Chinese Academy of Sciences, Beijing 100039, China
Received: 2011-08-26; Revised: 2012-01-08; Online: 2013-03-25; Published: 2012-01-08

Abstract: Aiming at semantic understanding of text, this paper proposes a text categorization method based on the concept knowledge of the Hierarchical Network of Concepts (HNC). The method has two parts: concept-based feature selection, and text categorization according to the category relatedness degree. Category key concepts are identified by computing the discrimination degree of each concept and are then used to further reduce the dimensionality of the feature space. Using category semantic information, which consists of the category key concepts and their relatedness weights, the paper proposes a method for computing the relatedness degree between a document and each category; this category relatedness degree serves as the measure for assigning a document to a category. Experiments show that the method effectively reduces the dimensionality of the feature space and improves categorization efficiency while preserving effectiveness. Compared with SVM, KNN, and Bayes classifiers, the method achieves the highest F1 values at higher feature reduction levels. In overall performance, it is nearly equivalent to SVM and better than KNN and Bayes.
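The abstract does not give the paper's formulas, so the following is only a minimal sketch of the two-stage pipeline it describes, under explicit assumptions. Documents are assumed to be already mapped to HNC-style concept labels (plain strings here). The discrimination degree is assumed to be 1 minus the normalized entropy of a concept's document-frequency distribution over categories, and a category's relatedness to a document is assumed to be the sum of the weights of that category's key concepts found in the document. These definitions, and the names `build_category_profiles`, `relatedness`, and `classify`, are hypothetical stand-ins, not the authors' method.

```python
import math
from collections import defaultdict


def build_category_profiles(train_docs, threshold=0.5):
    """train_docs: list of (category, concepts) pairs.
    Returns {category: {key_concept: relatedness_weight}}.
    Assumption: discrimination = 1 - normalized entropy over categories."""
    categories = sorted({cat for cat, _ in train_docs})
    k = len(categories)
    # Document frequency of each concept within each category.
    df = defaultdict(lambda: defaultdict(int))
    for cat, concepts in train_docs:
        for c in set(concepts):
            df[c][cat] += 1

    profiles = {cat: {} for cat in categories}
    for c, per_cat in df.items():
        total = sum(per_cat.values())
        probs = [per_cat.get(cat, 0) / total for cat in categories]
        entropy = -sum(p * math.log(p) for p in probs if p > 0)
        disc = 1.0 - entropy / math.log(k) if k > 1 else 1.0
        if disc < threshold:
            continue  # weakly discriminating concept: dropped from the feature space
        # The concept becomes a key concept of the category it occurs in most.
        best = max(categories, key=lambda cat: per_cat.get(cat, 0))
        profiles[best][c] = disc * per_cat[best] / total  # hypothetical weight
    return profiles


def relatedness(doc_concepts, profile):
    """Hypothetical category relatedness degree: summed weights of present key concepts."""
    present = set(doc_concepts)
    return sum(w for c, w in profile.items() if c in present)


def classify(doc_concepts, profiles):
    """Assign the category with the highest relatedness degree."""
    scores = {cat: relatedness(doc_concepts, prof) for cat, prof in profiles.items()}
    return max(scores, key=scores.get), scores


# Toy usage with made-up concept labels.
train = [
    ("sports", ["match", "team", "score"]),
    ("sports", ["team", "coach", "event"]),
    ("finance", ["stock", "market", "event"]),
    ("finance", ["market", "bank", "score"]),
]
profiles = build_category_profiles(train, threshold=0.5)
label, scores = classify(["team", "market", "coach"], profiles)
print(label)  # → sports
```

In this toy run, "score" and "event" occur equally in both categories, so their assumed discrimination degree is 0 and they are pruned; this mirrors, in simplified form, how the abstract says key concepts further shrink the feature space before categorization.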

Key words: concept, concept discrimination, category relatedness, text categorization, hierarchical network of concepts
