Journal of Applied Sciences ›› 2005, Vol. 23 ›› Issue (1): 71-74.

Clustering XML Documents Based on a Structural Summary Tree

LIANG Zuo-peng, WU Wen-ming, DONG Yi-sheng   

  1. Department of Computer Science & Engineering, Southeast University, Nanjing 210096, China
  • Received: 2003-11-01 Revised: 2004-03-15 Online: 2005-01-31 Published: 2005-01-31

Abstract: This paper proposes an approach for calculating the structural similarity between XML documents. The structural information of an XML document is captured with a structural summary tree (SST). By encoding elements as numbers, an SST is transformed into a digit-labeled tree. After a normalization process, the numbers at different tree levels are concatenated to form a vector. Consequently, each XML document is represented as an m-dimensional vector. A clustering algorithm based on a genetic algorithm (GA) is adopted because it can provide good results irrespective of the starting configuration. Experimental results show the effectiveness and scalability of the approach.
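The vectorization pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify how the SST is built, how elements are encoded, or what the normalization step does. The sketch assumes that same-tag siblings are merged into one summary node, that each tag receives the next unused integer from a shared dictionary, and that normalization simply pads every level to a fixed width with zeros. The names `summary_tree`, `levels`, `to_vector`, `depth`, and `width` are hypothetical, and the GA-based clustering step is not shown.

```python
import xml.etree.ElementTree as ET


def summary_tree(elem):
    """Return (tag, children) with same-tag siblings merged (assumed SST rule)."""
    groups = {}
    for child in elem:
        groups.setdefault(child.tag, []).append(child)
    children = []
    for tag, group in groups.items():
        # Pool the children of all same-tag siblings under one summary node.
        combined = ET.Element(tag)
        for member in group:
            combined.extend(list(member))
        children.append(summary_tree(combined))
    return (elem.tag, children)


def levels(tree):
    """List the tags found at each depth of the summary tree, root first."""
    out = []
    frontier = [tree]
    while frontier:
        out.append([tag for tag, _ in frontier])
        frontier = [kid for _, kids in frontier for kid in kids]
    return out


def to_vector(xml_text, codes, depth=3, width=4):
    """Map a document to a vector of length m = depth * width.

    `codes` is a shared tag -> integer dictionary (hypothetical encoding).
    Padding each level with zeros stands in for the normalization step,
    whose actual definition is not given in the abstract.
    """
    by_level = levels(summary_tree(ET.fromstring(xml_text)))
    vector = []
    for d in range(depth):
        tags = sorted(set(by_level[d])) if d < len(by_level) else []
        nums = [codes.setdefault(t, len(codes) + 1) for t in tags][:width]
        vector.extend(nums + [0] * (width - len(nums)))
    return vector


codes = {}
doc_a = "<book><title/><author/><author/><chapter><sec/><sec/></chapter></book>"
doc_b = "<book><author/><title/><chapter><sec/></chapter></book>"
doc_c = "<paper><title/></paper>"
vec_a = to_vector(doc_a, codes)
vec_b = to_vector(doc_b, codes)
vec_c = to_vector(doc_c, codes)
print(vec_a == vec_b, vec_a == vec_c)
```

Under these assumptions, `doc_a` and `doc_b` differ only in element order and repetition, so they map to the same vector, while `doc_c` maps to a different one. Vectors of this kind could then be passed to any clustering routine that accepts fixed-length numeric input.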

Key words: XML, GA, SST (structural summary tree), information retrieval, document clustering
