Journal of Applied Sciences, 2005, Vol. 23, Issue (4): 399-403.
LIANG Zuo-peng, YE Ning, DONG Yi-sheng
Abstract: Most existing techniques for clustering XML documents are based on edit distance and have two main disadvantages: (1) very high time complexity and (2) resulting clusters whose descriptions are difficult to understand. In this paper, a novel approach called path-based clustering (PBC) is presented. Instead of comparing the structures of XML documents and clustering the documents directly, PBC clusters the paths contained in these documents. For each path, a cluster is formed containing the documents that have that path. Clusters that contain similar sets of documents are then combined, so each resulting cluster contains documents that share a similar set of paths. Experimental results show the effectiveness and efficiency of this approach.
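The three steps in the abstract (extract paths, form one cluster per path, merge clusters with similar document sets) can be illustrated with a minimal Python sketch. The abstract does not say how paths are defined or how "similar sets of documents" is measured, so the following are assumptions for illustration only: a path is a root-to-element tag path such as `/book/title`, similarity is the Jaccard coefficient of two clusters' document sets, and merging is greedy agglomerative with a hypothetical `threshold` parameter. The function names `extract_paths` and `path_based_clustering` are likewise hypothetical and not taken from the paper.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def extract_paths(xml_text):
    """Return the set of root-to-element tag paths, e.g. '/book/title'.

    Assumption: paths are built from element tags only (attributes and text
    are ignored); namespaced tags appear in ElementTree's '{uri}name' form.
    """
    root = ET.fromstring(xml_text)
    paths = set()

    def walk(elem, prefix):
        path = f"{prefix}/{elem.tag}"
        paths.add(path)
        for child in elem:
            walk(child, path)

    walk(root, "")
    return paths


def jaccard(a, b):
    """Jaccard similarity of two sets (assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def path_based_clustering(docs, threshold=0.5):
    """Cluster documents via their paths; `docs` maps doc id -> XML text.

    Returns a list of (paths, doc_ids) pairs. A document may belong to
    more than one cluster.
    """
    # Step 1: for each path, collect the documents that contain it.
    path_to_docs = defaultdict(set)
    for doc_id, text in docs.items():
        for path in extract_paths(text):
            path_to_docs[path].add(doc_id)

    # Step 2: one initial cluster per path.
    clusters = [({path}, ids) for path, ids in path_to_docs.items()]

    # Step 3: repeatedly merge the most similar pair of clusters whose
    # document sets reach the (hypothetical) similarity threshold.
    while True:
        best, pair = threshold, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sim = jaccard(clusters[i][1], clusters[j][1])
                if sim >= best:
                    best, pair = sim, (i, j)
        if pair is None:
            break
        i, j = pair
        paths_i, docs_i = clusters[i]
        paths_j, docs_j = clusters[j]
        clusters[i] = (paths_i | paths_j, docs_i | docs_j)
        del clusters[j]  # j > i, so index i stays valid
    return clusters


if __name__ == "__main__":
    sample = {
        "d1": "<book><title/><author/></book>",
        "d2": "<book><title/><author/><year/></book>",
        "d3": "<movie><name/><director/></movie>",
    }
    for paths, ids in path_based_clustering(sample):
        print(sorted(ids), sorted(paths))
```

On the sample input, the two book documents end up in one cluster described by the four `/book...` paths and the movie document in another. The path set attached to each cluster serves as its readable description, which is the interpretability advantage the abstract claims over edit-distance methods. The quadratic pairwise merge loop here is chosen for brevity, not efficiency.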
Key words: information retrieval, XML, document clustering
CLC Number: TP311
LIANG Zuo-peng, YE Ning, DONG Yi-sheng. PBC: A Path-Based Method to Clustering XML Documents[J]. Journal of Applied Sciences, 2005, 23(4): 399-403.
URL: https://www.jas.shu.edu.cn/EN/Y2005/V23/I4/399