Journal of Applied Sciences ›› 2006, Vol. 24 ›› Issue (4): 396-400.

• Articles

An Efficient Clustering Algorithm of Large Scale and High Dimensional Data Set

ZHOU Xiao-yun, SUN Zhi-hui, ZHANG Bai-li   

  1. Department of Computer Science and Engineering, Southeast University, Nanjing 210096, China
  • Received: 2005-02-25 Revised: 2005-05-24 Online: 2006-07-31 Published: 2006-07-31

Abstract: Clustering large, high-dimensional data sets has long been a serious challenge for clustering algorithms. Traditional clustering algorithms often fail to detect meaningful clusters because most real-world data sets are high-dimensional and their feature spaces are inherently sparse. Nevertheless, such data sets often contain clusters hidden in various subspaces of the original feature space. In addition, high-dimensional data often contain a significant amount of noise, which further degrades clustering effectiveness. To overcome these problems, a new algorithm based on CLIQUE, named OpCluster, is proposed. A set of experiments on a synthetic data set demonstrates the effectiveness and efficiency of the new approach.

Key words: clustering algorithms, subspace clustering, optimal partition, data partition
