Journal of Applied Sciences ›› 2019, Vol. 37 ›› Issue (6): 806-814. doi: 10.3969/j.issn.0255-8297.2019.06.005

• Signal and Information Processing •

Multiple-center Points Incremental Fuzzy Clustering Algorithm

HU Bengu, DAI Muhong

  1. College of Information Science and Engineering, Hunan University, Changsha 410082, China
  • Received: 2018-08-09 Revised: 2019-03-10 Online: 2019-11-30 Published: 2019-12-06
  • Corresponding author: DAI Muhong, researcher; research interest: data science; E-mail: dmh@hnu.edu.cn
  • Funding:
    Supported by the Changsha Science and Technology Plan Project (No. kq1801008)

Abstract: Incremental clustering algorithms can handle data sets that are too large to be read into memory at one time. The traditional incremental multiple medoids based fuzzy clustering (IMMFC) algorithm selects only one center point, or the same fixed number of center points, for each data block, which leads to poor clustering performance when the weights of the objects in a cluster are small. This paper proposes a new incremental fuzzy clustering algorithm for processing large data sets. First, the algorithm divides the large data set into multiple small data blocks and performs fuzzy clustering on each block. Then, target center points are selected from each cluster of each data block; the number of center points for a cluster is the minimum number of objects in that cluster whose weights sum to more than a threshold. Finally, all selected center points are merged into a final data block, which is fuzzy clustered to obtain the final center points. Experimental results show that the proposed algorithm outperforms IMMFC when the data block accounts for more than 10% of the total data.

Key words: fuzzy clustering, incremental fuzzy clustering, large data set, multiple-center points

CLC number:
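
The center-selection rule described in the abstract (take the fewest objects in a cluster whose weights sum to more than a threshold) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names `select_centers` and `merge_centers`, the use of an absolute threshold, and the example weights are all assumptions made for illustration; the abstract does not state how weights are normalized or how the threshold is chosen.

```python
from typing import List, Sequence, Tuple


def select_centers(weights: Sequence[float], threshold: float) -> List[int]:
    """Return indices of the fewest objects whose weights sum to more than `threshold`.

    Assumption: weights are non-negative, so taking objects in descending
    weight order gives the smallest such set. If the total weight never
    exceeds the threshold, every index is returned.
    """
    # Stable sort by descending weight; ties keep their original order.
    order = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    chosen: List[int] = []
    total = 0.0
    for i in order:
        chosen.append(i)
        total += weights[i]
        if total > threshold:
            break
    return chosen


def merge_centers(
    clusters: Sequence[Sequence[float]], threshold: float
) -> List[Tuple[int, int]]:
    """Collect (cluster_id, object_index) pairs for every selected center.

    Hypothetical helper: stands in for the merge step that builds the
    final data block from the centers of all clusters.
    """
    merged: List[Tuple[int, int]] = []
    for cluster_id, weights in enumerate(clusters):
        for idx in select_centers(weights, threshold):
            merged.append((cluster_id, idx))
    return merged


# A cluster with one dominant object needs a single center,
# while a cluster with evenly spread weights needs several.
peaked = [0.70, 0.10, 0.10, 0.10]
flat = [0.25, 0.25, 0.25, 0.25]
print(select_centers(peaked, 0.5))
print(select_centers(flat, 0.5))
print(merge_centers([peaked, flat], 0.5))
```

Under these assumptions, a cluster whose weight is concentrated in one object contributes one center, whereas a cluster with small, evenly spread weights contributes more, which is the situation the abstract identifies as problematic for a fixed number of centers.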