Multiple-center Points Incremental Fuzzy Clustering Algorithm

Journal of Applied Sciences, 2019, Vol. 37, Issue (6): 806-814. doi: 10.3969/j.issn.0255-8297.2019.06.005

HU Bengu, DAI Muhong

College of Information Science and Engineering, Hunan University, Changsha 410082, China

Received: 2018-08-09 Revised: 2019-03-10 Online: 2019-11-30 Published: 2019-12-06
Corresponding author: DAI Muhong, research fellow, research interest: data science, E-mail: dmh@hnu.edu.cn
Funding: Supported by the Changsha Science and Technology Plan Project Fund (No. kq1801008)

Abstract

Incremental clustering algorithms address the problem that a large data set cannot be read into memory at one time. The traditional incremental multiple medoids based fuzzy clustering (IMMFC) algorithm selects only one center point, or the same fixed number of center points, for each data block, which leads to poor clustering performance when the object weights in a cluster are small. This paper proposes a new incremental fuzzy clustering algorithm for processing large data sets. First, the algorithm divides the large data set into multiple small data blocks and performs fuzzy clustering on each block. Then, target center points are selected from each cluster of each block; the number of center points selected for a cluster is the minimum number of objects whose weights in that cluster sum to more than a threshold. Finally, all selected center points are merged into a final data block, which is fuzzy clustered to obtain the final center points. Experimental results show that the proposed algorithm outperforms IMMFC when the data block accounts for more than 10% of the total data.

Key words: fuzzy clustering, incremental fuzzy clustering, large data set, multiple-center points
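
The center-point selection rule described in the abstract can be illustrated with a minimal Python sketch. This is one reading of the abstract, not the authors' implementation: it assumes that each cluster supplies a weight per object, that these weights are normalized to sum to 1 within the cluster, and that the fewest objects are obtained by taking the heaviest ones first. The function names `select_medoids` and `merge_medoids`, the object labels, and the threshold value 0.7 are all hypothetical, and the fuzzy clustering of each block is assumed to have been done already.

```python
def select_medoids(weights, threshold):
    """Return the fewest objects whose weights sum to more than `threshold`.

    `weights` maps object id -> weight within ONE cluster (assumed normalized).
    Objects are taken heaviest first; if the threshold is never exceeded,
    every object in the cluster is returned.
    """
    ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
    chosen, total = [], 0.0
    for obj, w in ranked:
        chosen.append(obj)
        total += w
        if total > threshold:
            break
    return chosen


def merge_medoids(chunks, threshold):
    """Collect the selected center points of every cluster of every data block.

    `chunks` is a list of blocks; each block is a list of per-cluster weight
    dicts. The merged list plays the role of the final data block.
    """
    merged = []
    for clusters in chunks:
        for weights in clusters:
            for obj in select_medoids(weights, threshold):
                if obj not in merged:  # keep each object only once
                    merged.append(obj)
    return merged


# Hypothetical weights: one concentrated cluster, one flat cluster, one mixed.
chunks = [
    [{"a": 0.9, "b": 0.05, "c": 0.05},
     {"d": 0.25, "e": 0.25, "f": 0.25, "g": 0.25}],
    [{"h": 0.5, "i": 0.3, "j": 0.2}],
]
print(merge_medoids(chunks, threshold=0.7))
# → ['a', 'd', 'e', 'f', 'h', 'i']
```

In this toy input the concentrated cluster contributes a single center point, while the cluster with small, evenly spread weights contributes three. That is the situation in which, according to the abstract, a fixed number of center points per cluster performs poorly.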

CLC number: TP311.11

HU Bengu, DAI Muhong. Multiple-center Points Incremental Fuzzy Clustering Algorithm[J]. Journal of Applied Sciences, 2019, 37(6): 806-814.


