Incremental clustering algorithms address the problem that a large data set cannot be read into memory all at once. The traditional incremental multiple medoids based fuzzy clustering (IMMFC) algorithm selects only one or a fixed number of center points for each data block, which leads to poor clustering performance when the object weights in a cluster are small. A new incremental fuzzy clustering algorithm is proposed for processing large data sets. First, the algorithm divides the large data set into multiple small data blocks and performs fuzzy clustering on each block. Then, center points are selected from each cluster of each block. The number of center points selected per cluster is the smallest number of objects whose weights in that cluster sum to more than a threshold. Finally, all selected center points are merged into a final data block, which is fuzzy clustered to obtain the final center points. Experimental results show that the algorithm outperforms the IMMFC algorithm when the data block accounts for more than 10% of the total data.
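The center-point selection rule described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `select_center_points`, the example weights, and the threshold value are hypothetical, and the abstract does not specify how object weights are computed. The sketch assumes the weights of the objects in one cluster are already available and that taking objects in descending order of weight yields the smallest qualifying set.

```python
from typing import List, Sequence


def select_center_points(weights: Sequence[float], threshold: float) -> List[int]:
    """Return indices of the fewest objects whose weights sum to more than threshold.

    Assumption: `weights` holds the weights of the objects in a single cluster.
    If the total weight never exceeds the threshold, all objects are returned.
    """
    # Visit objects from the heaviest to the lightest (hypothetical ordering rule).
    order = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    chosen: List[int] = []
    total = 0.0
    for i in order:
        chosen.append(i)
        total += weights[i]
        if total > threshold:
            break  # the smallest set whose weight sum exceeds the threshold
    return chosen


# Hypothetical cluster with four objects and a threshold of 0.7.
example = select_center_points([0.05, 0.5, 0.3, 0.15], 0.7)
print(example)
```

With this rule, a cluster dominated by one heavy object contributes a single center point, while a cluster of uniformly light objects contributes several, which is the situation in which a fixed number of center points is said to perform poorly.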
HU Bengu, DAI Muhong. Multiple-center Points Incremental Fuzzy Clustering Algorithm[J]. Journal of Applied Sciences, 2019, 37(6): 806-814.
DOI: 10.3969/j.issn.0255-8297.2019.06.005