基于密度峰值剪枝后的最短路径聚类算法

胡恩祥, 汪春雨, 潘美芹

doi:10.3969/j.issn.0255-8297.2020.05.010

应用科学学报 >

2020 , Vol. 38 >Issue 5: 792 - 802

DOI: https://doi.org/10.3969/j.issn.0255-8297.2020.05.010

智能计算新技术

基于密度峰值剪枝后的最短路径聚类算法

展开

1. 上海外国语大学国际工商管理学院, 上海 201600;
2. 华东师范大学计算机科学与技术学院, 上海 200062

收稿日期: 2020-05-25

网络出版日期: 2020-10-14

基金资助

上海外国语大学规划项目基金（No.2019114009）资助

收起

Clustering by Pruning Paths Based on Shortest Paths from Density Peaks

Expand

1. School of Business and Management, Shanghai International Studies University, Shanghai 201600, China;
2. School of Computer Science and Technology, East China Normal University, Shanghai 200062, China

Received date: 2020-05-25

Online published: 2020-10-14

Fold

摘要

聚类是通过数据标签或者属性，将一系列经验数据按照相似性或者相近性进行归类.基于密度属性展开的聚类算法，主要聚焦在聚类中心的确定和剩余点如何分配的问题上展开讨论.针对基于密度峰值的可训练最短路径算法，通过密度峰值确定聚类中心，提出使用截断阈值、对路径图进行剪枝的算法改进.然后基于最短路径法对剩余点进行全局分配.实验结果证明，在保持聚类精度的同时，有效地提升了算法执行效率.

关键词： 聚类; 密度峰值; 最短路径法; 路径剪枝

本文引用格式

胡恩祥, 汪春雨, 潘美芹 . 基于密度峰值剪枝后的最短路径聚类算法[J]. 应用科学学报, 2020 , 38(5) : 792 -802 . DOI: 10.3969/j.issn.0255-8297.2020.05.010

Abstract

Clustering is to classify multiple empirical data according to their similarity or proximity based on data labels and properties. For the clustering algorithm based on the density peaks, it mainly focuses on the determination of the clustering center and how to allocate the remaining points. In this paper, according to a trainable clustering algorithm based on shortest paths to density peaks, the clustering center is determined by the density peaks. We propose that using a cutoff threshold and pruning the path graph to improve the algorithm. The remaining points are allocated globally based on the shortest path method. It is proved that the algorithm can significantly improve the efficiency while maintaining the clustering accuracy.

Key words： clustering; density peak; shortest path method; pruning path

参考文献

[1] Ester M, Kriegel H P. A density-based algorithm for discovering clusters in large spatial databases with noise[J]. Knowledge Discovery & Data Mining, 1996, 96(34):226-231.
[2] 夏鲁宁, 荆继武. SA-DBSCAN:一种自适应基于密度聚类算法[J]. 中国科学院研究生院学报, 2009, 26(4):530-538. Xia L N, Jing J W. SA-DBSCAN:an adaptive clustering algorithm based on density[J]. Journal of University of Chinese Academy of Science, 2009, 26(4):530-538. (in Chinese)
[3] 周水庚, 周傲英, 曹晶. 基于数据分区的DBSCAN算法[J]. 计算机研究与发展, 2000, 37(10):1153-1159. Zhou S G, Zhou A Y, Cao J. DBSCAN algorithm based on data partition[J]. Journal of Computer Research and Development, 2000, 37(10):1153-1159. (in Chinese)
[4] 周水庚, 周傲英, 曹晶, 等. 一种基于密度的快速聚类算法[J]. 计算机研究与发展, 2000, 37(11):8-13. Zhou S G, Zhou A Y, Cao J, et al. A fast clustering algorithm based on density[J]. Journal of Computer Research and Development, 2000, 37(11):8-13. (in Chinese)
[5] Rodriguez A, Laio A. Clustering by fast search and find of density peaks[J]. Science, 2014, 344(6191):1492-1496.
[6] Pizzagalli D U, Gonzalez S F, Krause R. A trainable clustering algorithm based on shortest paths from density peaks[J]. Science Advances, 2019, 5(10):eaax3770.
[7] Wang X F, Xu Y F. Fast clustering using adaptive density peak detection[J]. Statistical Methods in Medical Research, 2017, 26(6):2800-2811.
[8] 谢娟英, 高红超, 谢维信. K近邻优化的密度峰值快速搜索聚类算法[J]. 中国科学:信息科学, 2016, 46(2):258-280. Xie J Y, Gao H C, Xie W X. A fast clustering algorithm based on K-nearest neighbor optimized density[J]. Science China:Information Science, 2016, 46(2):258-280. (in Chinese)
[9] Wang S L, Wang D K, Li C Y, et al. Clustering by fast search and find of density peaks with data field[J]. Chinese Journal of Electronics, 2016, 25(3):397-402.
[10] 高诗莹, 周晓锋, 李帅. 基于密度比例的密度峰值聚类算法[J]. 计算机工程与应用, 2017, 53(16):10-17. Gao S Y, Zhou X F, Li S. A fast clustering algorithm based on density ratio[J]. Computer Engineering and Applications, 2017, 53(16):10-17. (in Chinese)
[11] Zhang W, Li J. Extended fast search clustering algorithm:widely density clusters, no density peaks[J]. Computer Science, 2015, 5(7):415-423.
[12] 蒋礼青, 张明新, 郑金龙, 等. 快速搜索与发现密度峰值聚类算法的优化研究[J]. 计算机应用研究, 2016, 33(11):3251-3254. Jiang L Q, Zhang M X, Zheng J L, et al. Optimization research of the fast clustering algorithm based on density peaks[J]. Application Research of Computers, 2016, 33(11):3251-3254. (in Chinese)
[13] Tang G, Jia S, Li J. An enhanced density peak-based clustering approach for hyperspectral band selection[C]//2015 IEEE International Geoscience and Remote Sensing Symposium (IGARSS). IEEE, 2015:1116-1119.
[14] Tenenbaum J B, Silva V D, Langford J C. A global geometric framework for nonlinear dimensionality reduction[J]. Science, 2000, 290(5500):2319-2323.
[15] Xu X H, Ju Y S, Liang Y L, et al. Manifold density peaks clustering algorithm[C]//Third International Conference on Advanced Cloud & Big Data. IEEE, 2016:311-318.
[16] Du M, Ding S, Xu X, et al. Density peaks clustering using geodesic distances[J]. International Journal of Machine Learning & Cybernetics, 2017, 9(8):1335-1349.
[17] Cheng Y. Mean shift, mode seeking, and clustering[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1995, 17(8):790-799.
[18] Satopaa V, Albrecht J, Irwin D, et al. Finding a kneedle in a haystack:detecting knee points in system behavior[C]//International Conference on Distributed Computing Systems Workshops. IEEE Computer Society, 2011:166-171.
[19] Wiwie C, Baumbach J, RÖTtger R. Comparing the performance of biomedical clustering methods[J]. Nature Methods, 2015, 12(11):1033.
[20] 周志华. 机器学习[J]. 中国民商, 2016(3):198-199. Zhou Z H. Machine learning[J]. China Non-Governmental, 2016(3):198-199. (in Chinese)

Options

文章导航

模态框（Modal）标题

摘要

本文引用格式

Abstract

参考文献