一种面向不确定标签样本的K-近邻高效决策算法

doi:10.3969/j.issn.0255-8297.2020.05.001

应用科学学报 ›› 2020, Vol. 38 ›› Issue (5): 659-671.doi: 10.3969/j.issn.0255-8297.2020.05.001

• 智能计算新技术 • 上一篇

一种面向不确定标签样本的K-近邻高效决策算法

齐晴¹, 沈正飞¹, 曹健¹, 应俊², 赵龙²

1. 上海交通大学计算机科学与工程系, 上海 200240;
2. 上海海勃物流软件有限公司, 上海 200080

收稿日期:2020-06-19 发布日期:2020-10-14
通信作者: 曹健,教授,研究方向为智能数据分析.E-mail:cao-jian@sjtu.edu.cn E-mail:cao-jian@sjtu.edu.cn
基金资助:
国家重点研发计划项目（No.2019YFB1704405）资助

An Efficient K-Nearest Neighbor Decision Algorithm for Samples with Uncertain Labels

QI Qing¹, SHEN Zhengfei¹, CAO Jian¹, YING Jun², ZHAO Long²

1. Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240, China;
2. Shanghai Harbor e-Logistics Software Co., Ltd., Shanghai 200080, China

Received:2020-06-19 Published:2020-10-14

摘要/Abstract

摘要： 基于案例的决策是一种直接依据过去的历史案例对当前案例进行分类或者指标预测的方法，K-近邻方法就是一种广泛应用的基于案例的决策模型。在K-近邻方法中，历史案例上需要有标签，而在现实应用中，标签本身有一定的不确定性.文章详细地讨论了现有的基于K-近邻的决策方法忽略了样本标签不确定性这一问题，并基于Dempster-Shafer证据理论对标签不确定性进行建模以改善预测的性能，在此基础上结合边界树模型提高模型的运行效率.文中介绍了边界树算法的作用与原理，对如何结合传统边界树算法与样本标签的不确定性对边界树算法的节点转移策略以及决策过程进行了优化.文章最后对边界树算法的计算规模与准确率做了详细的实验论证.结果表明，文中提出的方法一方面考虑了标签的不确定性，另一方面提高了传统的K-近邻模型的决策效率.

关键词: K-近邻算法, 标签不确定性, 边界树算法, 计算速度优化

Abstract: Case-based decision-making is a method to directly classify or predict current cases based on past historical cases. The K-nearest neighbor method is a widely used casebased decision-making model. In the K-nearest neighbor method, historical cases need to be labeled. But in practical applications, the labels themselves have uncertainties. This article discusses the problem of label uncertainty which has been ignored in existing casebased decision-making methods in detail, and setups a label uncertainty model based on Dempster-Shafer evidence theory for improving prediction performance. In addition, in order to improve the operation efficiency, a new boundary tree algorithm by combining the traditional boundary tree algorithm and the label uncertainty is proposed. This paper introduces the function and principle of the boundary tree algorithm, and optimizes the node transfer strategy and decision process of the new boundary tree algorithm. Experimental demonstration shows that the proposed method not only takes the label uncertainty into consideration, but also improves the decision efficiency of the traditional K-nearest neighbor model.

Key words: K-nearest neighbor algorithm, uncertainties of labels, boundary tree algorithm, optimization of decision speed

中图分类号:

P751.1

齐晴, 沈正飞, 曹健, 应俊, 赵龙. 一种面向不确定标签样本的K-近邻高效决策算法[J]. 应用科学学报, 2020, 38(5): 659-671.

QI Qing, SHEN Zhengfei, CAO Jian, YING Jun, ZHAO Long. An Efficient K-Nearest Neighbor Decision Algorithm for Samples with Uncertain Labels[J]. Journal of Applied Sciences, 2020, 38(5): 659-671.

参考文献

[1] Denoeux T. A k-nearest neighbor classification rule based on Dempster-Shafer theory[J]. IEEE Transactions on Systems, Man, and Cybernetics, 1995, 25(5):804-813.
[2] Mathy C, Derbinsky N, Bento J. The boundary forest algorithm for online supervised and unsupervised learning[C]//Proceeding of the 29th AAAI Conference on Artificial Intelligence, 2015:2864-2870.
[3] Tsymbal A. The problem of concept drift:definitions and related work[J]. Computer Science Department, 2004, 106(2):58.
[4] Noh Y K, Zhang B T, Lee D D. Generative local metric learning for nearest neighbor classification[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 40(1):106.
[5] García-Pedrajas N, del Castillo J A R, Cerruela-García G. A proposal for local k values for k-nearest neighbor rule[J]. IEEE Transactions on Neural Networks and Learning Systems, 2015, 28(2):470-475.
[6] Filip F G, Zamfirescu C B, Ciurea C. Computer-supported collaborative decision-making[M]. Cham:Springer International Publishing, 2017.
[7] Shen H B, Yang J, Chou K C. Fuzzy KNN for predicting membrane protein types from pseudo-amino acid composition[J]. Journal of Theoretical Biology, 2006, 240(1):9-13.
[8] Rosa J L A, Ebecken N F F. Data mining for data classification based on the KNN-fuzzy method supported by genetic algorithm[C]//International Conference on High Performance Computing for Computational Science. Springer, Berlin, Heidelberg, 2002:126-133.
[9] Ren Y, Li G, Zhang J. The maximum imputation framework for neighborhood-based collaborative filtering[J]. Social Network Analysis and Mining, 2014, 4(1):207.
[10] Ali P U S, Ventakeswaran D C J. Improved evidence theoretic KNN classifier based on theory of evidence[J]. International Journal of Computer Applications, 2011, 15(5):37-41.
[11] Gray R M. Entropy and information theory[M]. Berlin:Springer Science & Business Media, 2011.
[12] Zhu N, Cao J, Shen K. A decision support system with intelligent recommendation for multidisciplinary medical treatment[J]. ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), 2020, 16(1s):1-23.

[1]	高华, 王安帮, 王云才. 经典物理层高速密钥分发研究进展[J]. 应用科学学报, 2020, 38(4): 507-519.
[2]	王健, 陈诗. 基于传统光纤的涡旋光复用通信研究进展[J]. 应用科学学报, 2020, 38(4): 559-578.
[3]	折胜飞, 梅林, 周振宇, 侯超奇, 郭海涛. 空间光通信用耐辐照掺铒/铒镱共掺光纤研究进展[J]. 应用科学学报, 2020, 38(4): 579-594.
[4]	乔丽君, 杨强, 柴萌萌, 卫晓晶, 张建忠, 徐红春, 张明江. 混沌半导体激光器研究进展[J]. 应用科学学报, 2020, 38(4): 595-611.
[5]	郝腾飞, 石暖暖, 李伟, 祝宁华, 李明. 多波段线性调频傅里叶域锁模光电振荡器[J]. 应用科学学报, 2020, 38(4): 640-646.
[6]	迟楠, 牛文清, 贾俊连, 哈依那尔. 基于抗非线性SVM的几何整形可见光通信系统[J]. 应用科学学报, 2020, 38(4): 647-658.
[7]	张雨欣, 李熙, 宋杨, 李长辉. 基于“珞珈一号”夜光遥感影像的粤港澳大湾区城市空间形态分析[J]. 应用科学学报, 2020, 38(3): 466-477.
[8]	王孟轩, 张胜, 王月, 雷霆, 杜渂. 改进的CRNN模型在警情文本分类中的研究与应用[J]. 应用科学学报, 2020, 38(3): 388-400.
[9]	赵春柳, 李嘉丽, 徐贲, 龚华平, 王东宁. 光纤微腔法布里-珀罗干涉传感器研究进展[J]. 应用科学学报, 2020, 38(2): 226-259.
[10]	耿优福, 李学金. 基于微结构光纤的温度传感器研究[J]. 应用科学学报, 2020, 38(2): 260-278.
[11]	陈嘉庚, 刘庆文, 赵双祥, 何祖源. 波长编码型光纤传感器高精度解调技术研究进展[J]. 应用科学学报, 2020, 38(2): 279-295.
[12]	赵云鹤, 刘云启. 少模光纤长周期光栅——从模式转换到高灵敏度光纤传感[J]. 应用科学学报, 2020, 38(2): 310-338.
[13]	杨亚南, 李一鸣, 聂力海, 张宁, 赵来平. 跨地域分布数据中心高成本效益的任务调度[J]. 应用科学学报, 2019, 37(6): 859-874.
[14]	刘烊佚, 苏成利, 施惠元, 李平, 薄桂华. 基于PFC-PID算法的无线高温加热炉温度控制系统[J]. 应用科学学报, 2019, 37(6): 875-886.
[15]	李松斌, 杨洁, 刘鹏, 王凌睿. 基于关联网络模型的H.264/AVC运动矢量调制信息隐藏检测方法[J]. 应用科学学报, 2019, 37(5): 663-672.

一种面向不确定标签样本的K-近邻高效决策算法

An Efficient K-Nearest Neighbor Decision Algorithm for Samples with Uncertain Labels

PDF

可视化

摘要/Abstract

引用本文

使用本文

参考文献

相关文章 15

编辑推荐

Metrics

本文评价