应用科学学报 ›› 2020, Vol. 38 ›› Issue (5): 659-671.doi: 10.3969/j.issn.0255-8297.2020.05.001

• 智能计算新技术 • 上一篇    

一种面向不确定标签样本的K-近邻高效决策算法

齐晴1, 沈正飞1, 曹健1, 应俊2, 赵龙2   

  1. 1. 上海交通大学 计算机科学与工程系, 上海 200240;
    2. 上海海勃物流软件有限公司, 上海 200080
  • 收稿日期:2020-06-19 发布日期:2020-10-14
  • 通信作者: 曹健,教授,研究方向为智能数据分析.E-mail:cao-jian@sjtu.edu.cn E-mail:cao-jian@sjtu.edu.cn
  • 基金资助:
    国家重点研发计划项目(No.2019YFB1704405)资助

An Efficient K-Nearest Neighbor Decision Algorithm for Samples with Uncertain Labels

QI Qing1, SHEN Zhengfei1, CAO Jian1, YING Jun2, ZHAO Long2   

  1. 1. Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240, China;
    2. Shanghai Harbor e-Logistics Software Co., Ltd., Shanghai 200080, China
  • Received:2020-06-19 Published:2020-10-14

摘要: 基于案例的决策是一种直接依据过去的历史案例对当前案例进行分类或者指标预测的方法,K-近邻方法就是一种广泛应用的基于案例的决策模型。在K-近邻方法中,历史案例上需要有标签,而在现实应用中,标签本身有一定的不确定性.文章详细地讨论了现有的基于K-近邻的决策方法忽略了样本标签不确定性这一问题,并基于Dempster-Shafer证据理论对标签不确定性进行建模以改善预测的性能,在此基础上结合边界树模型提高模型的运行效率.文中介绍了边界树算法的作用与原理,对如何结合传统边界树算法与样本标签的不确定性对边界树算法的节点转移策略以及决策过程进行了优化.文章最后对边界树算法的计算规模与准确率做了详细的实验论证.结果表明,文中提出的方法一方面考虑了标签的不确定性,另一方面提高了传统的K-近邻模型的决策效率.

关键词: K-近邻算法, 标签不确定性, 边界树算法, 计算速度优化

Abstract: Case-based decision-making is a method to directly classify or predict current cases based on past historical cases. The K-nearest neighbor method is a widely used casebased decision-making model. In the K-nearest neighbor method, historical cases need to be labeled. But in practical applications, the labels themselves have uncertainties. This article discusses the problem of label uncertainty which has been ignored in existing casebased decision-making methods in detail, and setups a label uncertainty model based on Dempster-Shafer evidence theory for improving prediction performance. In addition, in order to improve the operation efficiency, a new boundary tree algorithm by combining the traditional boundary tree algorithm and the label uncertainty is proposed. This paper introduces the function and principle of the boundary tree algorithm, and optimizes the node transfer strategy and decision process of the new boundary tree algorithm. Experimental demonstration shows that the proposed method not only takes the label uncertainty into consideration, but also improves the decision efficiency of the traditional K-nearest neighbor model.

Key words: K-nearest neighbor algorithm, uncertainties of labels, boundary tree algorithm, optimization of decision speed

中图分类号: