智能计算新技术

一种面向不确定标签样本的K-近邻高效决策算法

展开
  • 1. 上海交通大学 计算机科学与工程系, 上海 200240;
    2. 上海海勃物流软件有限公司, 上海 200080

收稿日期: 2020-06-19

  网络出版日期: 2020-10-14

基金资助

国家重点研发计划项目(No.2019YFB1704405)资助

An Efficient K-Nearest Neighbor Decision Algorithm for Samples with Uncertain Labels

Expand
  • 1. Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240, China;
    2. Shanghai Harbor e-Logistics Software Co., Ltd., Shanghai 200080, China

Received date: 2020-06-19

  Online published: 2020-10-14

摘要

基于案例的决策是一种直接依据过去的历史案例对当前案例进行分类或者指标预测的方法,K-近邻方法就是一种广泛应用的基于案例的决策模型。在K-近邻方法中,历史案例上需要有标签,而在现实应用中,标签本身有一定的不确定性.文章详细地讨论了现有的基于K-近邻的决策方法忽略了样本标签不确定性这一问题,并基于Dempster-Shafer证据理论对标签不确定性进行建模以改善预测的性能,在此基础上结合边界树模型提高模型的运行效率.文中介绍了边界树算法的作用与原理,对如何结合传统边界树算法与样本标签的不确定性对边界树算法的节点转移策略以及决策过程进行了优化.文章最后对边界树算法的计算规模与准确率做了详细的实验论证.结果表明,文中提出的方法一方面考虑了标签的不确定性,另一方面提高了传统的K-近邻模型的决策效率.

本文引用格式

齐晴, 沈正飞, 曹健, 应俊, 赵龙 . 一种面向不确定标签样本的K-近邻高效决策算法[J]. 应用科学学报, 2020 , 38(5) : 659 -671 . DOI: 10.3969/j.issn.0255-8297.2020.05.001

Abstract

Case-based decision-making is a method to directly classify or predict current cases based on past historical cases. The K-nearest neighbor method is a widely used casebased decision-making model. In the K-nearest neighbor method, historical cases need to be labeled. But in practical applications, the labels themselves have uncertainties. This article discusses the problem of label uncertainty which has been ignored in existing casebased decision-making methods in detail, and setups a label uncertainty model based on Dempster-Shafer evidence theory for improving prediction performance. In addition, in order to improve the operation efficiency, a new boundary tree algorithm by combining the traditional boundary tree algorithm and the label uncertainty is proposed. This paper introduces the function and principle of the boundary tree algorithm, and optimizes the node transfer strategy and decision process of the new boundary tree algorithm. Experimental demonstration shows that the proposed method not only takes the label uncertainty into consideration, but also improves the decision efficiency of the traditional K-nearest neighbor model.

参考文献

[1] Denoeux T. A k-nearest neighbor classification rule based on Dempster-Shafer theory[J]. IEEE Transactions on Systems, Man, and Cybernetics, 1995, 25(5):804-813.
[2] Mathy C, Derbinsky N, Bento J. The boundary forest algorithm for online supervised and unsupervised learning[C]//Proceeding of the 29th AAAI Conference on Artificial Intelligence, 2015:2864-2870.
[3] Tsymbal A. The problem of concept drift:definitions and related work[J]. Computer Science Department, 2004, 106(2):58.
[4] Noh Y K, Zhang B T, Lee D D. Generative local metric learning for nearest neighbor classification[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 40(1):106.
[5] García-Pedrajas N, del Castillo J A R, Cerruela-García G. A proposal for local k values for k-nearest neighbor rule[J]. IEEE Transactions on Neural Networks and Learning Systems, 2015, 28(2):470-475.
[6] Filip F G, Zamfirescu C B, Ciurea C. Computer-supported collaborative decision-making[M]. Cham:Springer International Publishing, 2017.
[7] Shen H B, Yang J, Chou K C. Fuzzy KNN for predicting membrane protein types from pseudo-amino acid composition[J]. Journal of Theoretical Biology, 2006, 240(1):9-13.
[8] Rosa J L A, Ebecken N F F. Data mining for data classification based on the KNN-fuzzy method supported by genetic algorithm[C]//International Conference on High Performance Computing for Computational Science. Springer, Berlin, Heidelberg, 2002:126-133.
[9] Ren Y, Li G, Zhang J. The maximum imputation framework for neighborhood-based collaborative filtering[J]. Social Network Analysis and Mining, 2014, 4(1):207.
[10] Ali P U S, Ventakeswaran D C J. Improved evidence theoretic KNN classifier based on theory of evidence[J]. International Journal of Computer Applications, 2011, 15(5):37-41.
[11] Gray R M. Entropy and information theory[M]. Berlin:Springer Science & Business Media, 2011.
[12] Zhu N, Cao J, Shen K. A decision support system with intelligent recommendation for multidisciplinary medical treatment[J]. ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), 2020, 16(1s):1-23.
文章导航

/