Novel Technologies for Intelligent Computing

An Efficient K-Nearest Neighbor Decision Algorithm for Samples with Uncertain Labels

  • 1. Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240, China;
    2. Shanghai Harbor e-Logistics Software Co., Ltd., Shanghai 200080, China

Received date: 2020-06-19

Online published: 2020-10-14

Abstract

Case-based decision-making is a method that classifies or predicts current cases directly from past historical cases. The K-nearest neighbor (KNN) method is a widely used case-based decision-making model, and it requires historical cases to be labeled. In practical applications, however, the labels themselves carry uncertainty. This article examines in detail the problem of label uncertainty, which existing case-based decision-making methods have ignored, and establishes a label uncertainty model based on Dempster-Shafer evidence theory to improve prediction performance. In addition, to improve runtime efficiency, a new boundary tree algorithm is proposed that combines the traditional boundary tree algorithm with the label uncertainty model. The paper introduces the function and principle of the boundary tree algorithm, and optimizes the node transfer strategy and decision process of the new algorithm. Experiments show that the proposed method not only accounts for label uncertainty but also improves the decision efficiency of the traditional K-nearest neighbor model.
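To make the label-uncertainty idea concrete, the sketch below shows one common way Dempster-Shafer evidence theory is applied to KNN (in the spirit of Denoeux's evidential KNN): each neighbor's uncertain label is a mass function over the classes, with the unassigned mass placed on the whole frame of discernment, and the neighbors' evidence is pooled with Dempster's rule. The function name, the distance-based discounting `exp(-gamma * d**2)`, and all parameters are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def evidential_knn(X_train, masses, x, k=3, gamma=1.0):
    """Classify x by combining neighbor evidence with Dempster's rule.

    masses[i] maps each class label to the mass neighbor i assigns it;
    any remaining mass goes to the frame "Theta" (full ignorance),
    which is how an uncertain label is represented.
    (Illustrative sketch; assumes no class is literally named "Theta".)
    """
    # Distances to all training samples; keep the k nearest.
    d = np.linalg.norm(X_train - x, axis=1)
    idx = np.argsort(d)[:k]

    labels = sorted({c for m in masses for c in m})
    # Start from total ignorance: all mass on the frame Theta.
    combined = {c: 0.0 for c in labels}
    combined["Theta"] = 1.0

    for i in idx:
        # Discount neighbor i's masses by distance, so closer
        # neighbors contribute stronger evidence.
        alpha = np.exp(-gamma * d[i] ** 2)
        m_i = {c: alpha * masses[i].get(c, 0.0) for c in labels}
        m_i["Theta"] = 1.0 - sum(m_i.values())

        # Dempster's rule for singleton classes plus Theta:
        # conflict is the mass on pairs of distinct singletons.
        new = {}
        conflict = 0.0
        for c in labels:
            new[c] = (combined[c] * m_i[c]
                      + combined[c] * m_i["Theta"]
                      + combined["Theta"] * m_i[c])
            conflict += sum(combined[c2] * m_i[c]
                            for c2 in labels if c2 != c)
        new["Theta"] = combined["Theta"] * m_i["Theta"]
        combined = {a: v / (1.0 - conflict) for a, v in new.items()}

    # Decide by maximum combined mass over the singleton classes.
    return max(labels, key=lambda c: combined[c]), combined
```

A label known with certainty is the special case `{"c": 1.0}`; a noisy label such as `{"a": 0.8}` leaves 0.2 of its mass on the frame, so it sways the decision less.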
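The efficiency gain over a full KNN scan comes from the boundary tree of Mathy et al. [2015]: a query greedily descends toward the closest stored sample, and a training sample is inserted only when the tree would misclassify it, so the structure retains mostly samples near class boundaries. The minimal sketch below shows the base algorithm only (the class name and `max_children` policy are illustrative assumptions), without the paper's label-uncertainty extension or its optimized node transfer strategy.

```python
import numpy as np

class BoundaryTree:
    """Minimal boundary tree: an online nearest-neighbor structure
    that stores only samples the current tree misclassifies."""

    def __init__(self, x, label, max_children=5):
        self.x = np.asarray(x, float)
        self.label = label
        self.children = []
        self.max_children = max_children

    def _dist(self, q):
        return np.linalg.norm(self.x - q)

    def query(self, q):
        # Greedy descent: move to the closest child; stop when the
        # current node is closer to q than any child. A full node is
        # excluded as a candidate, forcing descent past it.
        node = self
        while True:
            candidates = list(node.children)
            if len(node.children) < node.max_children:
                candidates.append(node)
            best = min(candidates, key=lambda n: n._dist(q))
            if best is node:
                return node
            node = best

    def train(self, x, label):
        # Insert x only if the tree would mislabel it; correctly
        # classified samples are redundant and are discarded.
        q = np.asarray(x, float)
        leaf = self.query(q)
        if leaf.label != label:
            leaf.children.append(BoundaryTree(x, label, self.max_children))
```

Because each query touches only one root-to-leaf path, prediction cost grows with tree depth rather than with the number of historical cases, which is the efficiency the abstract refers to.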

Cite this article

QI Qing, SHEN Zhengfei, CAO Jian, YING Jun, ZHAO Long. An Efficient K-Nearest Neighbor Decision Algorithm for Samples with Uncertain Labels[J]. Journal of Applied Sciences, 2020, 38(5): 659-671. DOI: 10.3969/j.issn.0255-8297.2020.05.001
