Journal of Applied Sciences ›› 2005, Vol. 23 ›› Issue (3): 292-296.


An Online Data Cleaning Method

HAN Jing-yu, HU Kong-fa, XU Li-zhen, DONG Yi-sheng   

  1. Department of Computer Science and Engineering, Southeast University, Nanjing 210096, China
  • Received: 2004-03-05  Revised: 2004-10-13  Online: 2005-05-31  Published: 2005-05-31

Abstract: A new method for online data cleaning is presented. First, each clean record in the reference table is mapped to a point in a high-dimensional metric space under the Manhattan distance. Next, the points are partitioned by clustering and indexed with a B+ tree, so that a search in the high-dimensional space is translated into a search in a one-dimensional space. To find the K nearest neighbors (KNN) in the reference table for each incoming record, a branch-and-bound search is employed, which identifies the top K reference records that best match the incoming record. Theoretical analysis and experiments show that the method is an effective approach to online data cleaning.
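
The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and the abstract does not give the details, so several things here are assumptions: records are taken to be already encoded as numeric tuples, cluster centers are supplied by the caller rather than computed by a clustering step, the one-dimensional key is formed as cluster id times a stride plus the Manhattan distance to the cluster center, and a sorted Python list searched with `bisect` stands in for the B+ tree. The names `manhattan`, `OneDimIndex`, and `knn` are hypothetical.

```python
import bisect
import heapq


def manhattan(a, b):
    """Manhattan (L1) distance between two equal-length numeric tuples."""
    return sum(abs(x - y) for x, y in zip(a, b))


class OneDimIndex:
    """Maps clustered points to 1-D keys; a sorted list stands in for a B+ tree."""

    def __init__(self, points, centers):
        self.points = points
        self.centers = centers
        # Assign each point to its nearest center (assumed clustering step).
        assigned = []
        for idx, p in enumerate(points):
            dists = [manhattan(p, c) for c in centers]
            cid = min(range(len(centers)), key=dists.__getitem__)
            assigned.append((cid, dists[cid], idx))
        # The stride exceeds every in-cluster distance, so key ranges of
        # different clusters never overlap.
        self.stride = max(d for _, d, _ in assigned) + 1
        self.radius = [0] * len(centers)
        for cid, d, _ in assigned:
            self.radius[cid] = max(self.radius[cid], d)
        entries = sorted((cid * self.stride + d, idx) for cid, d, idx in assigned)
        self.keys = [key for key, _ in entries]
        self.ids = [idx for _, idx in entries]

    def knn(self, query, k):
        """Return the k nearest points as sorted (distance, point_index) pairs."""
        dq = [manhattan(query, c) for c in self.centers]
        # Triangle inequality: no point in cluster c is closer than this bound.
        lower = [max(0, dq[c] - self.radius[c]) for c in range(len(dq))]
        best = []  # max-heap of (-distance, point_index), at most k entries
        for cid in sorted(range(len(dq)), key=lower.__getitem__):
            bound = -best[0][0] if len(best) == k else float("inf")
            if lower[cid] >= bound:
                break  # bound step: this and all later clusters are pruned
            # Branch step: only keys whose center distance lies within
            # `bound` of dq[cid] can belong to a closer point.
            base = cid * self.stride
            lo = bisect.bisect_left(self.keys, base + max(0, dq[cid] - bound))
            hi = bisect.bisect_right(
                self.keys, base + min(self.radius[cid], dq[cid] + bound)
            )
            for pos in range(lo, hi):
                idx = self.ids[pos]
                d = manhattan(query, self.points[idx])
                if len(best) < k:
                    heapq.heappush(best, (-d, idx))
                elif d < -best[0][0]:
                    heapq.heapreplace(best, (-d, idx))
        return sorted((-neg_d, idx) for neg_d, idx in best)
```

As a usage example, with reference points `[(1, 2), (2, 1), (8, 9), (9, 8), (5, 5), (0, 0), (10, 10)]` and assumed centers `[(1, 1), (9, 9)]`, calling `OneDimIndex(points, centers).knn((2, 2), 3)` returns `[(1, 0), (1, 1), (4, 5)]`. The second cluster is never scanned, because its lower bound of 12 already exceeds the third-best distance of 4.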

Key words: B+ tree, data cleaning, branch and bound
