Journal of Applied Sciences ›› 2005, Vol. 23 ›› Issue (3): 292-296.
HAN Jing-yu, HU Kong-fa, XU Li-zhen, DONG Yi-sheng
Abstract: A new method for online data cleaning is presented. First, each clean record in the reference table is mapped to a point in a high-dimensional metric space measured by Manhattan distance. Next, all the points in the space are partitioned by clustering and indexed with a B+ tree. In this way, a search in high-dimensional space is translated into a search in one-dimensional space. To find the K nearest neighbors (KNN) in the reference table for each incoming record, a branch-and-bound search is employed, which identifies the top K records that best match the incoming record. Theoretical analysis and experiments show that this is an effective approach to online data cleaning.
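The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how records are mapped to points, which clustering algorithm is used, or how the one-dimensional key is defined. The sketch assumes records are already numeric vectors, that cluster centers are supplied by the caller, and that each point's key is its Manhattan distance to its cluster center. A sorted list searched with bisect stands in for the B+ tree, and the names ClusterIndex and knn are hypothetical. Pruning relies on the triangle inequality, in the spirit of branch and bound.

```python
import bisect
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def manhattan(a: Sequence[float], b: Sequence[float]) -> float:
    """Manhattan (L1) distance between two points of equal dimension."""
    return sum(abs(x - y) for x, y in zip(a, b))


class ClusterIndex:
    """Clustered points keyed by distance to their center (assumed scheme).

    Each cluster keeps a sorted list of keys, standing in for a B+ tree.
    """

    def __init__(self, points: Sequence[Point], centers: Sequence[Point]):
        self.centers = [tuple(c) for c in centers]
        buckets: List[List[Tuple[float, Point]]] = [[] for _ in self.centers]
        for p in points:
            dists = [manhattan(p, c) for c in self.centers]
            i = min(range(len(dists)), key=dists.__getitem__)
            buckets[i].append((dists[i], tuple(p)))
        # Sort each cluster by its one-dimensional key.
        self.keys: List[List[float]] = []
        self.items: List[List[Point]] = []
        for bucket in buckets:
            bucket.sort()
            self.keys.append([d for d, _ in bucket])
            self.items.append([p for _, p in bucket])

    def knn(self, q: Point, k: int) -> List[Tuple[float, Point]]:
        """Return up to k (distance, point) pairs nearest to q, closest first."""
        best: List[Tuple[float, Point]] = []
        center_dist = [manhattan(q, c) for c in self.centers]
        # Visit clusters with the nearest center first so the bound tightens early.
        for i in sorted(range(len(self.centers)), key=center_dist.__getitem__):
            keys, items = self.keys[i], self.items[i]
            if not keys:
                continue
            dq = center_dist[i]
            bound = best[-1][0] if len(best) == k else float("inf")
            # Cluster-level bound: every point p satisfies d(q, p) >= dq - radius.
            if dq - keys[-1] > bound:
                continue
            # Only keys in [dq - bound, dq + bound] can beat the current bound.
            lo = bisect.bisect_left(keys, dq - bound)
            hi = bisect.bisect_right(keys, dq + bound)
            for j in range(lo, hi):
                bound = best[-1][0] if len(best) == k else float("inf")
                if abs(dq - keys[j]) > bound:
                    continue  # pruned by the triangle inequality
                d = manhattan(q, items[j])
                if len(best) < k or d < bound:
                    bisect.insort(best, (d, items[j]))
                    if len(best) > k:
                        best.pop()
        return best
```

A call such as ClusterIndex(points, centers).knn(query, 3) returns the three closest reference points with their distances. In the online setting the index would be built once over the reference table and queried for each incoming record.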
Key words: B+ tree, data cleaning, branch and bound
CLC Number: TP311.11
HAN Jing-yu, HU Kong-fa, XU Li-zhen, DONG Yi-sheng. An Online Data Cleaning Method[J]. Journal of Applied Sciences, 2005, 23(3): 292-296.
URL: https://www.jas.shu.edu.cn/EN/
https://www.jas.shu.edu.cn/EN/Y2005/V23/I3/292