Journal of Applied Sciences ›› 2006, Vol. 24 ›› Issue (2): 203-207.

• Articles •

Biased Sampling of Data Streams Based on Density

YANG Yi-dong, SUN Zhi-hui

  1. Department of Computer Science and Engineering, Southeast University, Nanjing, Jiangsu 210096, China
  • Received:2004-12-29 Revised:2005-03-29 Online:2006-03-31 Published:2006-03-31
  • About the authors: YANG Yi-dong, PhD candidate, research interests: data mining and knowledge discovery, E-mail: ydyang@seu.edu.cn; SUN Zhi-hui, professor and doctoral supervisor, research interests: database systems, their applications, and knowledge discovery, E-mail: sunzh@seu.edu.cn
  • Supported by:
    the National Natural Science Foundation of China (70371015); the Research Fund for the Doctoral Program of Higher Education, Ministry of Education (20040286009); and the NARI-Relays (南瑞继保) degree fund

Abstract: As an important kind of data source, data streams have received increasing attention. Data stream management systems and data mining over data streams have also attracted much research interest. Using dynamic grid partitioning of the data space, the distribution density of a data stream is approximated, and on this basis a density-biased sampling method is proposed. To verify its effectiveness, the proposed sampling method is applied to clustering data streams. Experimental results show that the approach has good applicability and effectiveness.

Key words: data streams, clustering, biased sampling
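
To make the idea in the abstract concrete, the following is a minimal sketch of grid-based density-biased sampling. It is an illustration under stated assumptions, not the authors' algorithm. It uses a fixed-width grid over a buffered window of points rather than the dynamic grid partitioning described in the paper, and it assumes an inclusion probability of the form `alpha / n**exponent`, where `n` is the number of points in a point's cell. The names `cell_of`, `density_biased_sample`, `cell_width`, `target_size`, and `exponent` are hypothetical.

```python
import random
from collections import Counter


def cell_of(point, cell_width):
    """Map a point to the integer coordinates of its grid cell."""
    return tuple(int(x // cell_width) for x in point)


def density_biased_sample(points, cell_width, target_size, exponent=0.5, seed=None):
    """Sample a buffered window of points with a bias against dense cells.

    Assumption (not from the paper): a point in a cell holding n points is
    kept with probability min(1, alpha / n**exponent). exponent=0 gives
    uniform sampling; exponent=1 gives each occupied cell roughly the same
    expected number of sampled points.
    """
    rng = random.Random(seed)

    # First pass: approximate the density by counting points per grid cell.
    counts = Counter(cell_of(p, cell_width) for p in points)

    # alpha scales the probabilities so that the expected sample size is
    # target_size, as long as no probability is clipped to 1.
    alpha = target_size / sum(n ** (1.0 - exponent) for n in counts.values())

    # Second pass: keep each point independently with its cell's probability.
    sample = []
    for p in points:
        n = counts[cell_of(p, cell_width)]
        prob = min(1.0, alpha / n ** exponent)
        if rng.random() < prob:
            sample.append(p)
    return sample


if __name__ == "__main__":
    gen = random.Random(0)
    dense = [(gen.uniform(0, 1), gen.uniform(0, 1)) for _ in range(900)]
    sparse = [(gen.uniform(5, 6), gen.uniform(5, 6)) for _ in range(100)]
    picked = density_biased_sample(dense + sparse, 1.0, 100, exponent=1.0, seed=1)
    print(len(picked), sum(1 for p in picked if p[0] >= 5))
```

With `exponent` above 0, points in sparse regions are over-represented relative to uniform sampling, which may help a downstream clustering step keep small clusters visible. A true one-pass stream version would have to update the cell counts incrementally instead of counting a buffered window first.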

CLC number: