Journal of Applied Sciences ›› 2019, Vol. 37 ›› Issue (3): 389-397. doi: 10.3969/j.issn.0255-8297.2019.03.009

• Signal and Information Processing •

CFMoment: Closed Frequent Itemsets Mining Based on Data Stream

WANG Jinwei, WU Shaohua, QU Zhiguo

  1. School of Computer and Software, Nanjing University of Information Science and Technology, Nanjing 210044, China
  • Received: 2018-05-22 Revised: 2018-10-30 Online: 2019-05-31 Published: 2019-05-31
  • About the author: WANG Jinwei, professor and doctoral supervisor; research interest: information security; E-mail: wjwei_2004@163.com
  • Funding:
    Supported by the National Natural Science Foundation of China (No.60971006); the National High Technology Research and Development Program of China ("863" Program) (No.2010AA122202); and the Science and Technology Development Fund of the Macao Special Administrative Region (No.063/2010/A)

Abstract: Mining closed frequent itemsets over data streams is an important research topic in association mining, a branch of data mining. This paper proposes CFMoment, an efficient algorithm that uses a sliding window to continuously maintain the set of closed frequent itemsets in a data stream, which makes it suitable for a range of stream-processing applications with strict real-time requirements. The algorithm uses an effective bit-sequence representation of items to reduce the time and memory needed to slide the window; this improves the efficiency of mining closed frequent itemsets over data streams and lowers the memory required at run time. Experiments show that the algorithm produces highly accurate mining results, runs significantly faster than the existing Moment algorithm, and consumes less memory.

Key words: data streams, data mining, sliding window, closed frequent itemsets
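
The abstract names the two ingredients, a sliding window and a bit-sequence representation of items, without giving their details. The following Python sketch illustrates the general idea only and is not the CFMoment algorithm itself: the names BitWindow and closed_frequent_itemsets are hypothetical, the choice of one integer bit-sequence per item is an assumption, and the brute-force enumeration stands in for the incremental maintenance that Moment-style algorithms perform.

```python
from itertools import combinations


class BitWindow:
    """Sliding window that keeps one bit-sequence (a Python int) per item.

    Assumption for illustration: bit 0 is the newest transaction and
    bit size-1 is the oldest one still inside the window.
    """

    def __init__(self, size):
        self.size = size
        self.mask = (1 << size) - 1  # keeps only the newest `size` bits
        self.bits = {}               # item -> bit-sequence

    def slide(self, transaction):
        """Append a transaction; the oldest one drops out when the window is full."""
        items = set(transaction)
        for item in items:
            self.bits.setdefault(item, 0)
        for item in list(self.bits):
            seq = (self.bits[item] << 1) & self.mask  # shift out the oldest bit
            if item in items:
                seq |= 1                              # mark the newest transaction
            if seq:
                self.bits[item] = seq
            else:
                del self.bits[item]                   # item no longer in the window

    def support(self, itemset):
        """Count window transactions containing every item of a non-empty itemset."""
        seq = self.mask
        for item in itemset:
            seq &= self.bits.get(item, 0)
        return bin(seq).count("1")


def closed_frequent_itemsets(window, min_support):
    """Brute-force enumeration, suitable only for tiny examples."""
    items = sorted(window.bits)
    frequent = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            count = window.support(combo)
            if count >= min_support:
                frequent[frozenset(combo)] = count
    # An itemset is closed if no proper superset has the same support.
    return {
        itemset: count
        for itemset, count in frequent.items()
        if not any(itemset < other and count == frequent[other] for other in frequent)
    }


window = BitWindow(size=4)
for t in [["a", "b"], ["a", "b", "c"], ["a", "b", "c"], ["c", "d"], ["a", "b", "d"]]:
    window.slide(t)
closed = closed_frequent_itemsets(window, min_support=2)
# closed holds {c}: 3, {d}: 2, {a, b}: 3 and {a, b, c}: 2
```

In this sketch, sliding the window costs one shift and one mask per item, and the support of an itemset is the population count of the bitwise AND of its items' sequences, which is the kind of saving in time and memory that a bit-sequence representation is meant to provide.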

Chinese Library Classification (CLC) number: