在数据流上挖掘频繁闭项集是数据挖掘中关联性挖掘的重要研究课题之一.该文提出了一种高效的数据流频繁闭项挖掘算法——CFMoment,通过使用滑动窗口不断维护数据流中的频繁闭项集,可适用于实时性要求较高的多种数据流处理应用环境.该算法利用项目的有效比特序列表示来减少滑动窗口所需的时间和内存,进一步提升了在数据流中挖掘频繁闭项集的效率并有效降低了运行过程中的内存需求.实验表明,该算法不仅获得了高精度的挖掘结果,而且其运算速度明显快于现有的Moment算法,在数据流上挖掘频繁闭项集的内存消耗更少.
Mining closed frequent itemsets over stream data is an important research issue of mining association rules in data mining. In this paper, we propose an efficient closed frequent itemsets mining algorithm in stream data, CFMoment, to maintain the set of closed frequent itemsets in data streams with a sliding window. The new algorithm can be applied to many stream data processing applications with high real-time requirements. It proposes to reduce the time and memory requirements in sliding windows by using the effective bit-sequence representation of items, which further improves the efficiency of closed frequent itemsets in stream data mining and effectively reduces the memory requirements in running process. Experiments show that the proposed algorithm not only attains highly accurate mining results, but also runs significantly faster and consumes less memory than the existing algorithm Moment.
[1] Garofalakis M N, Gehrke J, Rastogi R. Querying and mining data streams:you only get one look a tutorial[C]//ACM Sigmod International Conference. ACM, 2002, 16(1):635.
[2] Babcock B, Babu S, Datar M. Models and issues in data stream systems[C]//ACM SigmodSigact-Sigart Symposium on Principles of Database Systems. ACM, 2002:1-16.
[3] Jiang N, Gruenwald L. Research issues in data stream association rule mining[J]. ACM Sigmod Record, 2006, 35(1):14-19.
[4] Manku G S, Motwani R. Approximate frequency counts over data streams[M]. VLDB Endowment, 2012, 5(12):346-357.
[5] Li H F, Lee S Y, Shan M K. An efficient algorithm for mining frequent itemsets over the entire history of data streams[C]//Proceedings of the first International Workshop on Knowledge Discovery in Data Streams, 2004, 1(1):1.
[6] Li H F, Lee S Y, Shan M K. Online mining (recently) maximal frequent itemsets over data streams[C]//Proceedings of the 15th IEEE International Workshop on Research Issues on Data Engineering, 2005:11-18.
[7] Yu J X, Chong Z, Lu H. A false negative approach to mining frequent itemsets from high speed transactional data streams[J]. Information Sciences, 2006, 176(14):1986-2015.
[8] Liu X, Guan J, Hu P. Mining frequent closed itemsets from a landmark window over online data streams[J]. Computers and Mathematics with Applications, 2009, 57(6):927-936.
[9] Chang J, Lee W. Finding recently frequent itemsets adaptively over online transactional data streams[J]. Information Systems, 2006, 31(8):849-869.
[10] Woo H J, Lee W S. EstMax:tracing maximal frequent itemsets instantly over online transactional data streams[J]. IEEE Transactions on Knowledge and Data Engineering, 2009, 21(10):1418-1431.
[11] Shin S J, Lee D S, Lee W S. CP-tree:an adaptive synopsis structure for compressing frequent itemsets over online data streams[J]. Information Sciences, 2014, 278:559-576.
[12] Chang J, Lee W. A sliding window method for finding recently frequent itemsets over online data streams[J]. Journal of Information Science and Engineering, 2004, 20(4):753-762.
[13] Li H F, Lee S Y. Mining frequent itemsets over data streams using efficient window sliding techniques[J]. Expert Systems with Applications, 2009, 36(2):1466-1477.
[14] Agrawal R, Srikant R. Fast algorithms for mining association rules[C]//Proceedings of the 20th International Conference on Very Large Data Bases, 1994:487-499.
[15] Farzanyar Z, Kangavari M, Cercone N. Max-FISM:mining (recently) maximal frequent itemsets over data streams using the sliding window model[J]. Computers and Mathematics with Applications, 2012, 64:1706-1718.
[16] Mohit O, Komal T. An enhanced apriori and improved algorithm for association rules[J]. International Research Journal of Engineering and Technology (IRJET), 2016, 3:2395-0072.
[17] 马月坤,刘鹏飞,张振友,孙燕,丁铁凡. 改进的FP-Growth算法及其分布式并行实现[J]. 哈尔滨理工大学学报,2016, 21(2):20-27. Ma Y K, Liu P F, Zhang Z Y, Sun Y, Ding T F. Improved FP-Growth Algorithm and Its Distributed Parallel Implementation[J]. Journal of Harbin University of Science and Technology, 2016, 21(2):20-27.
[18] Lin J C W, Gan W, Fournier V, Chao H C, Zhan J. Mining of frequent patterns with multiple minimum supports[J]. Engineering Applications of Artifical Intelligence, 2017, 60:83-96.
[19] Chi Y, Wang H, Yu P, Muntz R. MOMENT:maintaining closed frequent itemsets over a stream sliding window[C]//IEEE International Conference on Data Mining. Brighton, England, IEEE, 2006, 10(3):59-66.
[20] Jiang N, Gruenwald L. CFI-stream:mining closed frequent itemsets in data streams[C]//ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2006:592-597.
[21] Li H F, Ho C C, Lee S Y. Incremental updates of closed frequent itemsets over continuous data streams[J]. Expert Systems with Applications, 2009, 36(2):2451-2458.
[22] Dai C, Chen L. An algorithm for mining frequent closed itemsets in data stream[J]. Physics Procedia, 2002, 24:1722-1728.
[23] Han J, Pei J, Yin Y, Mao R. Mining frequent patterns without candidate generation:a frequentpattern tree approach[J]. Data Mining and Knowledge Discovery, 2004, 8:53-87.
[24] Nori F, Deypir M, Sadreddini M H. A sliding window-based algorithm for frequent closed itemset mining over data streams[J]. The Journal of Systems and Software, 2013, 86:615-623.