Journal of Applied Sciences, 2003, Vol. 21, Issue (3): 284-288.

Quick Clustering of Large Log Data in Library

SONG Ai-bo, ZHUANG Xiao-qing, HE Jie-yue, YE Ning, DONG Yi-sheng

  1. Department of Computer Science & Engineering, Southeast University, Nanjing 210096, China
  • Received: 2002-07-08 Revised: 2003-02-18 Online: 2003-09-10 Published: 2003-09-10
  • About the authors: SONG Ai-bo (1970-), male, from Yantai, Shandong; Ph.D. candidate. DONG Yi-sheng (1940-), male, from Qidong, Jiangsu; professor and doctoral supervisor.
  • Funding:
    Jiangsu Province Tenth Five-Year Plan High-Tech Fund project (BG2001013)

Abstract: This paper presents a simple and efficient method for quickly clustering large library log data and analyzing borrowing trends. First, the log data are grouped into a number of subclasses based on readers' basic patterns of borrowing and returning books. Then, a fuzzy clustering algorithm is given for re-clustering these subclasses. Finally, regression analysis is performed on each cluster to discover and predict readers' borrowing and returning trends. The method runs in linear time, so it scales to large datasets, and experiments show that the approach is feasible.

Key words: digital library, clustering, regression analysis, log data
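The abstract describes the pipeline only in outline. The Python sketch below shows one way the three stages could fit together, and it rests on several assumptions that are not from the paper: the record format `(reader_type, category, month)`, grouping subclasses by the `(reader_type, category)` key, and the function names `build_subclasses`, `fuzzy_c_means`, `linear_trend` and `analyse` are all hypothetical. The paper does not name its fuzzy algorithm, so standard fuzzy c-means is substituted here, and the trend is modelled as an ordinary least-squares line fitted to each cluster's monthly totals. Stage 1 is a single pass over the log, so its cost is linear in the number of records; the fuzzy step runs only on the much smaller set of subclass profiles.

```python
from collections import defaultdict
import random


def build_subclasses(records, n_months):
    """Stage 1: a single pass over the log (linear in the number of records).

    Each record is a hypothetical (reader_type, category, month) borrowing
    event. Events sharing a (reader_type, category) key form one subclass,
    summarised as a vector of monthly borrow counts.
    """
    profiles = defaultdict(lambda: [0.0] * n_months)
    for reader_type, category, month in records:
        profiles[(reader_type, category)][month] += 1.0
    return dict(profiles)


def _dist2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fuzzy_c_means(points, c, m=2.0, iters=100, seed=0):
    """Stage 2 stand-in: standard fuzzy c-means on the subclass profiles.

    Returns (centers, u), where u[j][i] is the membership of point j in
    cluster i and each row of u sums to 1.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    u = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    exp = 1.0 / (m - 1.0)
    centers = []
    for _ in range(iters):
        # Centres are membership-weighted means (weights u^m).
        centers = []
        for i in range(c):
            w = [u[j][i] ** m for j in range(n)]
            wsum = sum(w)
            centers.append([sum(w[j] * points[j][d] for j in range(n)) / wsum
                            for d in range(dim)])
        # Memberships: u_ij = 1 / sum_k (d_ij / d_kj)^(2 / (m - 1)).
        for j in range(n):
            # A tiny epsilon avoids division by zero if a point hits a centre.
            d2 = [_dist2(points[j], ctr) + 1e-12 for ctr in centers]
            u[j] = [1.0 / sum((d2[i] / d2[k]) ** exp for k in range(c))
                    for i in range(c)]
    return centers, u


def linear_trend(series):
    """Stage 3: least-squares fit y = a + b*t over t = 0..len(series)-1.

    Returns (a, b); a positive slope b suggests rising borrowing.
    """
    n = len(series)
    t_mean = (n - 1) / 2.0
    y_mean = sum(series) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
    b = sxy / sxx
    return y_mean - b * t_mean, b


def analyse(records, n_months, c):
    """Run all three stages; returns {label: (members, (a, b))}."""
    subclasses = build_subclasses(records, n_months)
    keys = list(subclasses)
    _, u = fuzzy_c_means([subclasses[k] for k in keys], c)
    # Harden the fuzzy partition: each subclass joins its strongest cluster.
    clusters = defaultdict(list)
    for key, row in zip(keys, u):
        clusters[row.index(max(row))].append(key)
    result = {}
    for label, members in clusters.items():
        totals = [sum(subclasses[k][t] for k in members)
                  for t in range(n_months)]
        result[label] = (sorted(members), linear_trend(totals))
    return result
```

Hardening each subclass to its highest-membership cluster before the regression is only one simple choice; weighting each subclass's contribution by its membership degree would be a reasonable alternative.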

中图分类号: