Journal of Applied Sciences, 2017, Vol. 35, Issue (5): 634-646. doi: 10.3969/j.issn.0255-8297.2017.05.009

• Selected paper from the 2016 China Conference on Computer Applications

Analysis of News Topic Evolution Based on DTS-ILDA Model and Association Filtering

GUO Xiao-li¹, ZHOU Zi-lan¹, LIU Yao-wei², DU Jian-hong³, HUANG Yan⁴

  1. School of Information Engineering, Northeast Dianli University, Jilin 132012, Jilin Province, China;
  2. Jilin Power Supply Company, State Grid Jilin Province Electric Power Supply Company, Jilin 132000, Jilin Province, China;
  3. Jilin Fengman Power Plant, Jilin 132012, Jilin Province, China;
  4. Jilin Branch, China Mobile Communications Group Jilin Co., Ltd., Jilin 132012, Jilin Province, China
  • Received: 2016-10-02 Revised: 2017-03-07 Online: 2017-09-30 Published: 2017-09-30
  • About the author: ZHOU Zi-lan, master's student, research interests: text information processing and text visualization. E-mail: 1422076216@qq.com
  • Funding:

    Supported by the National Natural Science Foundation of China (No. 51277023) and the Jilin Provincial Department of Science and Technology (No. 20150307020GX)

Abstract:

In topic evolution tracking, topic models use a fixed time-slice size and a fixed number of topics K, so they cannot locate important turning points in time. To address this, we propose a dynamic temporal segmentation-infinite latent Dirichlet allocation (DTS-ILDA) model. Because evolution analysis is prone to erroneous topic associations, we also propose an association filtering mechanism. First, the DTS-ILDA model extracts topics by combining an improved dynamic time segmentation algorithm with the infinite latent Dirichlet allocation (ILDA) model. The dynamic time segmentation algorithm traverses the data set in time order and uses contingency-table analysis of the topic distributions in adjacent time slices to measure the quality of each segmentation, thereby locating suitable split points between time slices. Within each time slice, the ILDA model extracts a variable number of topics rather than a fixed K, and the extracted topics are then associated across slices for evolution analysis. Next, the association filtering mechanism removes weakly related associations. Finally, graphs of five types of subtopic evolution relationships are built over the remaining associations in time order. Experiments show that the method effectively finds the time points at which the main content of a topic changes, prevents the generation of meaningless topics, reduces interference from erroneous topic associations, and uncovers accurate deep relationships between topics.

Key words: topic model, temporal segmentation, filtering, topic evolution, infinite latent Dirichlet allocation (ILDA) model
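The abstract says the dynamic time segmentation step walks the data in time order and judges each candidate split with a contingency table over the topic distributions of adjacent slices, but it does not give the statistic or the decision rule. The sketch below assumes a Pearson chi-square test of independence on a 2 × K table (accumulated current segment vs. next time unit, one column per topic) at the 0.05 level, a greedy merge-or-split rule, and per-unit topic counts already produced by some preliminary topic assignment. `chi_square_2xk`, `find_split_points` and `CHI2_CRIT_05` are hypothetical names, not the paper's.

```python
from typing import List, Sequence, Tuple

# Upper 5% critical values of the chi-square distribution for dof = 1..10.
CHI2_CRIT_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070,
                6: 12.592, 7: 14.067, 8: 15.507, 9: 16.919, 10: 18.307}


def chi_square_2xk(row_a: Sequence[float], row_b: Sequence[float]) -> Tuple[float, int]:
    """Pearson chi-square statistic and dof for a 2 x K table of topic counts."""
    cols = [(a, b) for a, b in zip(row_a, row_b) if a + b > 0]  # drop empty topics
    total_a = sum(a for a, _ in cols)
    total_b = sum(b for _, b in cols)
    if total_a == 0 or total_b == 0:
        raise ValueError("each row needs at least one count")
    n = total_a + total_b
    stat = 0.0
    for a, b in cols:
        col_total = a + b
        for observed, row_total in ((a, total_a), (b, total_b)):
            expected = row_total * col_total / n
            stat += (observed - expected) ** 2 / expected
    return stat, len(cols) - 1  # (2 - 1) * (K' - 1), K' = non-empty topics


def find_split_points(unit_counts: List[Sequence[float]]) -> List[int]:
    """Greedy left-to-right segmentation (assumed rule, not the paper's).

    unit_counts[t][k] = count assigned to topic k in time unit t.
    A unit joins the current segment unless its topic mix differs
    significantly from the segment's accumulated counts.
    """
    splits: List[int] = []
    segment = list(unit_counts[0])
    for t in range(1, len(unit_counts)):
        stat, dof = chi_square_2xk(segment, unit_counts[t])
        if dof >= 1:
            crit = CHI2_CRIT_05.get(dof)
            if crit is None:
                raise ValueError("extend CHI2_CRIT_05 for dof > 10")
            if stat > crit:
                splits.append(t)               # topic mix changed: new slice starts at t
                segment = list(unit_counts[t])
                continue
        segment = [s + c for s, c in zip(segment, unit_counts[t])]
    return splits


if __name__ == "__main__":
    units = [[50, 50], [48, 52], [90, 10], [88, 12]]
    print(find_split_points(units))  # → [2]
```

With two topics, the first two units are nearly identical and merge into one slice, while the jump to a 90/10 mix at unit 2 exceeds the dof = 1 critical value and opens a new slice. A library routine such as SciPy's contingency-table test would give exact p-values; the hard-coded table only keeps the sketch dependency-free.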
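The abstract credits ILDA with extracting a different number of topics in each time slice instead of a fixed K. Nonparametric ("infinite") topic models usually obtain this property from a Dirichlet-process prior, often described as the Chinese restaurant process (CRP). Assuming ILDA follows that pattern, the sketch below only draws assignments from a CRP prior to show how the number of occupied topics is governed by the data size and the concentration parameter `alpha`. It is not the paper's inference procedure, and `crp_assignments` is a hypothetical name.

```python
import random
from typing import List, Tuple


def crp_assignments(n_items: int, alpha: float, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Sample table (topic) labels for n_items from a CRP with concentration alpha.

    Item i joins existing table k with probability counts[k] / (i + alpha)
    and opens a new table with probability alpha / (i + alpha).
    """
    rng = random.Random(seed)
    counts: List[int] = []   # occupancy of each table opened so far
    labels: List[int] = []
    for _ in range(n_items):
        weights = counts + [alpha]          # last entry means "open a new table"
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)
        else:
            counts[k] += 1
        labels.append(k)
    return labels, counts


if __name__ == "__main__":
    for a in (0.5, 5.0):
        mean_k = sum(len(crp_assignments(200, a, s)[1]) for s in range(50)) / 50
        print(f"alpha={a}: mean number of topics over 50 draws = {mean_k:.1f}")
```

Averaged over many draws, a larger `alpha` opens more tables; the expected count grows roughly like alpha·ln(1 + n/alpha), so K is inferred rather than fixed in advance.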
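The abstract says weakly related associations are filtered out before the evolution relationships are built, but it does not state the similarity measure or the cut-off. A minimal sketch, assuming each topic is a word-probability vector over a shared vocabulary: topics in adjacent time slices are compared by Jensen-Shannon similarity (1 minus the base-2 Jensen-Shannon divergence, so values lie in [0, 1]), and only pairs at or above a hypothetical `threshold` are kept. `js_divergence` and `filter_links` are illustrative names.

```python
import math
from typing import List, Sequence, Tuple


def js_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Base-2 Jensen-Shannon divergence between two distributions (range [0, 1])."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x: Sequence[float], y: Sequence[float]) -> float:
        # Terms with x == 0 contribute 0; y > 0 wherever x > 0 because y is the mixture.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def filter_links(prev_topics: List[Sequence[float]],
                 next_topics: List[Sequence[float]],
                 threshold: float) -> List[Tuple[int, int, float]]:
    """Keep (prev_index, next_index, similarity) pairs with similarity >= threshold."""
    links = []
    for i, p in enumerate(prev_topics):
        for j, q in enumerate(next_topics):
            sim = 1.0 - js_divergence(p, q)
            if sim >= threshold:
                links.append((i, j, sim))
    return links


if __name__ == "__main__":
    prev = [[0.7, 0.2, 0.1, 0.0], [0.0, 0.1, 0.2, 0.7]]
    nxt = [[0.6, 0.3, 0.1, 0.0], [0.0, 0.0, 0.1, 0.9]]
    for i, j, sim in filter_links(prev, nxt, threshold=0.5):
        print(i, j, round(sim, 2))
```

Here the matching pairs (0, 0) and (1, 1) score about 0.99 and 0.93, while the cross pairs score about 0.1 and 0.3 and are dropped. In practice the threshold would be tuned on the corpus; keeping only the top few successors per topic is an alternative filtering rule.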
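The abstract states that five types of subtopic evolution relationship are built over the remaining associations but does not name them. A common taxonomy in the topic-evolution literature is emergence, disappearance, continuation, splitting and merging. Assuming those five, the sketch below derives them from each topic's degree in the bipartite link graph between two adjacent slices, with links given as `(prev_index, next_index)` pairs such as the survivors of a similarity filter. The relation names, the degree rule and `classify_evolution` are assumptions, not the paper's definitions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def classify_evolution(n_prev: int, n_next: int,
                       links: Iterable[Tuple[int, int]]) -> Dict[str, list]:
    """Label topic transitions between two adjacent time slices by link degree."""
    link_list: List[Tuple[int, int]] = sorted(set(links))
    out_links = defaultdict(set)   # prev topic -> linked next topics
    in_links = defaultdict(set)    # next topic -> linked prev topics
    for i, j in link_list:
        out_links[i].add(j)
        in_links[j].add(i)

    relations: Dict[str, list] = {"emerge": [], "vanish": [], "continue": [],
                                  "split": [], "merge": []}
    for i in range(n_prev):
        if not out_links[i]:
            relations["vanish"].append(i)                          # no successor
        elif len(out_links[i]) > 1:
            relations["split"].append((i, sorted(out_links[i])))   # one -> many
    for j in range(n_next):
        if not in_links[j]:
            relations["emerge"].append(j)                          # no predecessor
        elif len(in_links[j]) > 1:
            relations["merge"].append((sorted(in_links[j]), j))    # many -> one
    for i, j in link_list:
        if len(out_links[i]) == 1 and len(in_links[j]) == 1:
            relations["continue"].append((i, j))                   # one-to-one
    return relations


if __name__ == "__main__":
    print(classify_evolution(3, 4, [(0, 0), (1, 1), (1, 2), (2, 2)]))
```

In this example topic 0 continues as topic 0, topic 1 splits into topics 1 and 2, topics 1 and 2 of the earlier slice merge into topic 2, and topic 3 of the later slice emerges with no predecessor. Chaining the per-pair results along the time axis yields an evolution graph over all slices.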
