Analysis of News Topic Evolution Based on DTS-ILDA Model and Association Filtering
Received date: 2016-10-02
Revised date: 2017-03-07
Online published: 2017-09-30
In topic evolution and tracking, the size of the time slices and the number of topics K in the topic model are usually fixed, which makes it hard to locate important turning points in time and tends to produce erroneous topic associations in the evolutionary analysis. To solve this problem, we propose an improved dynamic temporal segmentation-infinite latent Dirichlet allocation (DTS-ILDA) model and an association filtering mechanism. The model combines an improved dynamic time segmentation algorithm with an infinite latent Dirichlet allocation (ILDA) model to extract topics. The dynamic time segmentation algorithm traverses the data set in time order, uses a contingency table to analyze the distribution of topics and thereby measure the segmentation results, and uses an ILDA model to extract K topics. In addition, an association filtering mechanism is proposed to handle error-prone associations in the evolutionary analysis by removing weak associations. Finally, five evolutionary relationships among the correctly associated subtopics are established according to their temporal order. Experiments show that the presented method can effectively find the important time points at which the main content of a topic changes, preventing the generation of meaningless topics. It can also reduce interference from erroneous topic associations and extract accurate, deep relationships between topics.
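The association filtering step can be illustrated with a minimal sketch. The abstract does not state which similarity measure or cutoff the authors use, so the cosine similarity, the threshold of 0.5, and the names `cosine` and `filter_associations` below are all assumptions chosen for illustration, not the paper's method. Topics are represented as word-to-weight dictionaries for two adjacent time segments, and only links whose similarity reaches the threshold are kept.

```python
import math


def cosine(p, q):
    """Cosine similarity between two sparse word-weight dicts (assumed measure)."""
    dot = sum(w * q.get(term, 0.0) for term, w in p.items())
    norm_p = math.sqrt(sum(w * w for w in p.values()))
    norm_q = math.sqrt(sum(w * w for w in q.values()))
    if norm_p == 0.0 or norm_q == 0.0:
        return 0.0
    return dot / (norm_p * norm_q)


def filter_associations(prev_topics, next_topics, threshold=0.5):
    """Keep only topic links between adjacent segments that are not weak.

    prev_topics / next_topics map a topic id to its word-weight dict.
    The threshold value is hypothetical; the abstract does not give one.
    """
    links = []
    for prev_id, prev_dist in prev_topics.items():
        for next_id, next_dist in next_topics.items():
            sim = cosine(prev_dist, next_dist)
            if sim >= threshold:  # weak associations are dropped here
                links.append((prev_id, next_id, sim))
    return links
```

Under these assumptions, the surviving links are the candidate pairs from which evolutionary relationships between subtopics in consecutive segments would then be classified.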
GUO Xiao-li, ZHOU Zi-lan, LIU Yao-wei, DU Jian-hong, HUANG Yan. Analysis of News Topic Evolution Based on DTS-ILDA Model and Association Filtering[J]. Journal of Applied Sciences, 2017, 35(5): 634-646. DOI: 10.3969/j.issn.0255-8297.2017.05.009