Journal of Applied Sciences (应用科学学报) ›› 2024, Vol. 42 ›› Issue (5): 810-822. doi: 10.3969/j.issn.0255-8297.2024.05.008

• Computer Science and Applications •

News Recommendation Algorithm Incorporating Headline Sentiment and Topic Characteristics

AI Jun (艾均), HONG Xingqi (洪星琦)

  1. School of Optical-Electrical Information and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China
  • Received: 2022-07-14 Published: 2024-09-29
  • Corresponding author: AI Jun, associate professor; research interests: recommender systems and complex networks. E-mail: aijun@usst.edu.cn
  • Supported by:
    National Natural Science Foundation of China (No. 61803264)

Abstract: Text is the primary medium of news. Traditional lexicon-based news recommendation algorithms usually ignore the sentiment of words that are not in the sentiment lexicon, so sentiment words are labeled incompletely, which lowers prediction accuracy and degrades ranking performance. To address these problems, this paper proposes a heuristic method for inferring the sentiment of unknown words and designs a corresponding news recommendation algorithm to verify its effectiveness. A tripartite graph of headlines, sentiment words, and sentiment characters is constructed to diffuse word-level sentiment from the lexicon to individual characters, and the headline sentiment is then obtained from the sentiment words and sentiment characters. A bag-of-words model extracts topic features from the headlines. The sentiment similarity and topic similarity between headlines are calculated and fused into a single comprehensive similarity measure. News items with high similarity to the target news are selected as its neighbors. The algorithm predicts the hourly average click count of the target news from the hourly average click counts of its neighbors, treats this value as the predicted score of the target news, and ranks news by score to produce recommendations for users. Experiments on a real NetEase hot-list news dataset confirm the feasibility and effectiveness of the method. Compared with other algorithms, the proposed algorithm improves the best accuracy in terms of mean absolute error (MAE) by 2.2% to 3.4%, the best accuracy in terms of root mean square error (RMSE) by 2.3% to 2.9%, and the mean normalized discounted cumulative gain (NDCG) score by 0.7% to 1.8%.
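The abstract names the steps of the pipeline but not its formulas, so the following Python sketch is only one possible reading of it, not the authors' implementation. It assumes headlines are already tokenized, that lexicon polarities lie in [-1, 1], that a character inherits the mean polarity of the lexicon words containing it, that sentiment similarity is 1 - |difference| / 2, and that the two similarities are blended with a hypothetical weight `alpha`. All function names, `alpha`, and `k` are illustrative.

```python
from collections import Counter, defaultdict
from math import sqrt


def char_sentiment(lexicon):
    """Diffuse word polarity to characters (assumed rule: a character gets
    the mean polarity of the lexicon words that contain it)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for word, polarity in lexicon.items():
        for ch in set(word):
            sums[ch] += polarity
            counts[ch] += 1
    return {ch: sums[ch] / counts[ch] for ch in sums}


def headline_sentiment(tokens, lexicon, char_scores):
    """Mean polarity of a tokenized headline. Lexicon words use their own
    polarity; unknown words fall back to the mean of their known characters."""
    scores = []
    for tok in tokens:
        if tok in lexicon:
            scores.append(lexicon[tok])
        else:
            known = [char_scores[c] for c in tok if c in char_scores]
            if known:
                scores.append(sum(known) / len(known))
    return sum(scores) / len(scores) if scores else 0.0


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def fused_similarity(x, y, alpha=0.5):
    """Blend sentiment and topic similarity (assumed linear fusion).
    x and y are (sentiment_score, bag_of_words) pairs."""
    sentiment_sim = 1.0 - abs(x[0] - y[0]) / 2.0
    return alpha * sentiment_sim + (1.0 - alpha) * cosine(x[1], y[1])


def predict_clicks(target, history, k=2, alpha=0.5):
    """Similarity-weighted mean of the hourly average clicks of the k most
    similar neighbors. history holds ((sentiment, bow), clicks) pairs."""
    ranked = sorted(
        ((fused_similarity(target, feats, alpha), clicks) for feats, clicks in history),
        reverse=True,
    )[:k]
    total = sum(sim for sim, _ in ranked)
    return sum(sim * clicks for sim, clicks in ranked) / total if total else 0.0
```

Under these assumptions, an out-of-lexicon word such as "喜讯" still receives a sentiment score because it shares the character "喜" with a lexicon entry such as "喜悦". Sorting candidate news by the value returned from `predict_clicks` would then give the recommendation order.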

Key words: recommender system, sentiment analysis, bag-of-words, collaborative filtering, tripartite graph

CLC number: