Journal of Applied Sciences ›› 2021, Vol. 39 ›› Issue (3): 443-442. doi: 10.3969/j.issn.0255-8297.2021.03.010

• Signal and Information Processing •

News Summarization Extracting Method Based on Improved MMR Algorithm

CHENG Kun, LI Chuanyi, JIA Xinxin, GE Jidong, LUO Bin

  1. Software Institute, Nanjing University, Nanjing 210093, Jiangsu, China
  • Received: 2020-10-26 Published: 2021-06-08
  • Corresponding author: LI Chuanyi, Ph.D., whose research interests are natural language processing and machine learning. E-mail: lcy@nju.edu.cn

Abstract: This paper proposes a news summarization method based on maximal marginal relevance (MMR) and a second method that combines a support vector machine (SVM) with MMR (SVM-MMR). The first method improves the traditional MMR model; the second uses the improved MMR model to make a second-pass selection over the SVM classification results. Experiments show that, compared with the traditional MMR model, the improved-MMR and SVM-MMR summarization methods raise average precision by 0.148 and 0.204, respectively, and that the improved-MMR method is about 3 times as efficient as the SVM-MMR method. The improved MMR algorithm is better suited to application scenarios that demand high summarization efficiency, especially long-text summarization, while the SVM-MMR method is better suited to generating summaries that cover the text content relatively comprehensively.

Key words: news summarization, extractive summarization, redundancy processing, support vector machine (SVM), maximal marginal relevance (MMR)
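For readers unfamiliar with MMR, the following is a minimal sketch of the classic greedy MMR selection rule that the abstract calls the traditional model. It is not the improved variant proposed in the paper, whose details the abstract does not give. The function names (`bow`, `cosine`, `mmr_select`), the whitespace tokenizer, the bag-of-words cosine similarity, the use of the whole document as the relevance target, and the default weight `lam=0.7` are all assumptions made for illustration.

```python
import math
from collections import Counter


def bow(sentence):
    """Bag-of-words term counts (assumes whitespace-tokenized text)."""
    return Counter(sentence.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr_select(sentences, k, lam=0.7, candidates=None):
    """Greedily pick up to k sentence indices with the classic MMR score.

    score(i) = lam * sim(s_i, document)
               - (1 - lam) * max over selected j of sim(s_i, s_j)

    `candidates` (hypothetical parameter) restricts selection to a subset
    of indices, e.g. sentences kept by an earlier filtering stage.
    """
    vectors = [bow(s) for s in sentences]
    doc = Counter()
    for v in vectors:
        doc.update(v)  # the whole document serves as the relevance target

    remaining = list(range(len(sentences))) if candidates is None else list(candidates)
    selected = []

    def score(i):
        relevance = cosine(vectors[i], doc)
        redundancy = max((cosine(vectors[i], vectors[j]) for j in selected), default=0.0)
        return lam * relevance - (1 - lam) * redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return sorted(selected)  # restore original sentence order for the summary
```

With `lam=1.0` the rule reduces to pure relevance ranking and can return near-duplicate sentences; lowering `lam` penalizes sentences that resemble those already chosen. A two-stage pipeline in the spirit of SVM-MMR could pass only the indices accepted by a classifier through `candidates`, though how the paper actually couples the two stages is not stated in the abstract. Chinese news text would also need a proper word segmenter in place of `split()`.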
