Journal of Applied Sciences (应用科学学报) ›› 2023, Vol. 41 ›› Issue (5): 870-880. doi: 10.3969/j.issn.0255-8297.2023.05.012

• Computer Science and Applications •

Constructing Sentiment Lexicon in the Education Field by Integrating Skip-Gram and R-SOPMI

CHEN Jun, XI Ningli, LI Jiamin, WAN Xiaorong

  1. School of Education, Guizhou Normal University, Guiyang 550025, Guizhou, China
  • Received: 2022-07-19 Published: 2023-09-28
  • Corresponding author: XI Ningli, whose research interest is short-text sentiment analysis in natural language processing. E-mail: 1079665637@qq.com
  • Funding:
    Supported by the Guizhou Provincial Higher Education Humanities and Social Sciences Research Project (No. 2023GZGXRW146)

Abstract: This paper proposes a feature-fusion method for constructing a fine-grained sentiment lexicon for the education domain, targeting emotion analysis of educational feedback texts. First, an education-domain corpus is built that contains emotional features from both formal and informal domains. Second, a fusion-based method for constructing the domain lexicon is proposed: on top of an emotion categorization, it identifies both the linguistic probability features and the statistical probability features of words. Semantic orientation pointwise mutual information (SO-PMI) is extended into the repetitive SO-PMI (R-SOPMI) algorithm for emotion classification, which enables co-occurrence-based multi-category emotion assignment. Finally, a fine-grained education-domain sentiment lexicon is obtained, expanded to 39 138 emotion words. Experimental results show that, for every emotion category except “anger”, the F1 score of the constructed lexicon is above 78.09%, indicating good overall performance. Compared with a general-purpose lexicon, Macro_Precision, Macro_Recall and Macro_F1 improve by 21.95%, 2.50% and 13.01%, respectively, which indicates that the feature-fusion method effectively extracts domain features and supports the construction of a fine-grained domain lexicon.

Key words: Sentiment Lexicon, sentiment classification, Word2vec, fusion features
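
The abstract describes extending SO-PMI from a positive/negative polarity score to a choice among several emotion categories. The sketch below illustrates only that general idea and is not the authors' R-SOPMI, whose details the abstract does not give. It assumes document-level co-occurrence counts, treats the PMI of words that never co-occur as 0 (a simplification), and uses a hypothetical toy corpus, hypothetical seed words, and hypothetical labels `joy` and `anger`.

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence_counts(docs):
    """Count document frequency of each word and of each unordered word pair."""
    word_df = Counter()
    pair_df = Counter()
    for tokens in docs:
        vocab = sorted(set(tokens))  # sorted so each pair has one canonical order
        word_df.update(vocab)
        pair_df.update(combinations(vocab, 2))
    return word_df, pair_df


def pmi(w1, w2, word_df, pair_df, n_docs):
    """PMI(w1, w2) = log2(p(w1, w2) / (p(w1) * p(w2))).

    Assumption for this sketch: pairs that never co-occur score 0.0.
    """
    joint = pair_df[tuple(sorted((w1, w2)))]
    if joint == 0:
        return 0.0
    return math.log2(joint * n_docs / (word_df[w1] * word_df[w2]))


def classify_emotion(word, seeds, word_df, pair_df, n_docs):
    """Sum PMI against each category's seed words; return the best label and all scores."""
    scores = {
        label: sum(pmi(word, seed, word_df, pair_df, n_docs) for seed in seed_words)
        for label, seed_words in seeds.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical tokenized feedback texts and seed words, for illustration only.
docs = [
    ["lecture", "great", "happy"],
    ["lecture", "engaging", "happy"],
    ["homework", "overload", "angry"],
    ["exam", "unfair", "angry"],
    ["engaging", "great"],
    ["unfair", "overload"],
]
seeds = {"joy": ["happy", "great"], "anger": ["angry", "overload"]}

word_df, pair_df = cooccurrence_counts(docs)
label_engaging, _ = classify_emotion("engaging", seeds, word_df, pair_df, len(docs))
label_unfair, _ = classify_emotion("unfair", seeds, word_df, pair_df, len(docs))
print(label_engaging, label_unfair)
```

In this toy setup a candidate word is assigned to whichever category's seed words it co-occurs with most strongly. A real lexicon builder would also need a tie-breaking rule and a threshold below which a word is left unlabeled.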

CLC Number: