Kernel-Based Nonlinear Discriminant Method in Text Classification

Journal of Applied Sciences

LIU Hai-feng¹, YAO Ze-qing¹, LIU Shou-sheng¹, WANG Qian²

1. Institute of Sciences, PLA University of Science and Technology, Nanjing 210007, Jiangsu, China; 2. Institute of Xuzhou Engineering, Xuzhou 221116, Jiangsu, China

Received: 2008-06-16 Revised: 2008-09-09 Online: 2008-12-10 Published: 2008-12-10
Corresponding author: LIU Hai-feng

Abstract

To address feature dimensionality reduction in text classification, the maximum scatter difference discriminant criterion is improved by introducing a kernel transformation as a preprocessing step, so that the criterion applies to a wider range of text classification problems. A kernel-based nonlinear discriminant method is proposed for extracting text features. The kernel transformation overcomes the poor linear separability that limits the scatter difference criterion when it is applied to text classification. The feature dimension is greatly reduced while information loss is kept to a minimum. Text classification experiments show that, compared with the kernel-free maximum scatter difference method, the proposed nonlinear method improves the F1 value by 4.7%, a clear performance advantage.

Key words: text categorization, feature extraction, scatter difference, kernel transformation
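The abstract does not state the kernel, the balance constant, or the exact normalization the authors use, so the following is only a minimal sketch of a generic kernel maximum scatter difference (KMSD) feature extractor, not the paper's implementation. It assumes the usual maximum scatter difference criterion J(w) = w^T (S_b - C S_w) w with ||w|| = 1, applied in a kernel-induced feature space. Writing w = sum_i alpha_i phi(x_i) turns the criterion into alpha^T (M - C N) alpha under the constraint alpha^T K alpha = 1, which is a generalized symmetric eigenproblem; the leading eigenvectors give the reduced features. The RBF kernel, the parameters gamma, C and eps, the function names rbf_kernel, kernel_msd_fit and kernel_msd_transform, and the synthetic Gaussian data (standing in for TF-IDF document vectors) are all hypothetical choices made for illustration.

```python
import numpy as np
from scipy.linalg import eigh


def rbf_kernel(A, B, gamma=1.0):
    """Gaussian (RBF) kernel matrix with k(a, b) = exp(-gamma * ||a - b||^2).

    The RBF kernel is an assumed choice; the abstract does not name a kernel.
    """
    sq = (np.sum(A ** 2, axis=1)[:, None]
          + np.sum(B ** 2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-gamma * np.maximum(sq, 0.0))  # clip tiny negative round-off


def kernel_msd_fit(X, y, n_components, C=1.0, gamma=1.0, eps=1e-6):
    """Learn expansion coefficients alpha for a kernel scatter-difference projection.

    Maximizes alpha^T (M - C N) alpha subject to alpha^T (K + eps I) alpha = 1,
    i.e. solves the generalized eigenproblem (M - C N) alpha = lam (K + eps I) alpha.
    Returns the top n_components eigenvectors (columns) and their eigenvalues.
    """
    n = X.shape[0]
    K = rbf_kernel(X, X, gamma)
    k_all = K.mean(axis=1)                  # kernel image of the global mean
    M = np.zeros((n, n))                    # between-class scatter term
    N = np.zeros((n, n))                    # within-class scatter term
    for c in np.unique(y):
        Kc = K[:, y == c]                   # n x n_c kernel columns of class c
        n_c = Kc.shape[1]
        d = Kc.mean(axis=1) - k_all         # class mean minus global mean
        M += n_c * np.outer(d, d)
        H = np.eye(n_c) - np.full((n_c, n_c), 1.0 / n_c)  # centering matrix
        N += Kc @ H @ Kc.T
    M /= n
    N /= n
    A = M - C * N
    A = (A + A.T) / 2.0                     # remove round-off asymmetry
    B = K + eps * np.eye(n)                 # small ridge keeps B positive definite
    vals, vecs = eigh(A, B)                 # ascending eigenvalues, vecs^T B vecs = I
    order = np.argsort(vals)[::-1][:n_components]
    return vecs[:, order], vals[order]


def kernel_msd_transform(X_train, X_new, alpha, gamma=1.0):
    """Project samples onto the learned discriminant directions in feature space."""
    return rbf_kernel(X_new, X_train, gamma) @ alpha


# Usage on synthetic two-class data (a stand-in for document vectors).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (20, 5)), rng.normal(3.0, 1.0, (20, 5))])
y = np.array([0] * 20 + [1] * 20)
alpha, vals = kernel_msd_fit(X, y, n_components=2, C=1.0, gamma=0.1)
Z = kernel_msd_transform(X, X, alpha, gamma=0.1)
print(Z.shape)  # (40, 2)
```

The matrices formed here are n x n in the number of training documents, independent of the vocabulary size, and each returned eigenvalue equals the scatter difference alpha^T (M - C N) alpha reached along its direction. The ridge eps is needed because scipy.linalg.eigh requires a positive definite right-hand matrix for the generalized problem, and a raw kernel matrix can be singular.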

CLC number: TP391

LIU Hai-feng, YAO Ze-qing, LIU Shou-sheng, WANG Qian. Kernel-Based Nonlinear Discriminant Method in Text Classification[J]. Journal of Applied Sciences.