Research and Application of Improved CRNN Model in Classification of Alarm Texts

doi:10.3969/j.issn.0255-8297.2020.03.005

Journal of Applied Sciences (应用科学学报), 2020, Vol. 38, Issue 3: 388-400.


WANG Mengxuan^1,2, ZHANG Sheng^1,2, WANG Yue^1,2, LEI Ting^1,2, DU Wen^1,2

1. First Institute of Telecommunications Technology, Shanghai 200032, China;
2. DS Information Technology Co., Ltd., Shanghai 200032, China

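Received: 2019-09-01  Online: 2020-05-31  Published: 2020-06-11
Corresponding author: DU Wen, professor-level senior engineer; research interests: big data analytics, machine learning, the Internet of Things, and software architecture design. E-mail: duwen@dscomm.com.cn
Funding: Supported by the 2018 Big Data Industry Development Pilot Project Fund of the Ministry of Industry and Information Technology; the Shanghai Special Fund for Informatization Development (No. 201901043, No. 201901003); the Shanghai Special Fund for Artificial Intelligence Innovation and Development (No. 2018-RGZN-01013, No. 2019-RGZN-01080); and the Shanghai Special Fund for Software and Integrated Circuit Industry Development (No. 190234).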


Abstract

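To meet the need to classify cases from the text descriptions in a city public security bureau's 110 emergency call receiving and dispatching records, and drawing on text classification methods already used in other industries, we built a text classification system for police incident descriptions. By examining the suitable application scenarios, strengths, and weaknesses of common classification networks, and by analyzing the characteristics of the case descriptions in police incident data, we propose a model based on an improved convolutional recurrent neural network (CRNN). The model optimizes the key-feature extraction process and remedies the insufficient extraction of local features from short texts by existing models. Experiments show that the model's accuracy is 2% to 3% higher than that of common classification models, that it effectively preserves the correlation among local features of the data, and that it accurately assigns each case description to its corresponding case type, thereby improving the automation efficiency of the public security alarm receiving and dispatching platform.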

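Keywords: alarm text processing, text classification, convolutional neural network (CNN), bidirectional long short-term memory (BiLSTM), SelfAttention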
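The abstract names the building blocks (CNN, BiLSTM, SelfAttention) but not how the improved CRNN wires them together. The following is a minimal NumPy forward-pass sketch of one common CRNN arrangement for short-text classification. It assumes a sequential pipeline: token embeddings, a 1-D convolution that extracts local n-gram features, a BiLSTM over those features, scaled dot-product self-attention, mean pooling, and a softmax over case types. The layer order, all sizes (`VOCAB`, `EMB`, `K`, and the rest), the pooling choice, and every function name are hypothetical. This is not the authors' improved model, and the weights are random and untrained, so the output only illustrates shapes and data flow.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: vocabulary, embedding, conv kernel width, conv channels,
# LSTM hidden size per direction, attention size, number of case types.
VOCAB, EMB, K, CONV, HID, ATT, CLASSES = 5000, 32, 3, 64, 48, 64, 10


def init(*shape):
    """Small random weights (untrained; for illustrating shapes only)."""
    return rng.normal(0.0, 0.1, size=shape)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def conv1d_relu(x, w, b):
    """Valid 1-D convolution over time. x: (T, d_in), w: (k, d_in, d_out) -> (T-k+1, d_out)."""
    k = w.shape[0]
    # Each output position sees k consecutive token vectors, i.e. one local n-gram.
    windows = np.stack([x[t:t + k] for t in range(x.shape[0] - k + 1)])
    return np.maximum(np.einsum("tkd,kdo->to", windows, w) + b, 0.0)


def lstm(x, W, U, b):
    """One-direction LSTM. x: (T, d), W: (d, 4h), U: (h, 4h), b: (4h,) -> (T, h)."""
    h = np.zeros(U.shape[0])
    c = np.zeros(U.shape[0])
    outputs = []
    for x_t in x:
        i, f, g, o = np.split(x_t @ W + h @ U + b, 4)  # input, forget, candidate, output
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        outputs.append(h)
    return np.stack(outputs)


def bilstm(x, fwd, bwd):
    """Concatenate a left-to-right pass with a right-to-left pass re-aligned to time order."""
    return np.concatenate([lstm(x, *fwd), lstm(x[::-1], *bwd)[::-1]], axis=1)


def self_attention(H, Wq, Wk, Wv):
    """Scaled dot-product self-attention over the time steps of H."""
    queries, keys, values = H @ Wq, H @ Wk, H @ Wv
    scores = queries @ keys.T / np.sqrt(keys.shape[1])
    return softmax(scores, axis=-1) @ values


params = {
    "emb": init(VOCAB, EMB),
    "conv_w": init(K, EMB, CONV), "conv_b": np.zeros(CONV),
    "fwd": (init(CONV, 4 * HID), init(HID, 4 * HID), np.zeros(4 * HID)),
    "bwd": (init(CONV, 4 * HID), init(HID, 4 * HID), np.zeros(4 * HID)),
    "Wq": init(2 * HID, ATT), "Wk": init(2 * HID, ATT), "Wv": init(2 * HID, ATT),
    "out_w": init(ATT, CLASSES), "out_b": np.zeros(CLASSES),
}


def classify(token_ids, p):
    """Forward pass: token ids (T,) with T >= K -> probabilities over CLASSES case types."""
    x = p["emb"][token_ids]                           # (T, EMB)
    local = conv1d_relu(x, p["conv_w"], p["conv_b"])  # (T-K+1, CONV) local n-gram features
    H = bilstm(local, p["fwd"], p["bwd"])             # (T-K+1, 2*HID) contextual features
    A = self_attention(H, p["Wq"], p["Wk"], p["Wv"])  # (T-K+1, ATT) re-weighted features
    return softmax(A.mean(axis=0) @ p["out_w"] + p["out_b"])


token_ids = rng.integers(0, VOCAB, size=20)  # stand-in for one tokenized alarm description
probs = classify(token_ids, params)
print(probs.argmax(), probs.round(3))
```

Placing the convolution before the recurrent layer means the BiLSTM consumes n-gram features rather than raw tokens. That is one plausible way to strengthen local-feature extraction on short texts, as the abstract describes. A working system would train these weights, for example with a cross-entropy loss in a deep learning framework.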


CLC number: P751.1

王孟轩, 张胜, 王月, 雷霆, 杜渂. 改进的CRNN模型在警情文本分类中的研究与应用[J]. 应用科学学报, 2020, 38(3): 388-400.

WANG Mengxuan, ZHANG Sheng, WANG Yue, LEI Ting, DU Wen. Research and Application of Improved CRNN Model in Classification of Alarm Texts[J]. Journal of Applied Sciences, 2020, 38(3): 388-400.


Related articles

编辑推荐

Metrics

本文评价

[1]	马鑫, 吴云, 鹿泽光. 基于混合神经网络的协同过滤推荐模型[J]. 应用科学学报, 2020, 38(3): 478-487.
[2]	刘伟, 章琬苓, 项世军. 基于LBP-MDCT和CNN的人脸活体检测算法[J]. 应用科学学报, 2019, 37(5): 609-617.
[3]	王灿军, 廖鑫, 陈嘉欣, 秦拯, 刘绪崇. 基于卷积神经网络的面部图像修饰检测[J]. 应用科学学报, 2019, 37(5): 618-630.
[4]	吴韵清, 吴鹏, 陈北京, 鞠兴旺, 高野. 基于残差全卷积网络的图像拼接定位算法[J]. 应用科学学报, 2019, 37(5): 651-662.
[5]	靳华中, 刘潇龙, 胡梓珂. 一种结合全局和局部特征的图像描述生成模型[J]. 应用科学学报, 2019, 37(4): 501-509.
[6]	赵云山, 段友祥. 基于Attention机制的卷积神经网络文本分类模型[J]. 应用科学学报, 2019, 37(4): 541-550.
[7]	冯勇, 屈渤浩, 徐红艳, 王嵘冰, 张永刚. 融合TF-IDF和LDA的中文FastText短文本分类方法[J]. 应用科学学报, 2019, 37(3): 378-388.
[8]	曾润华, 张树群. 改进卷积神经网络的语音情感识别方法[J]. 应用科学学报, 2018, 36(5): 837-844.
[9]	杨滨, 张涛, 陈先意. 基于深度学习的图像局部模糊识别[J]. 应用科学学报, 2018, 36(2): 321-330.
[10]	史晓裕, 李斌, 谭舜泉. 深度学习空域隐写分析的预处理层[J]. 应用科学学报, 2018, 36(2): 309-320.
[11]	董伟, 王建军. 改进的卷积神经网络用于对比度增强取证[J]. 应用科学学报, 2017, 35(6): 745-753.
[12]	丁泽亚, 张全. 利用概念知识的文本分类[J]. 应用科学学报, 2013, 31(2): 197-203.
[13]	刘海峰, 姚泽清, 刘守生, 王倩. 文本分类中基于核的非线性判别[J]. 应用科学学报, 2008, 26(6): 627-631.
[14]	忻健, 陆巍, 朱景德, 王翼飞. GenExtractor:一个基于Web的生物信息挖掘系统[J]. 应用科学学报, 2005, 23(1): 75-81.