Journal of Applied Sciences ›› 2019, Vol. 37 ›› Issue (3): 327-335. doi: 10.3969/j.issn.0255-8297.2019.03.003

• Signal and Information Processing •

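  • About the author: OUYANG Zhiyou, Ph.D. candidate; research interests: machine learning and electric power big data analysis. E-mail: ouyang@njupt.edu.cn
  • Funding:
    Supported by the National Natural Science Foundation of China (No. 61533010)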

Protein Small Molecule Affinity Prediction Based on Natural Language Processing

OUYANG Zhiyou1, CHEN Chen2, WANG Yuqian3, CHEN Jingang3, YIN Zhao4, ZHOU Qingsong5   

  1. Institute of Advanced Technology, Nanjing University of Posts and Telecommunications, Nanjing 210023, China;
  2. School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210023, China;
  3. School of Economics, Nanjing University of Posts and Telecommunications, Nanjing 210023, China;
  4. School of Petroleum Engineering, China University of Petroleum (East China), Qingdao 266580, Shandong Province, China;
  5. Department of Communication and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
  • Received: 2018-10-10   Revised: 2018-10-25   Online: 2019-05-31   Published: 2019-05-31

Abstract: The study of interactions between proteins and small molecules is important for drug research and development. However, existing methods for predicting protein-small molecule affinity suffer from high cost and low accuracy. This paper proposes a new prediction method that applies natural language processing (NLP) techniques to protein structure data and small-molecule fingerprint data, and then predicts the affinity with a gradient boosting decision tree (GBDT) model. Experiments show that the proposed method is considerably more accurate than the existing methods.

Key words: natural language processing, machine learning, gradient boosting decision tree (GBDT), protein small molecule affinity value
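
The abstract names two ingredients, NLP-style processing of the protein and fingerprint data and a GBDT regressor, but gives no implementation details. The following is only a minimal sketch of how such a pipeline could look, under assumptions that are not taken from the paper: the protein is treated as text and split into overlapping 3-character "words" (k-mers), the fingerprint is a short bit list, the booster uses one-split trees with squared loss, and the four training pairs are invented placeholder values. All function names are hypothetical.

```python
from collections import Counter

def kmer_counts(sequence, k=3):
    """Count overlapping k-character 'words' in a protein sequence (assumed tokenization)."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))

def featurize(sequence, fingerprint, vocab, k=3):
    """Bag-of-k-mers counts in vocabulary order, followed by the fingerprint bits."""
    counts = kmer_counts(sequence, k)
    return [float(counts[w]) for w in vocab] + [float(b) for b in fingerprint]

def fit_stump(X, residuals):
    """Return the one-split regression tree (feature, threshold, left mean, right mean)
    with the lowest squared error on the residuals; falls back to a constant."""
    mean = sum(residuals) / len(residuals)
    best = (sum((r - mean) ** 2 for r in residuals), None, None, mean, mean)
    for j in range(len(X[0])):
        # Dropping the largest value keeps both sides of every split non-empty.
        for t in sorted({row[j] for row in X})[:-1]:
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if err < best[0]:
                best = (err, j, t, lm, rm)
    return best[1:]

def stump_predict(stump, row):
    j, t, lm, rm = stump
    if j is None:  # constant fallback stump
        return lm
    return lm if row[j] <= t else rm

def fit_gbdt(X, y, rounds=200, lr=0.1):
    """Gradient boosting for squared loss: each round fits a stump to the residuals."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps = []
    for _ in range(rounds):
        residuals = [yi - pi for yi, pi in zip(y, preds)]  # negative gradient
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        preds = [p + lr * stump_predict(stump, row) for p, row in zip(preds, X)]
    return base, lr, stumps

def predict(model, row):
    base, lr, stumps = model
    return base + lr * sum(stump_predict(s, row) for s in stumps)

# Invented toy data (sequence, fingerprint bits, affinity); not from the paper.
pairs = [
    ("MKTAYIAK", [1, 0, 1, 0], 6.5),
    ("MKTAYGGS", [1, 0, 0, 1], 5.0),
    ("GGSGGSAK", [0, 1, 1, 0], 7.2),
    ("AKAKMKTA", [0, 1, 0, 1], 4.1),
]
vocab = sorted({w for seq, _, _ in pairs for w in kmer_counts(seq)})
X = [featurize(seq, fp, vocab) for seq, fp, _ in pairs]
y = [affinity for _, _, affinity in pairs]
model = fit_gbdt(X, y)
print([round(predict(model, row), 2) for row in X])
```

In practice one would more likely use an established GBDT library and a cheminformatics toolkit for the fingerprints; the hand-written booster here only keeps the sketch dependency-free, and fitting four training points says nothing about real predictive accuracy.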

CLC number: