Journal of Applied Sciences ›› 2025, Vol. 43 ›› Issue (1): 1-19. doi: 10.3969/j.issn.0255-8297.2025.01.001
SUN Mingchen1,2, JIN Hui1,2, WANG Ying1,2
Received: 2024-07-10
Online: 2025-01-30
Published: 2025-01-24
Corresponding author: WANG Ying, Professor and doctoral supervisor; research interests: graph data mining, social computing, and smart healthcare. E-mail: wangying2010@jlu.edu.cn
Abstract: Existing deep-learning-based disease prediction models are typically data-driven, which makes them overly dependent on the number of samples and the coverage of disease types in the training set. Current disease prediction methods have two main limitations: 1) if the disease types seen during training are limited, performance degrades sharply on rare diseases and the model may even make wrong predictions; 2) the training data may contain features that are irrelevant or only weakly related to the prediction target, and this noise prevents the model from making stable, reliable predictions, falling short of the high safety and reliability that medical applications require. To address these problems, this paper proposes CausalCap, a disease prediction model that combines capsule networks with causal reasoning. First, the causal effects and causal relations between clinical features and disease labels are estimated to construct a clinical-feature causal graph. Second, the causal graph is pruned: spurious-association nodes with no causal relation to the disease label are removed and the key nodes that causally influence disease onset are retained, yielding a disease causal graph. Finally, a hierarchical graph capsule network (HGCN) performs graph classification on the disease causal graph to predict the disease. Extensive experiments on six public datasets show that, compared with the second-best method, the proposed approach achieves average improvements of 2.50% in accuracy and 6.46% in F1.
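To make the pipeline described in the abstract concrete, below is a minimal, illustrative Python sketch of the three steps: scoring causal effects between clinical features and the disease label, pruning spurious-association nodes, and passing the pruned graph to a graph classifier. All function names, the threshold, and the toy effect estimator are assumptions made for illustration; CausalCap's actual causal-discovery procedure and HGCN classifier are not reproduced here.

```python
import numpy as np

def pairwise_causal_effect(x, y):
    # Toy stand-in for a pairwise causal-effect estimator (a LiNGAM-style
    # measure could be substituted); here it is simply a correlation score.
    x = (x - x.mean()) / (x.std() + 1e-8)
    y = (y - y.mean()) / (y.std() + 1e-8)
    return float(np.mean(x * y))

def build_disease_causal_graph(features, labels, threshold=0.1):
    # Step 1: score every clinical feature against the disease label.
    n_features = features.shape[1]
    effects = np.array([pairwise_causal_effect(features[:, j], labels)
                        for j in range(n_features)])
    # Step 2: prune spurious-association nodes whose effect falls below the threshold.
    kept = np.flatnonzero(np.abs(effects) >= threshold)
    # Weight edges between retained features by their pairwise effect (illustrative).
    adj = np.zeros((len(kept), len(kept)))
    for a, i in enumerate(kept):
        for b, j in enumerate(kept):
            if a != b:
                adj[a, b] = pairwise_causal_effect(features[:, i], features[:, j])
    return kept, adj

# Step 3 (not shown): build one pruned disease causal graph per patient and train
# a hierarchical graph capsule network (HGCN) as a graph classifier on these graphs.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 8))                      # 200 patients, 8 clinical features
    y = (X[:, 0] + 0.5 * X[:, 3] > 0).astype(float)    # synthetic disease label
    kept, adj = build_disease_causal_graph(X, y)
    print("retained feature indices:", kept)
```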
SUN Mingchen, JIN Hui, WANG Ying. Disease Prediction via Capsule Network and Causal Reasoning[J]. Journal of Applied Sciences, 2025, 43(1): 1-19.