Journal of Applied Sciences (应用科学学报) ›› 2025, Vol. 43 ›› Issue (3): 437-450. doi: 10.3969/j.issn.0255-8297.2025.03.006

• Computer Science and Applications •

Semi-supervised Encrypted Traffic Classification Model Based on Contrastive Learning

JIN Yanliang1,2, FANG Jie1,2, GAO Yuan1,2, ZHOU Jiahao1,2   

1. School of Communication and Information Engineering, Shanghai University, Shanghai 200444, China;
    2. Shanghai Institute for Advanced Communication and Data Science, Shanghai University, Shanghai 200444, China
  • Received: 2023-07-04  Published: 2025-06-23
  • Corresponding author: JIN Yanliang, associate professor; his research interests include big data and network security, and artificial intelligence. E-mail: jinyanliang@staff.shu.edu.cn
  • Funding: Shanghai Science and Technology Commission Projects (No. XTCK-KJ-2022-68, No. 22N51900200); Natural Science Foundation of Shanghai (No. 22511103202)

Abstract: To address the performance degradation that most encrypted traffic classification (ETC) models suffer when labeled data are scarce, this paper proposes a semi-supervised encrypted traffic classification model based on contrastive learning (SSETC-CL). By contrasting the similarities and differences between samples, SSETC-CL learns useful representations from large amounts of unlabeled data, yielding a versatile and effective feature encoding network and reducing the dependence of downstream tasks on labeled data. SSETC-CL is evaluated on the public ISCXVPN2016 dataset and on two self-collected datasets. Compared with other baseline models, SSETC-CL performs best on the specified tasks, with a maximum accuracy improvement of 8.92%. Experimental results demonstrate that SSETC-CL not only achieves high accuracy on traffic seen during pretraining but can also transfer the knowledge gained in pretraining to unseen traffic.
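The contrastive pretraining the abstract describes can be illustrated with a minimal sketch. This is not the authors' implementation; it assumes PyTorch, a small 1-D CNN byte encoder, a hypothetical random byte-masking augmentation, and a SimCLR-style NT-Xent loss, with all input sizes and hyperparameters chosen purely for illustration.

import torch
import torch.nn as nn
import torch.nn.functional as F


class ByteEncoder(nn.Module):
    """1-D CNN that maps a raw byte sequence to a fixed-length embedding."""

    def __init__(self, embed_dim: int = 128):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=7, stride=2, padding=3), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        # Projection head used only during contrastive pretraining.
        self.proj = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, embed_dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_bytes) with byte values scaled to [0, 1]
        h = self.features(x.unsqueeze(1)).squeeze(-1)  # (batch, 64)
        return self.proj(h)                            # (batch, embed_dim)


def augment(x: torch.Tensor) -> torch.Tensor:
    """Hypothetical augmentation: randomly zero out about 10% of the bytes."""
    return x * (torch.rand_like(x) > 0.1).float()


def nt_xent_loss(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.5) -> torch.Tensor:
    """NT-Xent loss: (z1[i], z2[i]) are positive pairs, all other pairs are negatives."""
    n = z1.size(0)
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)            # (2n, d)
    sim = z @ z.t() / temperature                                 # scaled cosine similarities
    sim = sim.masked_fill(torch.eye(2 * n, dtype=torch.bool), float("-inf"))  # drop self-pairs
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])         # index of each positive
    return F.cross_entropy(sim, targets)


if __name__ == "__main__":
    encoder = ByteEncoder()
    optimizer = torch.optim.Adam(encoder.parameters(), lr=1e-3)
    unlabeled = torch.rand(64, 1500)   # stand-in batch: 64 unlabeled flows, 1500 bytes each
    for _ in range(5):                 # contrastive pretraining loop
        z1 = encoder(augment(unlabeled))
        z2 = encoder(augment(unlabeled))
        loss = nt_xent_loss(z1, z2)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    # After pretraining, the projection head would be dropped and the encoder fine-tuned
    # on the small labeled set for the downstream classification task.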

Key words: encrypted traffic classification (ETC), contrastive learning, semi-supervised, data augmentation, transfer learning
