Journal of Applied Sciences, 2026, Vol. 44, Issue (2): 250-265. doi: 10.3969/j.issn.0255-8297.2026.02.006

• Intelligent Information Processing •

Real-Time Semantic Segmentation Based on Composite Three-Branch and Deep Feature Encoding

LEI Xiaochun1,2, PAN Yiwei1, ZHANG Yongya1, JIANG Zetao1,2, LI Mengtong1

  1. School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin 541004, Guangxi, China;
  2. Guangxi Key Laboratory of Image and Graphics Intelligent Processing, Guilin University of Electronic Technology, Guilin 541004, Guangxi, China
  • Received: 2024-12-02 Published: 2026-04-07
  • Corresponding author: PAN Yiwei, research interest: computer vision. E-mail: 22032303079@mails.guet.edu.cn
  • Funding:
    National Natural Science Foundation of China (Nos. 62473105, 62172118); Key Project of the Guangxi Natural Science Foundation (No. 2021GXNSFDA196002); Guangxi Key Laboratory of Image and Graphics Intelligent Processing Project (Nos. GIIP2302, GIIP2303, GIIP2304, GIIP2305); Innovation Project of Guilin University of Electronic Technology Graduate Education (No. 2024YCXS035)

Abstract: In scenes where object sizes differ significantly, real-time semantic segmentation tends to misclassify small objects and leave holes in the segmentation of large objects. To address these problems, this paper proposes a real-time semantic segmentation algorithm based on a composite three-branch structure and deep feature encoding, consisting of a composite three-branch module (CTBM), a deep feature encoding module (DFEM), and a dual-branch multi-layer perceptron (DBMLP). The CTBM uses a dual-layer multi-scale feature extraction and fusion strategy to extract information from different perspectives, enabling the model to better perceive the global relationships between features and thereby reducing holes in large-object segmentation results. The DFEM uses an encoding method to strengthen the model's ability to represent deep features, so that it better perceives the semantic information of small objects and segments them more accurately. The DBMLP exploits both global and local features to fuse multi-scale semantic information effectively, yielding smoother edges and more accurate contours. Evaluation on the Cityscapes and ADE20K datasets shows that the algorithm meets real-time speed requirements while achieving mIoU of 74.2% at 42.6 FPS on Cityscapes and 40.4% at 45.3 FPS on ADE20K, clearly outperforming other real-time semantic segmentation algorithms.

Key words: real-time semantic segmentation, composite three-branch, deep feature encoding, dual-branch multi-layer perceptron
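
For readers unfamiliar with the accuracy metric quoted in the abstract, the following is a minimal sketch of how mIoU (mean intersection over union) is commonly computed from predicted and ground-truth label maps. It is not taken from the paper: the function name mean_iou, the flattened-list input format, and the ignore label 255 (a convention used by Cityscapes-style annotations) are assumptions made for illustration only.

```python
def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean IoU over the classes that appear in pred or target.

    pred, target: flattened sequences of integer class labels (assumed format).
    ignore_index: ground-truth label to skip (assumed to be 255 here).
    """
    # conf[t][p] counts pixels with ground truth t that were predicted as p.
    conf = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # unlabeled pixels do not contribute
        conf[t][p] += 1

    ious = []
    for c in range(num_classes):
        tp = conf[c][c]
        fn = sum(conf[c]) - tp                               # class c missed
        fp = sum(conf[r][c] for r in range(num_classes)) - tp  # wrongly labeled c
        union = tp + fp + fn
        if union > 0:  # skip classes absent from both prediction and target
            ious.append(tp / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example: 6 pixels, 3 classes.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
print(mean_iou(pred, target, num_classes=3))
```

In the toy example the per-class IoU values are 1/3, 2/3 and 1/2, so the mean is 0.5. Benchmark toolkits usually accumulate the confusion matrix over the whole validation set before taking the ratio, rather than averaging per-image scores.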

CLC number: