Real-Time Semantic Segmentation Based on Composite Three-Branch and Deep Feature Encoding

Journal of Applied Sciences, 2026, Vol. 44, Issue (2): 250-265. doi: 10.3969/j.issn.0255-8297.2026.02.006

LEI Xiaochun^1,2, PAN Yiwei^1, ZHANG Yongya^1, JIANG Zetao^1,2, LI Mengtong^1

1. School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin 541004, Guangxi, China;
2. Guangxi Key Laboratory of Image and Graphics Intelligent Processing, Guilin University of Electronic Technology, Guilin 541004, Guangxi, China

Received: 2024-12-02 Published: 2026-04-07
Corresponding author: PAN Yiwei, whose research interest is computer vision. E-mail: 22032303079@mails.guet.edu.cn
Funding: National Natural Science Foundation of China (No. 62473105, No. 62172118); Key Project of the Guangxi Natural Science Foundation (No. 2021GXNSFDA196002); Guangxi Key Laboratory of Image and Graphics Intelligent Processing projects (No. GIIP2302, No. GIIP2303, No. GIIP2304, No. GIIP2305); Innovation Project of Guilin University of Electronic Technology Graduate Education (No. 2024YCXS035)

Abstract

Abstract: In scenes where object sizes differ significantly, real-time semantic segmentation tends to misclassify small objects and to leave holes in the segmentation results of large objects. To address these problems, this paper proposes a real-time semantic segmentation algorithm based on a composite three-branch structure and deep feature encoding, consisting of a composite three-branch module (CTBM), a deep feature encoding module (DFEM), and a dual-branch multi-layer perceptron (DBMLP). The CTBM uses a dual-layer multi-scale feature extraction and fusion strategy to extract information from different perspectives, enabling the model to better perceive the global relationships between features and thereby reducing the holes in large-object segmentation results. The DFEM strengthens the model's ability to express deep features through encoding, so that it better perceives the semantic information of small objects and segments them more accurately. The DBMLP uses both global and local features to fuse multi-scale semantic information, yielding smoother edges and more accurate contours in the segmentation results. Evaluation on the Cityscapes and ADE20K datasets shows that the algorithm meets real-time speed requirements, achieving mIoU of 74.2% and 40.4% at 42.6 FPS and 45.3 FPS, respectively, and clearly outperforming other real-time semantic segmentation algorithms.

Key words: real-time semantic segmentation, composite three-branch, deep feature encoding, dual-branch multi-layer perceptron
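
The abstract reports accuracy as mIoU (mean intersection over union). As a reading aid, the following is a minimal sketch of how that metric is commonly computed from a pixel-level confusion matrix. It is not the authors' evaluation code: the function names `confusion_matrix` and `mean_iou` and the toy labels are assumptions made for illustration, and classes that appear in neither prediction nor ground truth are simply skipped.

```python
def confusion_matrix(pred, target, num_classes):
    """Count pixels; rows are ground-truth classes, columns are predicted classes."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        matrix[t][p] += 1
    return matrix


def mean_iou(matrix):
    """Average the per-class IoU = TP / (TP + FP + FN) over classes that occur."""
    n = len(matrix)
    ious = []
    for c in range(n):
        tp = matrix[c][c]
        fn = sum(matrix[c]) - tp                        # class-c pixels predicted as another class
        fp = sum(matrix[r][c] for r in range(n)) - tp   # other pixels predicted as class c
        denom = tp + fp + fn
        if denom > 0:                                   # skip classes absent from both label maps
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Hypothetical flattened label maps with three classes (illustrative data only).
target = [0, 0, 1, 1, 2, 2]
pred = [0, 1, 1, 1, 2, 0]
print(mean_iou(confusion_matrix(pred, target, 3)))
```

In this toy case the per-class IoU values are 1/3, 2/3, and 1/2, so a single mislabeled pixel in a rarely occurring class lowers the mean as much as many errors in a frequent class. This is why the metric is sensitive to the small-object errors the abstract discusses.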

CLC number: TP391.41

LEI Xiaochun, PAN Yiwei, ZHANG Yongya, JIANG Zetao, LI Mengtong. Real-Time Semantic Segmentation Based on Composite Three-Branch and Deep Feature Encoding[J]. Journal of Applied Sciences, 2026, 44(2): 250-265.
