Real-Time Semantic Segmentation Based on Composite Three-Branch and Deep Feature Encoding

LEI Xiaochun; PAN Yiwei; ZHANG Yongya; JIANG Zetao; LI Mengtong

doi:10.3969/j.issn.0255-8297.2026.02.006

Journal of Applied Sciences, 2026, Vol. 44, Issue 2: 250-265

Intelligent Information Processing

1. School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin 541004, Guangxi, China;
2. Guangxi Key Laboratory of Image and Graphics Intelligent Processing, Guilin University of Electronic Technology, Guilin 541004, Guangxi, China

Received date: 2024-12-02

Online published: 2026-04-07

Abstract

In real-time semantic segmentation of scenes whose objects differ greatly in size, small objects are often mis-segmented and the segmentation results for large objects often contain holes. To address these problems, this paper proposes a real-time semantic segmentation algorithm based on a composite three-branch structure and deep feature encoding. The algorithm consists of a composite three-branch module (CTBM), a deep feature encoding module (DFEM), and a dual-branch multi-layer perceptron (DBMLP). The CTBM uses a dual-layer multi-scale feature extraction and fusion strategy to extract information from different perspectives, which helps the model perceive the global relationships between features and reduces holes in large-object segmentation results. The DFEM strengthens the model's ability to express deep features through encoding, so that the model better perceives the semantic information of small objects and segments them more accurately. The DBMLP integrates multi-scale semantic information by using both global and local features, which yields smoother edges and more accurate contours in the segmentation results. Evaluation on the Cityscapes and ADE20K datasets shows that the algorithm meets real-time speed requirements, achieving mIoU of 74.2% at 42.6 FPS on Cityscapes and 40.4% at 45.3 FPS on ADE20K, and significantly outperforms other real-time semantic segmentation algorithms.
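For readers unfamiliar with the reported metric, mean intersection over union (mIoU) averages the per-class ratio TP / (TP + FP + FN) over the classes. The following is a minimal illustrative sketch in plain Python, not the authors' evaluation code. The function name `mean_iou`, the flattened-list input format, and the choice to skip classes absent from both prediction and ground truth are assumptions made for illustration; benchmark protocols such as those for Cityscapes also exclude "ignore" labels, which this sketch does not handle.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection over union for flattened integer label maps.

    Hypothetical helper for illustration; assumes labels lie in
    [0, num_classes) and that no ignore label is present.
    """
    # confusion[t][p] counts pixels with ground-truth class t predicted as p
    confusion = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        confusion[t][p] += 1

    ious = []
    for c in range(num_classes):
        tp = confusion[c][c]
        # false positives: predicted as c, but ground truth is another class
        fp = sum(confusion[t][c] for t in range(num_classes)) - tp
        # false negatives: ground truth is c, but predicted as another class
        fn = sum(confusion[c]) - tp
        union = tp + fp + fn
        if union > 0:  # skip classes absent from both prediction and target
            ious.append(tp / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example: a flattened 2x3 label map with 3 classes
target = [0, 0, 1, 1, 2, 2]
pred = [0, 1, 1, 1, 2, 0]
print(mean_iou(pred, target, 3))
```

In the toy example the per-class IoU values are 1/3, 2/3, and 1/2, so the mean is about 0.5. A reported mIoU of 74.2% corresponds to this quantity computed over all evaluated classes of the dataset.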

Keywords: real-time semantic segmentation; composite three-branch; deep feature encoding; dual-branch multi-layer perceptron

Cite this article

LEI Xiaochun, PAN Yiwei, ZHANG Yongya, JIANG Zetao, LI Mengtong. Real-Time Semantic Segmentation Based on Composite Three-Branch and Deep Feature Encoding[J]. Journal of Applied Sciences, 2026, 44(2): 250-265. DOI: 10.3969/j.issn.0255-8297.2026.02.006
