Special Issue on Computer Application

A Semantic Segmentation Network Based on Lightweight Convolutional Modules

  • College of Computer and Artificial Intelligence, Beijing Technology and Business University, Beijing 100048, China

Received date: 2024-07-18

Published online: 2025-01-24

Abstract

Semantic simultaneous localization and mapping (SLAM) augmented with deep learning provides an effective way to handle dynamic scenes. However, the approach still suffers from high computational cost and model complexity. To address these issues, this paper proposes a lightweight semantic segmentation network that improves on BlendMask. First, a lightweight Ghost-depthwise separable convolution with efficient channel attention (GDS-ECA) module is designed. The module replaces some of the convolution operations in Ghost convolution with depthwise separable convolutions to reduce the parameter count and computational load, and incorporates an attention mechanism to strengthen feature representation. Second, a bottleneck GDS-ECA attention transformer network (BGTNet) is proposed, which applies GDS-ECA convolution to the convolution layers of the neck module to improve feature extraction precision. In addition, the traditional convolutions in the feature pyramid network (FPN) are replaced with GDS-ECA convolutions, yielding a lightweight FPN (L-FPN). L-FPN and BGTNet together form the backbone of the proposed semantic segmentation network. Finally, experiments on the COCO dataset validate the improvements, showing a 7.3 ms reduction in processing time per image and a 1.5% improvement in average precision.
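To make the parameter argument concrete, the following is a minimal sketch that counts convolution weights (biases ignored) for a standard convolution, a depthwise separable convolution, and a Ghost module. It is not the authors' implementation. The function names, the 256-channel example, the Ghost ratio of 2, the ECA kernel size of 3, and in particular the choice to swap the Ghost module's primary convolution for a depthwise separable one are assumptions made for illustration; the abstract does not say which Ghost operations GDS-ECA replaces.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a k x k standard convolution (bias ignored)."""
    return c_in * c_out * k * k


def depthwise_separable_params(c_in, c_out, k):
    """One k x k depthwise filter per input channel, then a 1 x 1 pointwise conv."""
    return c_in * k * k + c_in * c_out


def ghost_params(c_in, c_out, k, ratio=2, cheap_k=3, separable_primary=False):
    """Weights in a Ghost module; c_out is assumed divisible by ratio.

    The primary convolution produces c_out // ratio intrinsic feature maps,
    and cheap depthwise filters generate the remaining "ghost" maps.
    separable_primary=True is an assumed GDS-style variant, not a confirmed
    detail of the paper.
    """
    intrinsic = c_out // ratio
    if separable_primary:
        primary = depthwise_separable_params(c_in, intrinsic, k)
    else:
        primary = standard_conv_params(c_in, intrinsic, k)
    # Each intrinsic map yields (ratio - 1) ghost maps via a cheap_k x cheap_k filter.
    cheap = intrinsic * (ratio - 1) * cheap_k * cheap_k
    return primary + cheap


def eca_params(kernel_size=3):
    """ECA uses a single 1D convolution across channels: kernel_size weights."""
    return kernel_size


if __name__ == "__main__":
    c_in, c_out, k = 256, 256, 3  # hypothetical layer size
    print("standard conv:      ", standard_conv_params(c_in, c_out, k))
    print("depthwise separable:", depthwise_separable_params(c_in, c_out, k))
    print("Ghost module:       ", ghost_params(c_in, c_out, k))
    print("Ghost + DS + ECA:   ",
          ghost_params(c_in, c_out, k, separable_primary=True) + eca_params())
```

Under these assumptions the attention branch adds only a handful of weights, so almost all of the saving comes from how the primary and cheap convolutions are factored.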

Cite this article

LIAN Xiaofeng, KANG Maomao, TAN Li, WANG Yanli. A Semantic Segmentation Network Based on Lightweight Convolutional Modules[J]. Journal of Applied Sciences, 2025, 43(1): 66-79. DOI: 10.3969/j.issn.0255-8297.2025.01.005
