Segmentation is an essential step in the interpretation and analysis of remote sensing images. Traditional segmentation methods based on hand-crafted features are limited in accuracy and generalization, and as the advantages of deep learning for feature representation have become apparent, semantic segmentation built on deep networks has become the main research direction for automatic segmentation. In this paper, we propose a multi-scale semantic segmentation model based on deep residual networks. It targets small-sample remote sensing datasets and aims to improve segmentation accuracy for images containing objects of different scales, especially small-scale objects. First, we fine-tune a deep Residual Network (ResNet) in the form of a fully convolutional network (FCN) to build an end-to-end semantic segmentation model. Second, to address the coarse segmentation output of FCNs, we introduce Atrous convolution into the up-sampling process to preserve the field of view at each layer and refine the output label map. Finally, we apply random multi-scale data augmentation to the small-sample data, which expands the training set and improves classification accuracy and robustness, particularly for small objects. Experiments on the ISPRS 2D Vaihingen semantic labeling contest dataset yield a classification accuracy of 89.7%, outperforming most state-of-the-art methods, with particularly good results on small-scale objects.
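The two mechanisms named in the abstract, Atrous (dilated) convolution and random multi-scale data augmentation, can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `atrous_conv2d` and `random_scale_pair`, the scale set `(0.5, 1.0, 2.0)`, the zero padding, and the nearest-neighbour resampling are all assumptions chosen for illustration, and the convolution assumes an odd-sized kernel on a single-channel image.

```python
import numpy as np


def atrous_conv2d(image, kernel, rate=1):
    """Cross-correlate a 2D image with a kernel whose taps are `rate` pixels apart.

    Zero padding keeps the output the same size as the input.
    Assumes an odd-sized kernel (hypothetical helper, for illustration only).
    """
    kh, kw = kernel.shape
    # Effective extent of the dilated kernel: k + (k - 1) * (rate - 1).
    eff_h = kh + (kh - 1) * (rate - 1)
    eff_w = kw + (kw - 1) * (rate - 1)
    pad_h, pad_w = eff_h // 2, eff_w // 2
    padded = np.pad(image, ((pad_h, pad_h), (pad_w, pad_w)), mode="constant")

    h, w = image.shape
    out = np.zeros((h, w), dtype=float)
    # Accumulate one shifted copy of the input per kernel tap.
    for i in range(kh):
        for j in range(kw):
            r0, c0 = i * rate, j * rate
            out += kernel[i, j] * padded[r0:r0 + h, c0:c0 + w]
    return out


def random_scale_pair(image, labels, scales=(0.5, 1.0, 2.0), rng=None):
    """Rescale an image and its label map by one randomly chosen factor.

    Nearest-neighbour sampling is used for both arrays so that the label
    map only ever contains class ids already present in the input.
    The scale set is an assumption, not a value taken from the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    s = float(rng.choice(scales))
    h, w = labels.shape
    new_h = max(1, int(round(h * s)))
    new_w = max(1, int(round(w * s)))
    # Map each output row/column back to its nearest source index.
    rows = np.minimum((np.arange(new_h) / s).astype(int), h - 1)
    cols = np.minimum((np.arange(new_w) / s).astype(int), w - 1)
    return image[np.ix_(rows, cols)], labels[np.ix_(rows, cols)]
```

With a 3×3 kernel, `rate=2` widens the covered window from 3×3 to 5×5 pixels without adding weights or reducing the output resolution, which is the property the abstract relies on to keep the field of view at each layer. Calling `random_scale_pair` on each training tile before cropping yields samples in which the same object appears at several sizes.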