[1] He K, Gkioxari G, Dollár P, et al. Mask R-CNN [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020, 42(2): 386-397. [2] Xie S, Girshick R, Dollár P, et al. Aggregated residual transformations for deep neural networks [C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 2017: 1492-1500. [3] Hu J, Shen L, Sun G. Squeeze-and-excitation networks [C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 2018: 7132-7141. [4] Lin T Y, Dollár P, Girshick R, et al. Feature pyramid networks for object detection [C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 2017: 2117-2125. [5] Liu S, Qi L, Qin H, et al. Path aggregation network for instance segmentation [C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 2018: 8759-8768. [6] Tan M, Pang R, Le Q V. EfficientDet: scalable and efficient object detection [C]//Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 2020: 10781-10790. [7] Picron C, Tuytelaars T. Trident pyramid networks: the importance of processing at the feature pyramid level for better object detection [J/OL] (2021-10-08) [2022-5-30]. https://arXiv:2110.04004. [8] Gao S H, Cheng M M, Zhao K, et al. Res2Net: a new multi-scale backbone architecture [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019, 43(2): 652-662. [9] He K, Zhang X, Ren S, et al. Deep residual learning for image recognition [C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 2016: 770-778. [10] Cai Z, Vasconcelos N. Cascade R-CNN: delving into high quality object detection [C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 2018: 6154-6162. [11] Chng C K, Liu Y, Sun Y, et al. ICDAR2019 robust reading challenge on arbitrary-shaped textRRC-art [C]//Proceedings of the 15th IEEE International Conference on Document Analysis and Recognition, Sydney, Australia, 2019: 1571-1576. [12] Ch’ng C K, Chan C S. Total-text: a comprehensive dataset for scene text detection and recognition [C]//Proceedings of the 14th IAPR International Conference on Document Analysis and Recognition, Kyoto, Japan, 2017, 1: 935-942. [13] Liu Y, Jin L, Zhang S, et al. Curved scene text detection via transverse and longitudinal sequence connection [J]. Pattern Recognition, 2019, 90: 337-345. [14] Chen K, Wang J, Pang J, et al. MMDetection: open mmlab detection toolbox and benchmark [J/OL]. (2019-06-17) [2022-05-30]. http://arXiv:1906.07155. [15] Zhang H, Wu C, Zhang Z, et al. ResNeSt: split-attention networks [C]//2022 IEEE Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 2022: 2735-2745. [16] Robbins H, Monro S. A stochastic approximation method [J]. The Annals of Mathematical Statistics, 1951: 400-407. [17] Deng J, Dong W, Socher R, et al. ImageNet: a large-scale hierarchical image database [C]//2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 2009: 248-255. |