[1] Booth I, Barlow M, Watson B. Enhancements to DTW and VQ decision algorithms for speaker recognition [J]. Speech Communication, 1993, 13(3/4): 427-433.
[2] Gersho A, Gray R M. Vector quantization and signal compression [M]. Berlin: Springer, 1992.
[3] Reynolds D A. Speaker identification and verification using Gaussian mixture speaker models [J]. Speech Communication, 1995, 17(1/2): 91-108.
[4] Reynolds D A, Quatieri T F, Dunn R B. Speaker verification using adapted Gaussian mixture models [J]. Digital Signal Processing, 2000, 10(1/2/3): 19-41.
[5] Huang G, Liu Z, Van Der Maaten L, et al. Densely connected convolutional networks [C]//IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017: 2261-2269.
[6] Malykh E, Novoselov S, Kudashev O. On residual CNN in text-dependent speaker verification task [C]//19th International Conference on Speech and Computer (SPECOM), 2017, 10458: 593-601.
[7] Snyder D, Garcia-Romero D, Povey D, et al. Deep neural network embeddings for text-independent speaker verification [C]//Proceedings of the Interspeech 2017, 2017: 999-1003.
[8] Wang H P. Speaker recognition based on deep bidirectional LSTM network [J]. Computer Engineering and Design, 2020, 41(6): 1768-1772. (in Chinese)
[9] He K, Zhang X, Ren S, et al. Deep residual learning for image recognition [C]//IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016: 770-778.
[10] Gao S H, Cheng M M, Zhao K, et al. Res2Net: a new multi-scale backbone architecture [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, 43(2): 652-662.
[11] Szegedy C, Liu W, Jia Y, et al. Going deeper with convolutions [C]//IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015: 1-9.
[12] Zhang Y, Lyu Z, Wu H, et al. MFA-Conformer: multi-scale feature aggregation conformer for automatic speaker verification [C]//Proceedings of the Interspeech 2022, 2022: 306-310.
[13] Okabe K, Koshinaka T, Shinoda K. Attentive statistics pooling for deep speaker embedding [C]//Proceedings of the Interspeech 2018, 2018: 2252-2256.
[14] Han B, Chen Z, Qian Y. Local information modeling with self-attention for speaker verification [C]//ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing, 2022: 6727-6731.
[15] Woo S, Park J, Lee J, et al. CBAM: convolutional block attention module [C]//15th European Conference on Computer Vision (ECCV), 2018, 11211: 3-19.
[16] Hu J, Shen L, Sun G. Squeeze-and-excitation networks [C]//IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018: 7132-7141.
[17] Zhao M, Ma Y F, Liu M, et al. The SpeakIn system for VoxCeleb speaker recognition challenge 2021 [DB/OL]. 2021 [2023-02-27]. https://arxiv.org/abs/2109.01989.
[18] Desplanques B, Thienpondt J, Demuynck K. ECAPA-TDNN: emphasized channel attention, propagation and aggregation in TDNN based speaker verification [C]//Proceedings of the Interspeech 2020, 2020: 3830-3834.
[19] Safari P, India M, Hernando J. Self-attention encoding and pooling for speaker recognition [C]//Proceedings of the Interspeech 2020, 2020: 941-945.
[20] Wu Z Z, Kinnunen T, Evans N, et al. ASVspoof 2015: the first automatic speaker verification spoofing and countermeasures challenge [C]//16th Annual Conference of the International Speech Communication Association, 2015: 2037-2041.