[1] He Q H, Zhan J Y, Yan H K, et al. A speech sample screening method based on an improved dynamic time warping algorithm: China, CN111179914B [P]. 2022-12-16. (in Chinese)
[2] Li Y H. Continuous speech synchronization recognition system based on hidden Markov model [J]. Modern Electronics Technique, 2019, 42(11): 64-67, 71. (in Chinese)
[3] Barai B, Chakraborty T, Das N, et al. Closed-set speaker identification using VQ and GMM based models [J]. International Journal of Speech Technology, 2022, 25(1): 173-196.
[4] Gao J. Research on cross-lingual speaker recognition based on language adversarial training [D]. Wuhan: Huazhong University of Science and Technology, 2018. (in Chinese)
[5] Wan L, Wang Q, Papir A, et al. Generalized end-to-end loss for speaker verification [C]//IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018: 4879-4883.
[6] Hajibabaei M, Dai D. Unified hypersphere embedding for speaker recognition [DB/OL]. 2018[2023-11-08]. http://arxiv.org/abs/1807.08312.
[7] Nakamura E, Kageyama Y, Hirose S. LSTM-based Japanese speaker identification using an omnidirectional camera and voice information [J]. IEEJ Transactions on Electrical and Electronic Engineering, 2022, 17(5): 674-684.
[8] Wei G C, Zhang Y N, Min H, et al. End-to-end speaker identification research based on multi-scale SincNet and CGAN [J]. Neural Computing and Applications, 2023, 35(30): 22209-22222.
[9] Kreuk F, Adi Y, Cisse M, et al. Fooling end-to-end speaker verification with adversarial examples [C]//IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018: 1962-1966.
[10] Huang C Y, Lin Y Y, Lee H Y, et al. Defending your voice: adversarial attack on voice conversion [C]//2021 IEEE Spoken Language Technology Workshop (SLT), 2021: 552-559.
[11] Chen G K, Chen S, Fan L L, et al. Who is real Bob? Adversarial attacks on speaker recognition systems [C]//IEEE Symposium on Security and Privacy, 2021: 694-711.
[12] Liu S X, Wu H B, Lee H Y, et al. Adversarial attacks on spoofing countermeasures of automatic speaker verification [C]//IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 2019: 312-319.
[13] Carlini N, Wagner D. Audio adversarial examples: targeted attacks on speech-to-text [C]//IEEE Security and Privacy Workshops (SPW), 2018: 1-7.
[14] Tian X H, Das R K, Li H Z. Black-box attacks on automatic speaker verification using feedback-controlled voice conversion [C]//Speaker and Language Recognition Workshop, 2020: 159-164.
[15] Park S W, Kim D Y, Joe M C. Cotatron: transcription-guided speech encoder for any-to-many voice conversion without parallel data [C]//Interspeech 2020, 2020, 1542: 4696-4700.
[16] Bodin E, Malik I, Ek C H, et al. Nonparametric inference for auto-encoding variational Bayes [DB/OL]. 2017[2023-11-08]. http://arxiv.org/abs/1712.06536.
[17] Kameoka H, Kaneko T, Tanaka K, et al. StarGAN-VC: non-parallel many-to-many voice conversion using star generative adversarial networks [C]//IEEE Spoken Language Technology Workshop (SLT), 2018: 266-273.
[18] Kameoka H, Kaneko T, Tanaka K, et al. StarGAN-VC2: rethinking conditional methods for StarGAN-based voice conversion [C]//Interspeech 2019, 2019, 2236: 679-683.
[19] Dhar S, Jana N D, Das S. An adaptive-learning-based generative adversarial network for one-to-one voice conversion [J]. IEEE Transactions on Artificial Intelligence, 2023, 4(1): 92-106.
[20] Zhao Z Q, Ma S F, Jia Y, et al. Disentangling content information by combining ASR and TTS bottleneck features for voice conversion [J]. International Journal of Asian Language Processing, 2023, 33(1): 235-246.
[21] Kaneko T, Kameoka H, Hiramatsu K, et al. Sequence-to-sequence voice conversion with similarity metric learned using generative adversarial networks [C]//Interspeech 2017, 2017, 970: 1283-1287.
[22] Zhu J Y, Park T, Isola P, et al. Unpaired image-to-image translation using cycle-consistent adversarial networks [C]//2017 IEEE International Conference on Computer Vision (ICCV), 2017: 2242-2251.