[1] Rafii Z, Pardo B. REPeating Pattern Extraction Technique (REPET): a simple method for music/voice separation[J]. IEEE Transactions on Audio, Speech, and Language Processing, 2013, 21(1): 73-84.
[2] Huang P S, Chen S D, Smaragdis P, et al. Singing-voice separation from monaural recordings using robust principal component analysis[C]//2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2012), 2012: 57-60.
[3] Grais E M, Erdogan H. Single channel speech music separation using nonnegative matrix factorization with sliding windows and spectral masks[C]//12th Annual Conference of the International Speech Communication Association (Interspeech 2011), 2011.
[4] Uhlich S, Giron F, Mitsufuji Y. Deep neural network based instrument extraction from music[C]//2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2015), 2015: 2135-2139.
[5] Sprechmann P, Bruna J, LeCun Y. Audio source separation with discriminative scattering networks[C]//International Conference on Latent Variable Analysis and Signal Separation (LVA/ICA 2015). Springer, Cham, 2015.
[6] Hochreiter S, Schmidhuber J. Long short-term memory[J]. Neural Computation, 1997, 9(8): 1735-1780.
[7] Chen J, Wang D L. Long short-term memory for speaker generalization in supervised speech separation[J]. The Journal of the Acoustical Society of America, 2017, 141(6): 4705-4714.
[8] Zhang T. Research on vocal/accompaniment separation methods for single-channel music signals[D]. Chongqing: Chongqing University of Posts and Telecommunications, 2020. (in Chinese)
[9] Stöter F R, Uhlich S, Liutkus A, et al. Open-Unmix: a reference implementation for music source separation[J]. The Journal of Open Source Software, 2019, 4(41): 1667.
[10] Simpson A J R, Roma G, Plumbley M D. Deep karaoke: extracting vocals from musical mixtures using a convolutional deep neural network[C]//12th International Conference on Latent Variable Analysis and Signal Separation (LVA/ICA 2015), 2015: 429-436.
[11] Jansson A, Humphrey E J, Montecchio N, et al. Singing voice separation with deep U-Net convolutional networks[C]//Proceedings of the 2017 International Society for Music Information Retrieval Conference (ISMIR 2017), 2017: 323-332.
[12] Stoller D, Ewert S, Dixon S. Wave-U-Net: a multi-scale neural network for end-to-end audio source separation[C]//2018 International Society for Music Information Retrieval Conference (ISMIR 2018), 2018: 334-340.
[13] Défossez A, Usunier N, Bottou L, et al. Demucs: deep extractor for music sources with extra unlabeled data remixed[DB/OL]. [2021-09-29]. https://arxiv.org/abs/1909.01174.
[14] Wang B, Chen N. An end-to-end singing voice separation model based on residual attention U-Net[J]. Journal of East China University of Science and Technology (Natural Science Edition), 2021, 47(5): 619-626. (in Chinese)
[15] Perez-Lapillo J, Galkin O, Weyde T. Improving singing voice separation with the Wave-U-Net using minimum hyperspherical energy[C]//2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2020). IEEE, 2020.
[16] Ibtehaz N, Rahman M S. MultiResUNet: rethinking the U-Net architecture for multimodal biomedical image segmentation[J]. Neural Networks, 2019, 121: 74-87.
[17] Hu J, Shen L, Sun G. Squeeze-and-excitation networks[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2018). IEEE, 2018.
[18] Wang Y, Zhou Q, Liu J, et al. LEDNet: a lightweight encoder-decoder network for real-time semantic segmentation[C]//2019 IEEE International Conference on Image Processing (ICIP 2019). IEEE, 2019.
[19] Liu S, Huang D, Wang Y. Receptive field block net for accurate and fast object detection[C]//Proceedings of the European Conference on Computer Vision (ECCV 2018), 2018: 385-400.
[20] Dauphin Y N, Fan A, Auli M, et al. Language modeling with gated convolutional networks[C]//Proceedings of the 34th International Conference on Machine Learning (ICML 2017), 2017, 70: 933-941.
[21] Zhang T Q, Bai H J, Ye S P, et al. Single-channel speech enhancement method based on gated residual convolution encoder-and-decoder network[J]. Journal of Signal Processing, 2021, 37(10): 1986-1995. (in Chinese)
[22] Liu J Y, Yang Y H. Dilated convolution with dilated GRU for music source separation[DB/OL]. [2021-09-29]. https://arxiv.org/abs/1906.01203.
[23] Takahashi N, Mitsufuji Y. D3Net: densely connected multidilated DenseNet for music source separation[DB/OL]. [2021-09-29]. https://arxiv.org/abs/2010.01733.
[24] Fang Y, Li Y, Tu X, et al. Face completion with hybrid dilated convolution[J]. Signal Processing: Image Communication, 2019, 80: 115664.
[25] Oktay O, Schlemper J, Folgoc L L, et al. Attention U-Net: learning where to look for the pancreas[DB/OL]. [2021-09-29]. https://arxiv.org/abs/1804.03999.