Journal of Applied Sciences ›› 2023, Vol. 41 ›› Issue (5): 815-830. doi: 10.3969/j.issn.0255-8297.2023.05.008

• Signal and Information Processing •

Singing Voice Separation Method of Unet Based on Squeeze-and-Excitation Residual Group Dilated Convolution and Dense Linear Gate

ZHANG Tianqi, XIONG Tian, WU Chao, WEN Bin   

  1. School of Communication and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
  • Received: 2021-09-29 Published: 2023-09-28

Abstract: To better capture the temporal information of speech and make fuller use of low-level features in the Unet frequency-domain singing voice separation model, this paper proposes a convolutional neural network with fewer parameters and better separation performance. Firstly, a residual group dilated convolution combined with a squeeze-and-excitation module is incorporated into the encoding and decoding stages. While reducing the number of parameters and enlarging the receptive field of the network, it adaptively learns the importance of different channel features, enhancing useful features and suppressing irrelevant ones. Secondly, in the transmission layer, gated linear units are densely connected by addition to strengthen the acquisition of temporal features during feature transmission, and dilated convolution replaces ordinary convolution to further expand the receptive field. Finally, an attention gate mechanism replaces the skip connections of the baseline Unet to improve the utilization of low-level features. Experiments on the ccMixter and MUSDB18 datasets show that, compared with the baseline network, the proposed approach improves voice separation performance with only about one-fifth of the parameters.

Key words: singing voice separation, group dilated convolution, gating linear units, attention gating
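
The abstract attributes the parameter reduction to group convolution and the larger receptive field to dilation. The following is a minimal sketch of the standard arithmetic behind both effects, not the authors' implementation. The channel count (64), kernel size (3), group count (4) and dilation rate (2) are illustrative assumptions and are not taken from the paper.

```python
def conv2d_params(c_in, c_out, kernel, groups=1, bias=True):
    """Learnable parameters of a 2-D convolution with a square kernel.

    With `groups` > 1, each output channel only sees c_in / groups
    input channels, so the weight count shrinks by that factor.
    """
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    weights = (c_in // groups) * c_out * kernel * kernel
    return weights + (c_out if bias else 0)


def effective_kernel(kernel, dilation):
    """Span (in input samples) covered by a dilated kernel along one axis."""
    return kernel + (kernel - 1) * (dilation - 1)


# Hypothetical layer sizes, chosen only for illustration.
standard = conv2d_params(64, 64, 3)            # ordinary 3x3 convolution
grouped = conv2d_params(64, 64, 3, groups=4)   # same layer split into 4 groups
span = effective_kernel(3, 2)                  # 3x3 kernel with dilation 2

print(standard, grouped, span)  # 36928 9280 5
```

Under these assumed sizes the grouped layer holds roughly a quarter of the weights of the ordinary one, while the dilated 3x3 kernel covers the same span as a 5x5 kernel without adding parameters. The overall one-fifth figure reported in the abstract depends on the full architecture and cannot be derived from this sketch.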
