Journal of Applied Sciences, 2024, Vol. 42, Issue (4): 709-722. doi: 10.3969/j.issn.0255-8297.2024.04.012

• Computer Science and Applications •

Intelligent Synthetic Voice Speaker Verification Method Based on Group-Res2Block

LI Fei1, SU Zhaopin1,2, WANG Niansong3, YANG Bo3, ZHANG Guofu1,2   

    1. School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230601, Anhui, China;
    2. Anhui Province Key Laboratory of Industry Safety and Emergency Technology, Hefei University of Technology, Hefei 230601, Anhui, China;
    3. Institute of Forensic Science, Department of Public Security of Anhui Province, Hefei 230000, Anhui, China
  • Received: 2023-02-27 Published: 2024-08-01

Abstract: Existing speaker verification methods are designed primarily for natural speech and are therefore ill-suited to intelligently synthesized speech. To address this, this paper proposes an intelligent synthetic voice speaker verification method based on Group-Res2Block. First, the Group-Res2Block structure is designed; it merges each group with its adjacent preceding and following groups to strengthen the contextual connection among the speaker’s local features. Second, a parallel multi-scale channel attention feature fusion mechanism is designed. It applies convolution kernels of different sizes to select same-level features along the channel dimension, extracting more expressive speaker features while avoiding information redundancy. Finally, a serial multi-scale attention feature fusion mechanism is designed, together with a layer structure that integrates deep and shallow features as a whole and assigns them different weights to obtain the optimal feature representation. To verify the effectiveness of the proposed feature extraction network, two kinds of intelligent synthetic speech datasets, Chinese and English, are constructed. Ablation and comparative experiments show that the proposed method outperforms the compared methods on accuracy (ACC), equal error rate (EER), and minimum detection cost function (minDCF). Generalization tests further indicate that the model remains applicable to speech generated by unknown intelligent speech synthesis algorithms.
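
For readers unfamiliar with the EER and minDCF metrics named in the abstract, the following is a minimal sketch of how they are commonly computed from verification scores. It is not the authors' evaluation code. The function names, the toy scores, and the cost parameters (`p_target`, `c_miss`, `c_fa`) are assumptions made for illustration only; the abstract does not state which values the paper uses.

```python
def error_rates(target_scores, nontarget_scores, threshold):
    """Return (FAR, FRR) when a trial is accepted if score >= threshold."""
    far = sum(s >= threshold for s in nontarget_scores) / len(nontarget_scores)
    frr = sum(s < threshold for s in target_scores) / len(target_scores)
    return far, frr


def candidate_thresholds(target_scores, nontarget_scores):
    """Every observed score, plus +inf so that 'reject everything' is covered."""
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    thresholds.append(float("inf"))
    return thresholds


def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER as the point where FAR and FRR are closest."""
    best_gap, best_eer, best_threshold = None, None, None
    for t in candidate_thresholds(target_scores, nontarget_scores):
        far, frr = error_rates(target_scores, nontarget_scores, t)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer, best_threshold = gap, (far + frr) / 2, t
    return best_eer, best_threshold


def min_dcf(target_scores, nontarget_scores, p_target=0.01, c_miss=1.0, c_fa=1.0):
    """Minimum normalized detection cost over all candidate thresholds.

    p_target, c_miss and c_fa are illustrative defaults (an assumption),
    not values taken from the paper.
    """
    # Cost of the better trivial system (always accept or always reject).
    default_cost = min(c_miss * p_target, c_fa * (1.0 - p_target))
    costs = []
    for t in candidate_thresholds(target_scores, nontarget_scores):
        far, frr = error_rates(target_scores, nontarget_scores, t)
        costs.append(c_miss * p_target * frr + c_fa * (1.0 - p_target) * far)
    return min(costs) / default_cost


# Toy scores (hypothetical): higher means "more likely the same speaker".
target = [0.9, 0.8, 0.7, 0.4]
nontarget = [0.6, 0.3, 0.2, 0.1]
eer, threshold = equal_error_rate(target, nontarget)
print(f"EER = {eer:.2f} at threshold {threshold}")
print(f"minDCF = {min_dcf(target, nontarget, p_target=0.5):.2f}")
```

Lower values are better for both metrics. The exhaustive threshold sweep is quadratic in the number of trials and is chosen here for readability; a sorted cumulative count would be the usual choice for large trial lists.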

Key words: speaker verification, intelligent voice synthesis, Group-Res2Block deep neural network, multi-scale features, attention mechanism
