Journal of Applied Sciences

• Communication Engineering •

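High Quality Voice Morphing System

Xu Ning, Yang Zhen

  1. Institute of Signal Processing and Transmission, Nanjing University of Posts and Telecommunications, Nanjing, Jiangsu 210003, China
  • Received: 2007-11-05 Revised: 2008-04-23 Online: 2008-07-31 Published: 2008-07-31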


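Abstract:

A voice conversion system based on the STRAIGHT model and offering high synthesized speech quality is proposed and implemented. On the one hand, the STRAIGHT model allows speech parameters such as fundamental frequency and duration to be modified substantially without degrading the quality of the synthesized speech. On the other hand, the concept of "predicted" spectral parameters is introduced, and conversion is performed by searching a prediction codebook. This avoids the over-smoothing of spectral parameters estimated by the GMM in classical systems. It also overcomes a drawback of classical systems that synthesize speech with an LPC model: large pulse-like waveforms at the junctions between consecutive frames. Spectrogram analysis, ABX tests, and MOS evaluation show that the proposed voice conversion algorithm performs far better than the classical LPC-based GMM conversion system, both in synthesized speech quality and in mapping the target speaker's characteristics.

Keywords: STRAIGHT model, pitch prediction, spectral parameter prediction, voice conversion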
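The abstract contrasts GMM estimation of spectral parameters, which tends to over-smooth, with conversion by searching a prediction codebook. The paper's codebook design is not described here, so the following is only a toy Python sketch under stated assumptions. `gmm_style_mapping` is a simplified posterior-weighted average (the regression term of a full GMM mapping is dropped), and `codebook_mapping` is a hypothetical paired-codebook nearest-neighbour lookup that stands in for, but is not, the authors' predictive codebook search. It shows why averaging flattens a spectral peak while selecting a stored codeword preserves it.

```python
import numpy as np


def gaussian_posteriors(x, means, var=1.0):
    """Posterior p(i|x) for isotropic Gaussians with equal priors and a shared variance."""
    sq_dist = np.sum((means - x) ** 2, axis=1)
    logits = -0.5 * sq_dist / var
    logits -= logits.max()  # subtract the max before exponentiating, for numerical stability
    w = np.exp(logits)
    return w / w.sum()


def gmm_style_mapping(x, source_means, target_means):
    """Simplified GMM-style conversion (assumption): posterior-weighted average of the
    per-component target means; a full GMM mapping would add a regression term."""
    return gaussian_posteriors(x, source_means) @ target_means


def codebook_mapping(x, source_codebook, target_codebook):
    """Hypothetical paired-codebook search: find the nearest source codeword and return
    its paired target codeword unchanged, so no averaging takes place."""
    idx = np.argmin(np.linalg.norm(source_codebook - x, axis=1))
    return target_codebook[idx]


# Toy data: two source feature vectors paired with two target "spectral envelopes"
# whose single sharp peak sits in different frequency bins.
source = np.array([[1.0, 0.0],
                   [0.0, 1.0]])
target = np.array([[0.0, 10.0, 0.0, 0.0],
                   [0.0, 0.0, 10.0, 0.0]])
x = np.array([0.6, 0.4])  # a source frame lying between the two classes

averaged = gmm_style_mapping(x, source, target)
selected = codebook_mapping(x, source, target)
print("GMM-style peak:", averaged.max())  # about 5.5: the peak is smeared over two bins
print("codebook peak:", selected.max())   # 10.0: the stored peak is kept intact
```

The trade-off in this toy is that the codebook output can only ever be one of the stored target vectors, whereas the averaged output is smooth in the input but loses spectral sharpness.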

Abstract: This paper introduces a novel prediction-based voice morphing system. Its advantage stems first from the use of the STRAIGHT model, which allows flexible manipulation of speech parameters such as pitch, vocal tract length, and speaking rate while maintaining high reproduction quality. Second, the system introduces a predictable spectrogram, which resolves both the over-smoothing problem of GMM mapping and the discontinuities between consecutive frames caused by the traditional LPC model. Subjective evaluation and objective measurement indicate that the proposed method outperforms the traditional method in both synthesized speech quality and the precision with which target-speaker characteristics are mapped.

Key words: STRAIGHT model, predictable pitch, predictable spectrogram, voice morphing
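To make the over-smoothing of GMM mapping concrete, here is the joint-density GMM conversion function as it is commonly written in the voice conversion literature; it is given for reference only and is not taken from this paper. Here $x$ is a source spectral vector, $M$ the number of mixture components, $\alpha_i$ the mixture weights, and $\mu_i^{(x)}, \mu_i^{(y)}, \Sigma_i^{(xx)}, \Sigma_i^{(yx)}$ the source and target means and the covariance blocks of component $i$.

```latex
F(x) = \sum_{i=1}^{M} p_i(x)\left[\mu_i^{(y)} + \Sigma_i^{(yx)}\left(\Sigma_i^{(xx)}\right)^{-1}\left(x - \mu_i^{(x)}\right)\right],
\qquad
p_i(x) = \frac{\alpha_i\,\mathcal{N}\!\left(x;\,\mu_i^{(x)},\Sigma_i^{(xx)}\right)}{\sum_{j=1}^{M}\alpha_j\,\mathcal{N}\!\left(x;\,\mu_j^{(x)},\Sigma_j^{(xx)}\right)}
```

Because $F(x)$ is a posterior-weighted sum of per-component linear estimates, a frame for which several components have comparable posteriors is mapped toward an average of their targets, which flattens spectral detail. This is the over-smoothing that the predictable-spectrogram approach is reported to resolve.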