Abstract: Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Electrical and Computer Engineering, February 2024. Advisor: Sungroh Yoon.

Advancements in deep generative models have had a significant impact on speech synthesis, improving the quality and adaptability of voice generation. This thesis explores ways to improve deep generative models for speech synthesis, addressing challenges such as real-time synthesis, efficient learning from imperfect data, and personalization.

For real-time speech synthesis, we directly tackle the limitation of autoregressive models, which generate high-quality speech but suffer from slow sampling. We introduce non-autoregressive models that substantially increase synthesis speed without sacrificing quality. Notably, FloWaveNet, a flow-based generative model, achieves synthesis speeds far exceeding WaveNet's, broadening the potential of interactive applications such as chatbots and virtual assistants.

In addition, we present Guided-TTS, a label-efficient speech synthesis model that directly utilizes untranscribed data. This diffusion-based generative model differs from traditional acoustic models in its ability to synthesize high-quality speech even without transcribed data for the target speaker, highlighting the potential for efficient speech synthesis that draws on a wide range of untranscribed speech data.

We further advance personalized speech synthesis in this thesis. We extend Guided-TTS into Guided-TTS 2, designed to generate voices of speakers unseen during training from minimal reference data, enabling highly personalized voice synthesis for practical applications. Additionally, we introduce P-Flow, a model for zero-shot personalized speech synthesis. Built on speech prompting and flow-matching generative models, P-Flow shows fast, data-efficient synthesis comparable to the state-of-the-art autoregressive model VALL-E while requiring significantly less training data.
These proposed models demonstrate the effective application of deep generative models to address specific challenges in speech synthesis. The ...
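As background for the flow-matching objective that P-Flow builds on, the following is a minimal sketch of a conditional flow-matching training loss in Python with NumPy. All names, shapes, and the toy model are illustrative assumptions, not the thesis's implementation; in particular, P-Flow's speech-prompt conditioning is omitted.

```python
import numpy as np

def cfm_loss(model, x1, rng):
    """Conditional flow-matching loss for a batch of data samples x1.

    Draws Gaussian noise x0 and a random time t, forms the point
    xt = (1 - t) * x0 + t * x1 on the straight path from noise to data,
    and regresses the model's predicted velocity onto (x1 - x0),
    the constant velocity of that path. Illustrative sketch only.
    """
    x0 = rng.standard_normal(x1.shape)       # noise endpoint of the path
    t = rng.uniform(size=(x1.shape[0], 1))   # per-sample time in (0, 1)
    xt = (1.0 - t) * x0 + t * x1             # interpolated point at time t
    target = x1 - x0                         # target velocity field
    pred = model(xt, t)                      # model's velocity estimate
    return np.mean((pred - target) ** 2)

# Toy "model" that always predicts zero velocity (a hypothetical stand-in).
zero_model = lambda xt, t: np.zeros_like(xt)

rng = np.random.default_rng(0)
batch = rng.standard_normal((8, 4))          # 8 fake data samples of dim 4
loss = cfm_loss(zero_model, batch, rng)
```

At sampling time, the learned velocity field is integrated from noise to data with an ODE solver, which is what allows the few-step, non-autoregressive generation described above.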