
Deep Generative Models for Fast, Efficient and Personalized Speech Synthesis ; 고속 고효율 개인화 음성 합성을 위한 생성 모델

  • Additional Information
    • Contributors:
      윤성로; Sungwon Kim; College of Engineering, Department of Electrical and Computer Engineering
    • Publication Information:
      Seoul National University Graduate School
    • Subject:
      2024
    • Collection:
      Seoul National University: S-Space
    • Abstract:
      Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Electrical and Computer Engineering, February 2024. Advisor: 윤성로. ; Advancements in deep generative models have made a significant impact on speech synthesis, improving quality and adaptability in voice generation. This thesis explores ways to improve deep generative models for speech synthesis, addressing challenges such as real-time speech synthesis, efficient learning from imperfect data, and personalization. For real-time speech synthesis, we directly tackle the limitation of autoregressive models, which generate high-quality speech but suffer from slow sampling speed. We introduce non-autoregressive models that substantially increase synthesis speed without sacrificing quality. In particular, FloWaveNet, a flow-based generative model, achieves synthesis speeds far exceeding those of WaveNet, enhancing the potential of interactive applications such as chatbots and virtual assistants. In addition, we present Guided-TTS, a label-efficient speech synthesis model that directly utilizes untranscribed data. This diffusion-based generative model differs from traditional acoustic models in its ability to synthesize high-quality speech even without transcribed data for the target speaker, highlighting the potential for efficient speech synthesis that draws on a wide range of untranscribed speech data. We further advance personalized speech synthesis in this thesis. We enhance Guided-TTS into Guided-TTS 2, designed to generate voices for speakers unseen during training with minimal reference data. This development facilitates highly personalized voice synthesis for practical applications. Additionally, we introduce P-Flow, a model for zero-shot personalized speech synthesis. P-Flow, built on speech prompting and flow-matching generative models, shows fast, data-efficient synthesis capabilities comparable to the state-of-the-art autoregressive model VALL-E while requiring significantly less training data.
These proposed models demonstrate the effective application of deep generative models to address specific challenges in speech synthesis. The ...
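The abstract's mention of flow-matching generative models (the basis of P-Flow) can be illustrated with a minimal toy sketch of the conditional flow-matching objective: interpolate along a straight path between a noise sample and a data sample, and regress a predicted velocity onto the path's constant target velocity. This is an illustrative assumption about the general technique, not the thesis's actual implementation; all names here (`flow_matching_targets`, `fm_loss`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def flow_matching_targets(x0, x1, t):
    """Interpolate on the straight path from noise x0 to data x1 at times t.

    Returns the interpolated point x_t and the target velocity, which is
    constant (x1 - x0) along a straight-line probability path.
    """
    x_t = (1.0 - t)[:, None] * x0 + t[:, None] * x1
    v_target = x1 - x0
    return x_t, v_target

def fm_loss(pred_v, v_target):
    """Mean-squared regression of a predicted velocity onto the target."""
    return float(np.mean((pred_v - v_target) ** 2))

# Toy batch: 4 samples of an 8-dimensional feature vector.
x0 = rng.standard_normal((4, 8))   # noise samples
x1 = rng.standard_normal((4, 8))   # data samples
t = rng.uniform(size=4)            # random timesteps in [0, 1]

x_t, v_target = flow_matching_targets(x0, x1, t)
# An oracle that predicts the target velocity exactly has zero loss.
assert fm_loss(v_target, v_target) == 0.0
```

In a real model, a neural network takes `(x_t, t)` plus conditioning (for P-Flow, a speech prompt) and is trained to output the velocity; here the network is elided to keep the sketch self-contained.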
    • File Description:
      xii, 155
    • ISBN:
      978-0-00-000000-2
      0-00-000000-0
    • Relation:
      000000181855; https://hdl.handle.net/10371/209709; https://dcollection.snu.ac.kr/common/orgView/000000181855; 000000000051▲000000000062▲000000181855▲
    • Electronic Access:
      https://hdl.handle.net/10371/209709
      https://dcollection.snu.ac.kr/common/orgView/000000181855
    • Accession Number:
      edsbas.CE861BF0