Abstract: Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Electrical and Computer Engineering, February 2024. Advisor: Sungroh Yoon.

Advancements in deep generative models have had a significant impact on speech synthesis, improving the quality and adaptability of voice generation. This thesis explores ways to improve deep generative models for speech synthesis, addressing challenges such as real-time synthesis, efficient learning from imperfect data, and personalization.

For real-time speech synthesis, we directly tackle the limitation of autoregressive models, which generate high-quality speech but suffer from slow sampling. We introduce non-autoregressive models that substantially increase synthesis speed without sacrificing quality. Notably, FloWaveNet, a flow-based generative model, achieves synthesis speeds far exceeding WaveNet's, broadening the potential of interactive applications such as chatbots and virtual assistants.

In addition, we present Guided-TTS, a label-efficient speech synthesis model that directly utilizes untranscribed data. This diffusion-based generative model differs from traditional acoustic models in its ability to synthesize high-quality speech even without transcribed data for the target speaker, highlighting the potential for efficient speech synthesis that draws on a wide range of untranscribed speech data.

We further advance personalized speech synthesis in this thesis. We extend Guided-TTS into Guided-TTS 2, designed to generate voices of speakers unseen during training from minimal reference data, enabling highly personalized voice synthesis for practical applications. Additionally, we introduce P-Flow, a model for zero-shot personalized speech synthesis. Built on speech prompting and flow-matching generative models, P-Flow shows fast, data-efficient synthesis comparable to the state-of-the-art autoregressive model VALL-E while requiring significantly less training data.
These proposed models demonstrate the effective application of deep generative models to address specific challenges in speech synthesis. The ...
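As background for the flow-matching objective that P-Flow builds on, the following is a minimal sketch of a conditional flow-matching training loss in Python with NumPy. All names, shapes, and the toy model are illustrative assumptions, not the thesis's implementation; in particular, P-Flow's speech-prompt conditioning is omitted.

```python
import numpy as np

def cfm_loss(model, x1, rng):
    """Conditional flow-matching loss for a batch of data samples x1.

    Draws Gaussian noise x0 and a random time t, forms the point
    xt = (1 - t) * x0 + t * x1 on the straight path from noise to data,
    and regresses the model's predicted velocity onto (x1 - x0),
    the constant velocity of that path. Illustrative sketch only.
    """
    x0 = rng.standard_normal(x1.shape)       # noise endpoint of the path
    t = rng.uniform(size=(x1.shape[0], 1))   # per-sample time in (0, 1)
    xt = (1.0 - t) * x0 + t * x1             # interpolated point at time t
    target = x1 - x0                         # target velocity field
    pred = model(xt, t)                      # model's velocity estimate
    return np.mean((pred - target) ** 2)

# Toy "model" that always predicts zero velocity (a hypothetical stand-in).
zero_model = lambda xt, t: np.zeros_like(xt)

rng = np.random.default_rng(0)
batch = rng.standard_normal((8, 4))          # 8 fake data samples of dim 4
loss = cfm_loss(zero_model, batch, rng)
```

At sampling time, the learned velocity field is integrated from noise to data with an ODE solver, which is what allows the few-step, non-autoregressive generation described above.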