Faces that Speak: Jointly Synthesising Talking Face and Speech from Text

Authors: Jang, Youngjoon; Kim, Ji-Hoon; Ahn, Junseok; Kwak, Doyeop; Yang, Hong-Sun; Ju, Yoon-Cheol; Kim, Il-Hwan; Kim, Byeong-Yeol; Chung, Joon Son
Subjects:
Computer Science - Computer Vision and Pattern Recognition; Computer Science - Artificial Intelligence; Computer Science - Sound; Electrical Engineering and Systems Science - Audio and Speech Processing; Electrical Engineering and Systems Science - Image and Video Processing
Record type:
Working Paper
Online access:
http://arxiv.org/abs/2405.10272

Additional information
- Subject:
  2024
- Collection:
  Computer Science
- Abstract:
  The goal of this work is to simultaneously generate natural talking faces and speech outputs from text. We achieve this by integrating Talking Face Generation (TFG) and Text-to-Speech (TTS) systems into a unified framework. We address the main challenges of each task: (1) generating a range of head poses representative of real-world scenarios, and (2) ensuring voice consistency despite variations in facial motion for the same identity. To tackle these issues, we introduce a motion sampler based on conditional flow matching, which is capable of high-quality motion code generation in an efficient way. Moreover, we introduce a novel conditioning method for the TTS system, which utilises motion-removed features from the TFG model to yield uniform speech outputs. Our extensive experiments demonstrate that our method effectively creates natural-looking talking faces and speech that accurately match the input text. To our knowledge, this is the first effort to build a multimodal synthesis system that can generalise to unseen identities.
  Comment: CVPR 2024
- Identifier:
  edsarx.2405.10272
