Posterior sampling algorithms for unsupervised speech enhancement with recurrent variational autoencoder


Authors: Sadeghi, Mostafa; Serizel, Romain
Source:
International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, Apr 2024, Seoul, South Korea. ⟨10.48550/arXiv.2309.10439⟩ ; https://hal.science/hal-04210679
Subject:
Unsupervised speech enhancement; deep generative model; variational autoencoder; posterior sampling; Langevin dynamics; [SPI.SIGNAL]Engineering Sciences [physics]/Signal and Image processing; [INFO.INFO-CV]Computer Science [cs]/Computer Vision and Pattern Recognition [cs.CV]; [INFO.INFO-SD]Computer Science [cs]/Sound [cs.SD]; [STAT.ML]Statistics [stat]/Machine Learning [stat.ML]
Record type:
conference object
Language:
English

Additional information
- Contributors:
  Speech Modeling for Facilitating Oral-Based Communication (MULTISPEECH); Inria Nancy - Grand Est; Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Department of Natural Language Processing & Knowledge Discovery (LORIA - NLPKD); Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA); Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA); Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS); IEEE; ANR-22-CE23-0026,REAVISE,Amélioration de la parole audiovisuelle basée sur l'apprentissage profond, robuste et efficace(2022)
- Publisher information:
  HAL CCSD
- Publication year:
  2024
- Collection:
  Université de Lorraine: HAL
- Subject (geographic):
  Seoul (Korea); South Korea
- Abstract:
  In this paper, we address the unsupervised speech enhancement problem based on a recurrent variational autoencoder (RVAE). This approach offers promising generalization performance over its supervised counterpart. Nevertheless, the iterative variational expectation-maximization (VEM) process required at test time, which relies on a variational inference method, results in high computational complexity. To tackle this issue, we present efficient sampling techniques based on Langevin dynamics and Metropolis-Hastings algorithms, adapted to EM-based speech enhancement with RVAE. By directly sampling from the intractable posterior distribution within the EM process, we circumvent the intricacies of variational inference. We conduct a series of experiments, comparing the proposed methods with VEM and a state-of-the-art supervised speech enhancement approach based on diffusion models. The results reveal that our sampling-based algorithms significantly outperform VEM, not only in terms of computational efficiency but also in overall performance. Furthermore, when compared to the supervised baseline, our methods show robust generalization performance in mismatched test conditions.
- Relation:
  info:eu-repo/semantics/altIdentifier/arxiv/2309.10439; hal-04210679; https://hal.science/hal-04210679; https://hal.science/hal-04210679v2/document; https://hal.science/hal-04210679v2/file/EfficientVAE_SE_ICASSP24.pdf; ARXIV: 2309.10439
- Identifier (DOI):
  10.48550/arXiv.2309.10439
- Online access:
  https://doi.org/10.48550/arXiv.2309.10439
  https://hal.science/hal-04210679
  https://hal.science/hal-04210679v2/document
  https://hal.science/hal-04210679v2/file/EfficientVAE_SE_ICASSP24.pdf
- Rights:
  http://creativecommons.org/licenses/by/ ; info:eu-repo/semantics/OpenAccess
- Accession number:
  edsbas.4E6E772E
