Posterior sampling algorithms for unsupervised speech enhancement with recurrent variational autoencoder

  • Additional Information
    • Contributors:
      Speech Modeling for Facilitating Oral-Based Communication (MULTISPEECH); Inria Nancy - Grand Est; Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Department of Natural Language Processing & Knowledge Discovery (LORIA - NLPKD); Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA); Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA); Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS); IEEE; ANR-22-CE23-0026,REAVISE,Amélioration de la parole audiovisuelle basée sur l'apprentissage profond, robuste et efficace(2022)
    • Publication Information:
      HAL CCSD
    • Publication Year:
      2024
    • Collection:
      Université de Lorraine: HAL
    • Subject:
    • Abstract:
      International audience; In this paper, we address the unsupervised speech enhancement problem based on a recurrent variational autoencoder (RVAE). This approach offers promising generalization performance over its supervised counterpart. Nevertheless, the iterative variational expectation-maximization (VEM) process involved at test time, which relies on a variational inference method, results in high computational complexity. To tackle this issue, we present efficient sampling techniques based on Langevin dynamics and Metropolis-Hastings algorithms, adapted to EM-based speech enhancement with RVAE. By directly sampling from the intractable posterior distribution within the EM process, we circumvent the intricacies of variational inference. We conduct a series of experiments, comparing the proposed methods with VEM and a state-of-the-art supervised speech enhancement approach based on diffusion models. The results reveal that our sampling-based algorithms significantly outperform VEM, not only in terms of computational efficiency but also in overall performance. Furthermore, when compared to the supervised baseline, our methods show robust generalization performance in mismatched test conditions.
    • Relation:
      info:eu-repo/semantics/altIdentifier/arxiv/2309.10439; hal-04210679; https://hal.science/hal-04210679; https://hal.science/hal-04210679v2/document; https://hal.science/hal-04210679v2/file/EfficientVAE_SE_ICASSP24.pdf; ARXIV: 2309.10439
    • Identifier:
      10.48550/arXiv.2309.10439
    • Online Access:
      https://doi.org/10.48550/arXiv.2309.10439
      https://hal.science/hal-04210679
      https://hal.science/hal-04210679v2/document
      https://hal.science/hal-04210679v2/file/EfficientVAE_SE_ICASSP24.pdf
    • Rights:
      http://creativecommons.org/licenses/by/ ; info:eu-repo/semantics/OpenAccess
    • Identifier:
      edsbas.4E6E772E