Speech Enhancement Using Deep Neural Networks

  • Additional information
    • Subject:
      2024
    • Collection:
      KiltHub Research from Carnegie Mellon University
    • Abstract:
      Speech enhancement algorithms aim to improve the quality and intelligibility of speech signals degraded by noise so that humans and machines can interpret the speech more reliably. Thanks to large-scale datasets and online simulation, supervised algorithms based on deep neural networks can accurately suppress non-stationary noise, making them useful in practice for real-time communication systems and as the front end of automatic speech recognition systems. Despite these advances, the extent to which these algorithms are robust to adverse acoustic conditions and phonetic categories of speech stimuli is still being investigated. This thesis addresses supervised speech enhancement in three parts. First, we describe the four-region error that serves as a diagnostic tool for speech enhancement algorithms. Compared to popular perceptual measures of speech quality, the four-region error distinguishes between two universal problems: under-suppression and over-suppression. We will show that all algorithms exhibit a trade-off between these error types and describe loss functions that balance the two. Second, we address the under-suppression problem within the frequency-domain speech enhancement framework. In the domain of instantaneous signal-to-noise ratio (ISNR), we unify algorithms trained on different targets. We will show that all methods face inevitable uncertainties as the ISNR decreases. We then introduce uncertainty learning that quantifies these uncertainties and improves noise reduction capability. Third, we address the over-suppression problem by incorporating phonetic information into the supervised framework. Through measurements of phonetically-dependent four-region error, we identify the over-suppression problem in obstruents in American English as the critical challenge of frequency-domain algorithms. We further identify a class of time-domain algorithms that exhibit different trade-offs and use them to train a phonetic segregation network.
      Finally, we explore phonetically-dependent channel selection rules to ...
    • Relation:
      https://figshare.com/articles/thesis/Speech_Enhancement_Using_Deep_Neural_Networks/25796812
    • Identifier:
      10.1184/r1/25796812.v1
    • Online access:
      https://doi.org/10.1184/r1/25796812.v1
      https://figshare.com/articles/thesis/Speech_Enhancement_Using_Deep_Neural_Networks/25796812
    • Rights:
      CC BY 4.0
    • Identifier:
      edsbas.20AFC3DC