
On Representation Learning in Speech Processing and Automatic Speech Recognition

  • Additional Information
    • Contributors:
      Biemann, Chris
    • Publication Details:
      Staats- und Universitätsbibliothek Hamburg Carl von Ossietzky
    • Subject:
      2022
    • Collection:
      E-Dissertationen der Universität Hamburg
    • Abstract:
      Speech processing is a difficult task for computers, owing to the many factors of variance present in any speech signal. In Automatic Speech Recognition (ASR), a person's voice, the environment, and how and where speech sounds are recorded can drastically alter the appearance of a speech signal without changing the content of what is being said. Meanwhile, humans deal seemingly effortlessly with these factors of variance in understanding spoken language. A central question in automatic speech processing is how and what representations to use in order to facilitate further processing and to apply machine learning methods to automate speech processing tasks. A focus of this thesis is on learning models and representations from speech data itself. Artificial neural networks have recently reemerged as an important ingredient of acoustic and language modelling and have produced promising results and error reductions over previous methods. They are now a widespread tool for learning good and robust representations of speech signals in ASR and are also typically used in language modelling. After an introduction to speech processing in Chapter 1, this thesis provides an overview of common (deep) neural network techniques and models in Chapter 2. An introduction to speech processing and ASR is given in Chapter 3. In Chapter 4, a study on transfer learning is conducted on an isolated paralinguistic speech task, namely eating condition recognition. With the system presented in this chapter, we also participated in a paralinguistic speech challenge. The model was pre-trained on a language identification task, and transfer learning was successfully used for the target task with little training data. In Chapter 5 of this thesis, we propose Unspeech context embeddings. Unspeech models are trained on unannotated speech data using contrastive learning, with Siamese convolutional neural networks.
The model is built on the idea that speech sounds that are close in time share the same contexts. The model can be trained on vast amounts of ...
    • Relation:
      http://nbn-resolving.de/urn:nbn:de:gbv:18-ediss-104520; https://ediss.sub.uni-hamburg.de/handle/ediss/9915
    • Electronic Access:
      http://nbn-resolving.de/urn:nbn:de:gbv:18-ediss-104520
      https://ediss.sub.uni-hamburg.de/handle/ediss/9915
    • Rights:
      http://purl.org/coar/access_right/c_abf2 ; info:eu-repo/semantics/openAccess ; https://creativecommons.org/licenses/by/4.0/
    • Accession Number:
      edsbas.739D9E98
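
The contrastive idea summarized in the abstract — embeddings of speech segments that are close in time should be similar, while distant segments should be pushed apart — can be illustrated with a standard margin-based contrastive loss. This is a minimal sketch of the general technique only; the function, margin value, and toy embeddings are hypothetical and do not reproduce the thesis's actual Unspeech implementation.

```python
import numpy as np

def contrastive_loss(z_a, z_b, same_context, margin=1.0):
    """Margin-based contrastive loss on a pair of embeddings.
    Positive pairs (same_context=True) are pulled together;
    negative pairs are pushed at least `margin` apart."""
    d = np.linalg.norm(z_a - z_b)
    if same_context:
        return 0.5 * d ** 2                      # penalize any distance
    return 0.5 * max(0.0, margin - d) ** 2       # penalize closeness only

# Toy embeddings standing in for Siamese encoder outputs.
anchor = np.array([1.0, 0.0])
near   = np.array([0.9, 0.1])   # segment close in time: positive pair
far    = np.array([-1.0, 0.0])  # temporally distant segment: negative pair

pos_loss = contrastive_loss(anchor, near, same_context=True)   # small
neg_loss = contrastive_loss(anchor, far, same_context=False)   # zero: already beyond margin
```

In a trained Siamese setup, both inputs would pass through the same convolutional encoder, and minimizing this loss over many sampled pairs yields context embeddings without any manual annotation.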