One of the main challenges in speech emotion recognition is the lack of large labelled datasets. Progress in speech synthesis now allows us to generate reliable and realistic expressive speech. In this work, we propose using a state-of-the-art end-to-end speech emotion conversion model to generate new synthetic data for training speech emotion recognition models. We first evaluate the quality of the converted speech on new, unseen datasets and find it on par with the training data. We then study the effect of using the synthesized speech for data augmentation. We show that this approach improves the overall performance of emotion recognition models on two datasets, IEMOCAP and RAVDESS, in both speaker-dependent and speaker-independent settings, using a fine-tuned wav2vec 2.0 model.