Is automatic phoneme recognition suitable for speech analysis? Temporal and performance evaluation of an Automatic Speech Recognition model in spontaneous French

  • Additional Information
    • Contributors:
      Luxembourg Institute of Health (LIH); Laboratoire Bordelais de Recherche en Informatique (LaBRI); Université de Bordeaux (UB)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB)-Centre National de la Recherche Scientifique (CNRS); Sommeil, Addiction et Neuropsychiatrie Bordeaux (SANPSY); Université de Bordeaux (UB)-CHU de Bordeaux Pellegrin Bordeaux -Centre National de la Recherche Scientifique (CNRS); Centre de recherche inter-langues sur la signification en contexte (CRISCO); Université de Caen Normandie (UNICAEN); Normandie Université (NU)-Normandie Université (NU); Yiya Chen; Aoju Chen; Amalia Arvaniti
    • Publication Information:
      HAL CCSD
      ISCA
    • Publication Year:
      2024
    • Collection:
      Normandie Université: HAL
    • Abstract:
      International audience; The correct automatic identification and segmentation of phonemes is crucial for a more in-depth exploration of prosodic parameters at the syllabic level. As such, automatic phonemic transcription of spontaneous speech recordings has numerous applications, such as teaching or health monitoring. Such transcriptions are usually evaluated in terms of either correct phoneme estimation or temporal segmentation, each task being addressed by a dedicated system. However, to our knowledge, no system has been evaluated on performing both tasks correctly at the same time. This article evaluates a state-of-the-art Kaldi-based phonetic transcription system for spontaneous French. We use the Rhapsodie database, composed of spontaneous speech recordings with diverse levels of planning. Our phoneme recognition system obtains good results on phoneme and phoneme category identification (respective error rates of 19.2% and 13.4%), but performs poorly on phoneme and category segmentation: on average, 40% of phoneme duration and 34% of phonetic category duration are not detected. On both metrics, the performance of the system improves with the degree of planning of the spontaneous speech. These results suggest that improvements are necessary to design automatic phonetic transcription systems reliable enough to be useful for further analysis.
    • Relation:
      hal-04679813; https://hal.science/hal-04679813; https://hal.science/hal-04679813/document; https://hal.science/hal-04679813/file/martin24b_speechprosody.pdf
    • DOI:
      10.21437/SpeechProsody.2024-226
    • Rights:
      info:eu-repo/semantics/OpenAccess
    • Accession Number:
      edsbas.95FB4C91