Abstract: Traditional approaches to pronunciation correction often face challenges in personalization, adaptability, and consistent feedback. This study introduces a novel AI-powered system that integrates Reinforcement Learning (RL) and Large Language Models (LLMs) to address these limitations. The system employs a custom Proximal Policy Optimization (PPO) algorithm for precise pronunciation evaluation and an LLM to deliver detailed, encouraging, and user-specific feedback. It was evaluated using the CMU Sphinx Dictionary dataset, a foundational phonetic resource, alongside dynamically generated user-specific session data for personalized feedback and model refinement. Further validation utilized datasets such as TIMIT, LibriTTS, SpeechOcean762, and the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), enabling direct comparisons with contemporary methods. Results demonstrate the system's robustness in handling diverse phonetic variations. While primarily tested on English data, its modular architecture supports adaptation to other languages and dialects through language-specific phonetic datasets. The system achieved strong performance metrics on the CMU Sphinx dataset: 97.9% phoneme-level accuracy, 87.7% word-level accuracy, 95.2% syllable count accuracy, and 89.4% perfect accuracy. This approach underscores the potential of advanced AI techniques to enhance the personalization and effectiveness of pronunciation correction systems. All findings are quantitatively validated and thoroughly documented.