Evaluating emotional and subjective responses in synthetic art-related dialogues: A multi-stage framework with large language models

Authors: Luna Jiménez, Cristina; Gil Martín, Manuel; D'Haro Enríquez, Luis Fernando; Fernández Martínez, Fernando; San Segundo Hernández, Rubén
Source:
Expert Systems with Applications, ISSN 0957-4174, 2024-12, Vol. 255
Subject:
Telecomunicaciones
Record type:
article in journal/newspaper
Language:
English

Additional information
- Publication information:
  E.T.S.I. Telecomunicación (UPM)
- Publication year:
  2024
- Collection:
  Universidad Politécnica de Madrid: Archivo Digital de la UPM
- Abstract:
  The appearance of Large Language Models (LLMs) has brought a qualitative step forward in the performance of conversational agents, and even in the generation of creative texts. However, previous applications of these models to dialogue generation neglected the impact of ‘hallucinations’ on synthetic dialogues, omitting this central aspect from their evaluations. For this reason, we propose GenEvalGPT, an open-source and flexible framework that implements a comprehensive multi-stage evaluation strategy utilizing diverse metrics. The objective is two-fold: first, to assess the extent to which synthetic dialogues between a chatbot and a human align with the specified commands, determining whether the dialogues were successfully created according to the provided specifications; and second, to evaluate various aspects of emotional and subjective responses. Assuming that the dialogues to be evaluated were synthetically produced from specific profiles, the first evaluation stage uses LLMs to reconstruct the original templates employed in dialogue creation. The success of this reconstruction is then assessed in a second stage using objective lexical and semantic metrics. Separately, crafting a chatbot’s behaviors demands careful consideration of the diverse range of interactions it is meant to engage in. Synthetic dialogues play a pivotal role in this context, as they can be deliberately synthesized to emulate various behaviors. This is precisely the objective of the third stage: evaluating whether the generated dialogues adhere to the required aspects concerning emotional and subjective responses. To validate the capabilities of the proposed framework, we applied it to recognize whether the chatbot exhibited one of two distinct behaviors in the synthetically generated dialogues: being emotional and providing subjective responses, or remaining neutral. This evaluation will encompass traditional metrics and automatic metrics generated by the LLM. In our use case of ...
- File Description:
  application/pdf
- Relation:
  https://www.sciencedirect.com/science/article/pii/S0957417424013915; info:eu-repo/grantAgreement/EC/H2020/101071191; info:eu-repo/grantAgreement/EC/H2020/PID2020-118112RB-C22; info:eu-repo/grantAgreement/MINECO//PID2020-118112RB-C21; info:eu-repo/grantAgreement/MINECO//PID2021-126061OB-C43; info:eu-repo/grantAgreement/MINECO//PDC2021-120846-C42; https://oa.upm.es/82496/
- Online access:
  https://oa.upm.es/82496/
- Rights:
  https://creativecommons.org/licenses/by/3.0/es/ ; info:eu-repo/semantics/openAccess
- Accession number:
  edsbas.2ED9E643
