Correctness Coverage Evaluation for Medical Multiple-Choice Question Answering Based on the Enhanced Conformal Prediction Framework

  • Additional information
    • Publication information:
      Preprint
      MDPI AG, 2025.
    • Subject:
      2025
    • Abstract:
      Large language models (LLMs) are increasingly adopted in medical question answering (QA). However, LLMs are known to generate hallucinations and nonfactual information, which undermines their trustworthiness in high-stakes medical tasks. Conformal prediction (CP) is a well-established machine learning framework that offers statistically rigorous guarantees of marginal (average) coverage for prediction sets, but its applicability to medical QA remains to be explored. To address this gap, this study proposes an enhanced CP framework for medical multiple-choice question answering (MCQA) tasks. The framework ties the nonconformity score to the frequency score of the correct option. Because access to the model's internal information is limited, it draws on self-consistency: it generates multiple outputs for the same medical query and computes the frequency score of each option from them. Furthermore, a risk control framework is incorporated to manage task-specific metrics through a monotonically decreasing loss function. The enhanced CP framework is evaluated on three popular MCQA datasets using off-the-shelf LLMs. Empirical results show that it achieves user-specified average (marginal) error rates on the test set, and that the average prediction set size (APSS) on the test set decreases as the risk level increases. The study concludes that APSS is a promising evaluation metric for the uncertainty of LLMs.
    • ISSN:
      2227-7390
    • Identifier:
      10.3390/math13091538
      10.48550/arxiv.2503.05505
    • Rights:
      CC BY
      arXiv Non-Exclusive Distribution
    • Identifier:
      edsair.doi.dedup.....482d6da0a74012639a2817df97d5afd9
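
The procedure outlined in the abstract (sample several answers per question, score each option by its frequency, then calibrate a threshold) can be illustrated with a minimal split conformal prediction sketch. This is not the authors' implementation. The function names are hypothetical, and the nonconformity score is assumed to be one minus the frequency score of the correct option, which is one plausible reading of the abstract. The risk control extension is not covered.

```python
import math
from collections import Counter


def frequency_scores(sampled_answers, options):
    """Fraction of sampled answers that chose each option (hypothetical helper)."""
    counts = Counter(sampled_answers)
    n = len(sampled_answers)
    return {opt: counts.get(opt, 0) / n for opt in options}


def calibrate_threshold(cal_freqs, cal_labels, alpha):
    """Conformal quantile of the calibration nonconformity scores.

    Assumption: nonconformity = 1 - frequency score of the correct option.
    """
    scores = sorted(1.0 - freqs[label] for freqs, label in zip(cal_freqs, cal_labels))
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))  # 1-indexed rank of the quantile
    if k > n:
        # Too few calibration points for this alpha: keep every option.
        return float("inf")
    return scores[k - 1]


def prediction_set(freqs, threshold):
    """Options whose nonconformity score does not exceed the threshold."""
    return {opt for opt, f in freqs.items() if 1.0 - f <= threshold}


if __name__ == "__main__":
    options = ["A", "B", "C", "D"]
    # Toy calibration data: frequency scores and the correct option per question.
    cal_freqs = [
        {"A": 0.75, "B": 0.25, "C": 0.0, "D": 0.0},
        {"A": 0.5, "B": 0.5, "C": 0.0, "D": 0.0},
        {"A": 0.25, "B": 0.75, "C": 0.0, "D": 0.0},
    ]
    cal_labels = ["A", "A", "A"]
    threshold = calibrate_threshold(cal_freqs, cal_labels, alpha=0.5)
    test_freqs = frequency_scores(["A", "A", "B", "A"], options)
    print(threshold, prediction_set(test_freqs, threshold))
```

Under the usual exchangeability assumption between calibration and test questions, sets built this way contain the correct option with probability at least 1 − alpha on average. A larger alpha (risk level) gives a smaller threshold and therefore smaller sets, which matches the APSS trend reported in the abstract.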