Comparing NER approaches on French clinical text, with easy-to-reuse pipelines

Authors: Hubert, Thibault; Vaillant, Ghislain; Birot, Olivier; Arias, Camila; Neuraz, Antoine; Coulet, Adrien
Source:
MIE 2024 - 34th Medical Informatics Europe Conference, Aug 2024, Athens, Greece ; https://inria.hal.science/hal-04584688
Subject:
Clinical texts; Named Entity Recognition; Benchmark; Open science; [INFO.INFO-TT]Computer Science [cs]/Document and Text Processing; [INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI]; [INFO.INFO-SE]Computer Science [cs]/Software Engineering [cs.SE]; [SDV.MHEP]Life Sciences [q-bio]/Human health and pathology; [SDV.SPEE]Life Sciences [q-bio]/Santé publique et épidémiologie
Record type:
conference object
Language:
English

Additional information
- Contributors:
  Health data- and model- driven Knowledge Acquisition (HeKA); Inria de Paris; Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre de Recherche des Cordeliers (CRC (UMR_S_1138 / U1138)); École Pratique des Hautes Études (EPHE); Université Paris Sciences et Lettres (PSL)-Université Paris Sciences et Lettres (PSL)-Institut National de la Santé et de la Recherche Médicale (INSERM)-Sorbonne Université (SU)-Université Paris Cité (UPCité)-École Pratique des Hautes Études (EPHE); Université Paris Sciences et Lettres (PSL)-Université Paris Sciences et Lettres (PSL)-Institut National de la Santé et de la Recherche Médicale (INSERM)-Sorbonne Université (SU)-Université Paris Cité (UPCité); Université Paris Cité (UPCité); Service d'informatique médicale et biostatistiques CHU Necker; Assistance publique - Hôpitaux de Paris (AP-HP) (AP-HP)-Hôpital Necker - Enfants Malades AP-HP; Assistance publique - Hôpitaux de Paris (AP-HP) (AP-HP); Inria; ANR-22-PESN-0007,ShareFAIR,Sharing reliable protocols to transform datasets into gold standards: Application to Neuro-Vascular Pathologies(2022)
- Publisher information:
  HAL CCSD
- Publication year:
  2024
- Subject (geographic):
  Athens; Greece
- Abstract:
  The task of Named Entity Recognition (NER) is central for leveraging the content of clinical texts in observational studies. Indeed, texts contain a large part of the information available in Electronic Health Records (EHRs). However, clinical texts are highly heterogeneous between healthcare services and institutions, and between countries and languages, making it hard to predict how existing tools may perform on a particular corpus. We compared four NER approaches on three French corpora and share our benchmarking pipeline in an open and easy-to-reuse manner, using the medkit Python library. We include in our pipelines fine-tuning operations with either one or several of the considered corpora. Our results illustrate the expected superiority of language models over a dictionary-based approach, and question the necessity of refining models already trained on biomedical texts. Beyond benchmarking, we believe sharing reusable and customizable pipelines for comparing fast-evolving Natural Language Processing (NLP) tools is a valuable contribution, since clinical texts themselves can hardly be shared because of privacy concerns.
- Relation:
  hal-04584688; https://inria.hal.science/hal-04584688; https://inria.hal.science/hal-04584688/document; https://inria.hal.science/hal-04584688/file/hubert_et_al_camera_ready.pdf
- Rights:
  http://creativecommons.org/licenses/by/ ; info:eu-repo/semantics/OpenAccess
- Accession number:
  edsbas.C04FAE0C
