
A metric learning-based method for biomedical entity linking

  • Authors: Le, Ngoc D.; Nguyen, Nhung T. H.
  • Source:
    Frontiers in Research Metrics and Analytics ; volume 8 ; ISSN 2504-0537
  • Record type:
    article in journal/newspaper
  • Language:
    unknown
  • Additional information
    • Publisher:
      Frontiers Media SA
    • Subject:
      2023
    • Collection:
      Frontiers (Publisher - via CrossRef)
    • Abstract:
      Biomedical entity linking is the task of mapping mention(s) that occur in a particular textual context to a unique concept or entity in a knowledge base, e.g., the Unified Medical Language System (UMLS). One of the most challenging aspects of the entity linking task is the ambiguity of mentions, i.e., (1) mentions whose surface forms are very similar but which map to different entities in different contexts, and (2) entities that can be expressed using diverse types of mentions. Recent studies have used BERT-based encoders to encode mentions and entities into distinguishable representations such that their similarity can be measured using distance metrics. However, most real-world biomedical datasets suffer from severe imbalance, i.e., some classes have many instances while others appear only once or are completely absent from the training data. A common way to address this issue is to down-sample the dataset, i.e., to reduce the number of instances of the majority classes to make the dataset more balanced. In the context of entity linking, down-sampling reduces the model's ability to comprehensively learn the representations of mentions in different contexts, which is very important. To tackle this issue, we propose a metric-based learning method that treats a given entity and its mentions as a whole, regardless of the number of mentions in the training set. Specifically, our method uses a triplet loss-based function in conjunction with a clustering technique to learn the representations of mentions and entities. Through evaluations on two challenging biomedical datasets, i.e., MedMentions and BC5CDR, we show that our proposed method is able to address the issue of imbalanced data and to perform competitively with other state-of-the-art models. Moreover, our method significantly reduces computational cost in both the training and inference steps. Our source code is publicly available here.
    • Identifier:
      10.3389/frma.2023.1247094
    • Identifier:
      10.3389/frma.2023.1247094/full
    • Rights:
      https://creativecommons.org/licenses/by/4.0/
    • Identifier:
      edsbas.E7409B3F
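The abstract states that the method learns mention and entity representations with a triplet loss-based function so that similarity can be measured with a distance metric. As a rough illustration only, the sketch below computes a generic triplet margin loss, max(0, d(anchor, positive) - d(anchor, negative) + margin), over toy 2-D vectors. The Euclidean distance, the margin of 1.0, the toy embeddings, and the function names are assumptions for illustration; this is not the authors' exact loss, and their clustering step is not shown.

```python
import math
from typing import Sequence


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor: Sequence[float],
                 positive: Sequence[float],
                 negative: Sequence[float],
                 margin: float = 1.0) -> float:
    """Generic triplet margin loss (assumed form, not the paper's exact function).

    The loss is zero once the positive is closer to the anchor than the
    negative by at least `margin`; otherwise it penalizes the gap.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


# Hypothetical toy embeddings: a mention (anchor), its gold entity (positive)
# and an unrelated entity (negative).
mention = [0.0, 0.0]
gold_entity = [0.0, 1.0]   # distance 1.0 from the mention
other_entity = [3.0, 4.0]  # distance 5.0 from the mention

print(triplet_loss(mention, gold_entity, other_entity))  # 0.0 (1.0 - 5.0 + 1.0 < 0)
```

Swapping the positive and negative gives a loss of 5.0, which is the signal that pushes the wrong entity away from the mention during training.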