
How to Leverage a Multi-layered Transformer Language Model for Text Clustering: an Ensemble Approach

  • Additional Information
    • Contributors:
      CB - Centre Borelli - UMR 9010 (CB); Service de Santé des Armées-Institut National de la Santé et de la Recherche Médicale (INSERM)-Université Paris-Saclay-Centre National de la Recherche Scientifique (CNRS)-Ecole Normale Supérieure Paris-Saclay (ENS Paris Saclay)-Université Paris Cité (UPCité); Caisse des dépôts et consignations (France) (CDC)
    • Publication Data:
      HAL CCSD
      ACM
    • Publication Date:
      2021
    • Collection:
      Archive ouverte du Service de Santé des Armées (HAL)
    • Abstract:
      International audience ; Pre-trained Transformer-based word embeddings are now widely used in text mining, where they are known to significantly improve supervised tasks such as text classification, named entity recognition, and question answering. Since Transformer models create several different embeddings for the same input, one at each layer of their architecture, various studies have already tried to identify which of these embeddings contribute most to the success of the above-mentioned tasks. In contrast, the same performance analysis has not yet been carried out in the unsupervised setting. In this paper we evaluate the effectiveness of Transformer models on the important task of text clustering. In particular, we present a clustering ensemble approach that harnesses all of the network's layers. Numerical experiments carried out on real datasets with different Transformer models show the effectiveness of the proposed method compared to several baselines.
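      The abstract's layer-ensemble idea can be sketched as follows: cluster the embeddings produced by each layer separately, then merge the resulting partitions through a co-association (consensus) matrix. This is a minimal illustration assuming synthetic stand-in embeddings and off-the-shelf scikit-learn clusterers, not the authors' exact method.

      ```python
      import numpy as np
      from sklearn.cluster import KMeans, AgglomerativeClustering

      rng = np.random.default_rng(0)
      n_per, k, n_layers, dim = 20, 2, 4, 8

      # Synthetic stand-in for per-layer Transformer embeddings of 40 texts:
      # two well-separated groups, with layer-specific noise added on top.
      base = np.vstack([rng.normal(0, 1, (n_per, dim)),
                        rng.normal(6, 1, (n_per, dim))])
      layers = [base + rng.normal(0, 0.5, base.shape) for _ in range(n_layers)]

      # Co-association matrix: fraction of layers on which two texts
      # fall into the same cluster.
      n = base.shape[0]
      coassoc = np.zeros((n, n))
      for X in layers:
          labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
          coassoc += (labels[:, None] == labels[None, :]).astype(float)
      coassoc /= n_layers

      # Consensus partition: treat 1 - coassoc as a precomputed distance
      # and cut an average-linkage hierarchy into k clusters.
      final = AgglomerativeClustering(n_clusters=k, metric="precomputed",
                                      linkage="average").fit_predict(1 - coassoc)
      ```

      With real models, the per-layer inputs would instead be the hidden states returned by the Transformer for each document; the consensus step is one standard way to combine the layer-wise partitions.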
    • Relation:
      hal-03963423; https://hal.science/hal-03963423; https://hal.science/hal-03963423/document; https://hal.science/hal-03963423/file/aitsaada_etal_cikm2021.pdf
    • Identifier:
      10.1145/3459637.3482121
    • Electronic Access:
      https://hal.science/hal-03963423
      https://hal.science/hal-03963423/document
      https://hal.science/hal-03963423/file/aitsaada_etal_cikm2021.pdf
      https://doi.org/10.1145/3459637.3482121
    • Rights:
      info:eu-repo/semantics/OpenAccess
    • Identifier:
      edsbas.3538772F