Machine translation as an underrated ingredient? : solving classification tasks with large language models for comparative research

  • Additional information
    • Subject:
      2023
    • Collection:
      Jagiellonian University Repository
    • Abstract:
      While large language models have revolutionised computational text analysis methods, the field is still tilted towards English language resources. Even as there are pre-trained models for some "smaller" languages, the coverage is far from universal, and pre-training large language models is an expensive and complicated task. This uneven language coverage limits comparative social research in terms of its geographical and linguistic scope. We propose a solution that sidesteps these issues by leveraging transfer learning and open-source machine translation. We use English as a bridge language between Hungarian and Polish bills and laws to solve a classification task related to the Comparative Agendas Project (CAP) coding scheme. Using the Hungarian corpus as training data for model fine-tuning, we categorise the Polish laws into 20 CAP categories. In doing so, we compare the performance of Transformer-based deep learning models (monolinguals, such as BERT, and multilinguals such as XLM-RoBERTa) and machine learning algorithms (e.g., SVM). Results show that the fine-tuned large language models outperform the traditional supervised learning benchmarks but are themselves surpassed by the machine translation approach. Overall, the proposed solution demonstrates a viable option for applying a transfer learning framework for low-resource languages and achieving state-of-the-art results without requiring expensive pre-training.
    • Relation:
      https://ruj.uj.edu.pl/xmlui/handle/item/320960
    • Identifier:
      10.5117/CCR2023.2.6.MATE
    • Online access:
      https://ruj.uj.edu.pl/xmlui/handle/item/320960
      https://doi.org/10.5117/CCR2023.2.6.MATE
      https://www.aup-online.com/content/journals/10.5117/CCR2023.2.6.MATE
    • Rights:
      I grant a licence. Attribution 4.0 International ; http://creativecommons.org/licenses/by/4.0/legalcode.pl
    • Accession number:
      edsbas.A97BBB68
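
The bridge-language idea in the abstract (translate both corpora into English, train on the Hungarian side, classify the Polish side) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `TOY_LEXICON` word lists stand in for an open-source machine translation system, the two-document training set is invented, and a small Naive Bayes classifier is swapped in for the fine-tuned Transformer models and SVM baseline the paper actually compares. Only two of the 20 CAP categories appear.

```python
from collections import Counter, defaultdict
import math

# Hypothetical toy lexicons standing in for a machine translation model.
# A real pipeline would call an actual MT system at this step.
TOY_LEXICON = {
    "hu": {"egészségügy": "health", "kórház": "hospital",
           "adó": "tax", "költségvetés": "budget"},
    "pl": {"zdrowie": "health", "szpital": "hospital",
           "podatek": "tax", "budżet": "budget"},
}


def translate_to_english(text, source_lang):
    """Word-by-word lookup; unknown tokens are passed through unchanged."""
    lexicon = TOY_LEXICON[source_lang]
    return " ".join(lexicon.get(tok, tok) for tok in text.lower().split())


def train_naive_bayes(docs, labels):
    """Collect per-label token counts from English-language documents."""
    word_counts = defaultdict(Counter)
    label_counts = Counter(labels)
    vocab = set()
    for doc, label in zip(docs, labels):
        tokens = doc.split()
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return word_counts, label_counts, vocab


def predict(model, doc):
    """Return the label with the highest Laplace-smoothed log-probability."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        total = sum(word_counts[label].values())
        score = math.log(label_counts[label] / total_docs)
        for tok in doc.split():
            if tok in vocab:  # ignore tokens never seen in training
                score += math.log(
                    (word_counts[label][tok] + 1) / (total + len(vocab))
                )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented Hungarian training examples, labelled with CAP-style categories.
hungarian_train = [
    ("kórház egészségügy", "Health"),
    ("adó költségvetés", "Macroeconomics"),
]
english_train = [translate_to_english(text, "hu") for text, _ in hungarian_train]
model = train_naive_bayes(english_train, [label for _, label in hungarian_train])

# An invented Polish document is translated into the same bridge language
# before classification, so the classifier only ever sees English tokens.
polish_law = "szpital zdrowie"
predicted = predict(model, translate_to_english(polish_law, "pl"))
print(predicted)
```

The point of the sketch is the data flow rather than the classifier: because both languages are mapped into English first, a model trained only on Hungarian-origin text can be applied to Polish-origin text without any Polish training labels.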