ARBRES Kenstur: a Breton-French Parallel Corpus Rooted in Field Linguistics

  • Additional information
    • Contributors:
      Lattice - Langues, Textes, Traitements informatiques, Cognition - UMR 8094 (Lattice); Université Sorbonne Nouvelle - Paris 3-Université Sorbonne Paris Cité (USPC)-Centre National de la Recherche Scientifique (CNRS)-Université Paris Sciences et Lettres (PSL)-Département Littératures et langage - ENS-PSL (LILA); École normale supérieure - Paris (ENS-PSL); Université Paris Sciences et Lettres (PSL)-Université Paris Sciences et Lettres (PSL)-École normale supérieure - Paris (ENS-PSL); Université Paris Sciences et Lettres (PSL); Modèles, Dynamiques, Corpus (MoDyCo); Université Paris Nanterre (UPN)-Centre National de la Recherche Scientifique (CNRS); Centre de recherche sur la langue et les textes basques (IKER); Université de Pau et des Pays de l'Adour (UPPA)-Université Bordeaux Montaigne (UBM)-Centre National de la Recherche Scientifique (CNRS); ELRA Language Resources Association; International Committee on Computational Linguistics; ANR-21-CE38-0017,Autogramm,Induction de grammaires descriptives à partir de corpus(2021)
    • Publication information:
      CCSD
    • Publication year:
      2024
    • Collection:
      Université Sorbonne Nouvelle - Paris 3: HAL
    • Subject:
    • Abstract:
      International audience; ARBRES is an ongoing open-science project implemented as a platform (“wikigrammar”) documenting both the Breton language itself and the state of research and engineering work on it in linguistics and NLP. Over its nearly 15 years of operation, it has aggregated a wealth of linguistic data in the form of interlinear glosses with translations illustrating lexical items, grammatical features, dialectal variations… While these glosses were primarily meant for human consumption, their volume and the regular format imposed by the wiki engine used for the website also make them suitable for machine processing. ARBRES Kenstur is a new parallel corpus derived from the glosses in ARBRES, including about 5k phrases and sentences in Breton along with translations in standard French. The nature of the original data, sourced from field linguistic inquiries meant to document the structure of Breton, leads to a resource that is by construction more concerned with the internal variations of the language and with rare phenomena than typical parallel corpora are. Preliminary experiments using this corpus show that it can help improve machine translation for Breton, demonstrating that sourcing data from field linguistic documentation can be a way to help provide NLP tools for minority and low-resource languages.
    • Online access:
      https://hal.science/hal-04551941
      https://hal.science/hal-04551941v1/document
      https://hal.science/hal-04551941v1/file/ARBRES_LREC_2024.pdf
    • Rights:
      http://creativecommons.org/licenses/by-sa/ ; info:eu-repo/semantics/OpenAccess
    • Accession number:
      edsbas.1D48A780