Training corpus hr500k 1.0

  • Additional information
    • Publisher information:
      Jožef Stefan Institute
    • Subject:
      2018
    • Collection:
      Linguistic Data and NLP Tools (CLARIN - Common Language Resources and Technology Infrastructure, Slovenia)
    • Abstract:
      The hr500k training corpus contains about 500,000 tokens manually annotated on the levels of tokenisation, sentence segmentation, morphosyntactic tagging, lemmatisation and named entities. About half of the corpus is also manually annotated with syntactic dependencies. Furthermore, about a fifth of the corpus is annotated with semantic role labels. The annotations (and other aspects) of the hr500k corpus are documented in the teiHeader and back element of the TEI encoded corpus. In short, they follow (1) the MULTEXT-East V5 morphosyntactic specifications for Croatian, https://nl.ijs.si/ME/V5/msd/, (2) the UDv2 Guidelines, http://universaldependencies.org/guidelines.html, and (3) the Janes annotation guidelines for named entities, https://nl.ijs.si/janes/wp-content/uploads/2017/09/SlovenianNER-eng-v1.1.pdf, while (4) the semantic role labelling annotation guidelines are currently in the publication process.
    • File Description:
      application/zip; text/plain; charset=utf-8; downloadable_files_count: 3
    • Relation:
      http://www.lrec-conf.org/proceedings/lrec2016/summaries/340.html; http://hdl.handle.net/11356/1792; http://hdl.handle.net/11356/1183
    • Online access:
      http://hdl.handle.net/11356/1183
    • Rights:
      Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) ; https://creativecommons.org/licenses/by-sa/4.0/ ; PUB
    • Accession number:
      edsbas.123B99DE