Training corpus hr500k 1.0

Authors: Ljubešić, Nikola; Agić, Željko; Klubička, Filip; Batanović, Vuk; Erjavec, Tomaž
Source:
https://github.com/nljubesi/hr500k
Subject:
part-of-speech tagging; dependency treebank; parsing; named entities; tokenisation; manual annotation; TEI; semantic role labelling
Record Type:
other/unknown material
Language:
Croatian

Additional Information
- Publisher Information:
  Jožef Stefan Institute
- Publication Year:
  2018
- Collection:
  Linguistic Data and NLP Tools (CLARIN - Common Language Resources and Technology Infrastructure, Slovenia)
- Abstract:
  The hr500k training corpus contains about 500,000 tokens manually annotated on the levels of tokenisation, sentence segmentation, morphosyntactic tagging, lemmatisation and named entities. About half of the corpus is also manually annotated with syntactic dependencies. Furthermore, about a fifth of the corpus is annotated with semantic role labels. The annotations (and other aspects) of the hr500k corpus are documented in the teiHeader and back element of the TEI encoded corpus. In short, they follow (1) the MULTEXT-East V5 morphosyntactic specifications for Croatian, https://nl.ijs.si/ME/V5/msd/, (2) the UDv2 Guidelines, http://universaldependencies.org/guidelines.html, and (3) the Janes annotation guidelines for named entities, https://nl.ijs.si/janes/wp-content/uploads/2017/09/SlovenianNER-eng-v1.1.pdf, while (4) the semantic role labelling annotation guidelines are currently in the publication process.
- File Description:
  application/zip; text/plain; charset=utf-8; downloadable_files_count: 3
- Relation:
  http://www.lrec-conf.org/proceedings/lrec2016/summaries/340.html; http://hdl.handle.net/11356/1792; http://hdl.handle.net/11356/1183
- Online Access:
  http://hdl.handle.net/11356/1183
- Rights:
  Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0); https://creativecommons.org/licenses/by-sa/4.0/; PUB
- Accession Number:
  edsbas.123B99DE
