CMC training corpus Janes-Syn 1.0

Authors: Arhar Holdt, Špela; Erjavec, Tomaž; Fišer, Darja
Subject:
computer-mediated communication; tokenisation; dependency treebank; syntactic annotation; manual annotation; TEI
Record type:
text
Language:
Slovenian

Additional information
- Publication data:
  Jožef Stefan Institute
- Year:
  2017
- Collection:
  OLAC: Open Language Archives Community
- Abstract:
  Janes-Syn is a syntactically annotated corpus of Slovene tweets. It is meant as a gold-standard training and testing dataset for syntactic annotation of Slovene computer-mediated communication, and for detailed linguistic explorations that require highly accurate and reliable annotations. Words in the dataset are normalised, lemmatised, PoS-tagged and syntactically annotated with the JOS dependency model (http://eng.slovenscina.eu/tehnologije/razclenjevalnik). The annotations on all levels were manually corrected. The corpus creation and structure are described in: ARHAR HOLDT, Špela, FIŠER, Darja, ERJAVEC, Tomaž, KREK, Simon. Syntactic annotation of Slovene CMC : first steps. Proceedings of the 4th Conference on CMC and Social Media Corpora for the Humanities, 27-28 September 2016, Ljubljana, Slovenia, 2016, pp. 3-6. https://nl.ijs.si/janes/cmc-corpora2016/proceedings/ Janes-Syn was created from two larger corpora that are also available in the repository: Janes-Norm (http://hdl.handle.net/11356/1084) and Janes-Tag (http://hdl.handle.net/11356/1123).
- Relation:
  http://hdl.handle.net/11356/1086
- Rights:
  Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) ; https://creativecommons.org/licenses/by-sa/4.0/
- Identifier:
  edsbas.90F26CF6