CMC training corpus Janes-Syn 1.0

  • Additional Information
    • Publication Information:
      Jožef Stefan Institute
    • Subject:
      2017
    • Collection:
      OLAC: Open Language Archives Community
    • Abstract:
      Janes-Syn is a syntactically annotated corpus of Slovene tweets and is meant as a gold-standard training and testing dataset for syntactic annotation of Slovene computer-mediated communication and for detailed linguistic explorations which require highly accurate and reliable annotations. Words in the dataset are normalised, lemmatised, PoS-tagged and syntactically annotated with the JOS dependency model (http://eng.slovenscina.eu/tehnologije/razclenjevalnik). The annotations on all levels were manually corrected. The corpus creation and structure are described in: ARHAR HOLDT, Špela, FIŠER, Darja, ERJAVEC, Tomaž, KREK, Simon. Syntactic annotation of Slovene CMC : first steps. Proceedings of the 4th Conference on CMC and Social Media Corpora for the Humanities, 27-28 September 2016, Ljubljana, Slovenia, 2016, pp. 3-6. https://nl.ijs.si/janes/cmc-corpora2016/proceedings/ Janes-Syn was created from two larger corpora that are also available in the repository: Janes-Norm (http://hdl.handle.net/11356/1084) and Janes-Tag (http://hdl.handle.net/11356/1123).
    • Relation:
      http://hdl.handle.net/11356/1086
    • Rights:
      Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) ; https://creativecommons.org/licenses/by-sa/4.0/
    • Accession Number:
      edsbas.90F26CF6