NACIONĀLĀ KORPUSU KOLEKCIJA KORPUSS.LV. (Latvian)
- Additional Information
- Alternate Title:
Latvian National Corpora Collection Korpuss.lv. (English)
- Abstract:
A corpus is a large and structured set of texts, transcribed speech or video recordings intended for linguistic analysis and language technology development. It includes authentic language material that reflects language use. Corpora play an important role in language technologies - both in the development of new methods in many areas of computational linguistics and in the improvement of existing methods. The analysis of corpora provides objective data, and therefore corpora are very useful in the study of language at its various levels - lexicography and terminology, grammar and semantics, language learning, etc. The Latvian National Corpora Collection (LNCC) is a diverse collection of Latvian language corpora representing both written and spoken language and is useful for both linguistic research and language modelling. The collection is intended to cover diverse Latvian language use cases and all the important text types and genres (e.g. news, social media, blogs, books, scientific texts, debates, essays, etc.), taking into account both quality and size aspects. Currently, 35 text and spoken corpora (total size 2.4 billion tokens) representing different types and genres are available. The corpora are created by the Institute of Mathematics and Computer Science, National Library of Latvia and their partners. Almost all corpora of LNCC are re-annotated with a uniform morpho-syntactic annotation scheme, which enables federated search and consistent linguistic analysis across all the LNCC corpora and facilitates selecting and mixing various corpora for pre-training large Latvian language models. [ABSTRACT FROM AUTHOR]
- Abstract:
Copyright of Linguistica Lettica is the property of University of Latvia, Latvian Language Institute and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)