Abstract: This is a collection of n-grams extracted from the Gos corpus of spoken Slovene (http://hdl.handle.net/11356/1040). In addition to separate n-gram lists for tokens and for each of their attributes (normalized form, morphosyntactic tag, lemma), an adjusted frequency list with statistical substring reduction has also been added (as described in O'Donnell 2011). Only n-grams occurring within a single sentence have been counted; n-grams spanning a sentence boundary are excluded.
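To illustrate what counting "only within sentences" means, here is a minimal sketch in Python. It assumes each sentence is already available as a list of strings (tokens, or equally the normalized forms, tags or lemmas of one attribute layer). The function name `count_ngrams` and the toy sentences are hypothetical, and this is not the procedure actually used to build the collection. It also does not implement the substring reduction from O'Donnell (2011).

```python
from collections import Counter
from typing import Iterable, List, Tuple


def count_ngrams(sentences: Iterable[List[str]], n: int) -> Counter:
    """Count n-grams of length n, never letting a window cross a sentence boundary.

    Hypothetical helper for illustration only; each sentence is a list of
    strings from one attribute layer (e.g. tokens or lemmas).
    """
    counts: Counter = Counter()
    for items in sentences:
        # Slide a window of size n over this sentence only; sentences shorter
        # than n produce an empty range and contribute nothing.
        for i in range(len(items) - n + 1):
            ngram: Tuple[str, ...] = tuple(items[i:i + n])
            counts[ngram] += 1
    return counts


# Toy data (made up): two short "sentences" of spoken-style tokens.
sentences = [["ja", "ja", "no"], ["no", "ja"]]
bigrams = count_ngrams(sentences, 2)
# ("no", "no") would only arise by joining the end of sentence 1 to the
# start of sentence 2, so it is never counted.
print(bigrams)
```

The same function can be run once per attribute layer (for example, on lemma sequences instead of token sequences) to obtain separate lists of the kind the collection describes.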