Abstract: The so-called Kotus word list consists of the words in the 1990s Perussanakirja (Basic Dictionary of Finnish). In its original form it is available here: https://kaino.kotus.fi/sanat/nykysuomi/

The version published here, a word list of 94 385 lexemes, is a modification that combines information from two sources: (1) UD1 (Universal Dependency Parser) of the Turku NLP group, whose analysis runs were performed in The Language Bank of Finland; and (2) semantic tags based on the UCREL Finnish semantic tag system (https://github.com/UCREL/Multilingual-USAS/tree/master/Finnish), assigned with the FiST semantic tagger.

If FiST assigned semantic tags to the word, the output looks like this:

    aakkonen Noun Q3

If the word was not analyzed by FiST, it is given its UD1 analysis and the tag Z99:

    aallokas NOUN§ Case=Nom|Number=Sing Z99

UD1 was able to split 39 524 of the compounds not analyzed by FiST into constituents. Constituent boundaries are marked with #:

    aallonpituus aallon#pituus NOUN§ Case=Nom|Number=Sing Z99

The constituent boundaries are often right, but there are also missing boundaries and odd analyses.

The lexical coverage of FiST on this data is low, 28.68%, because the word list contains about 52 269 compounds, most of which are not included in the FiST lexicon. Many of them could, however, be analyzed on the basis of their constituents.

Keywords: semantic tags
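The three entry shapes shown in the abstract can be read with a small parser. The following is a minimal sketch in Python, not part of the published resource. It assumes that fields are separated by whitespace (the actual file may use tabs or another delimiter), that each entry has exactly three, four, or five fields as in the examples, and that a FiST-tagged entry carries a single tag field. The names `Entry`, `parse_entry`, and `fist_coverage` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entry:
    word: str
    pos: str
    sem_tag: str                              # "Z99" means FiST did not analyze the word
    feats: Optional[str] = None               # UD1 morphological features, if present
    constituents: Optional[List[str]] = None  # compound parts split on "#", if present


def parse_entry(line: str) -> Entry:
    """Parse one entry, assuming whitespace-separated fields (an assumption)."""
    fields = line.split()
    if len(fields) == 3:
        # FiST-tagged word: word, part of speech, semantic tag
        word, pos, sem = fields
        return Entry(word, pos, sem)
    if len(fields) == 4:
        # UD1 analysis only: word, POS ending in "§", features, Z99
        word, pos, feats, sem = fields
        return Entry(word, pos.rstrip("§"), sem, feats)
    if len(fields) == 5:
        # UD1 analysis with compound segmentation in the second field
        word, seg, pos, feats, sem = fields
        return Entry(word, pos.rstrip("§"), sem, feats, seg.split("#"))
    raise ValueError(f"unexpected number of fields: {line!r}")


def fist_coverage(entries: List[Entry]) -> float:
    """Share of entries whose semantic tag is not the fallback Z99."""
    if not entries:
        return 0.0
    tagged = sum(1 for e in entries if e.sem_tag != "Z99")
    return tagged / len(entries)


# The three example lines quoted in the abstract
samples = [
    "aakkonen Noun Q3",
    "aallokas NOUN§ Case=Nom|Number=Sing Z99",
    "aallonpituus aallon#pituus NOUN§ Case=Nom|Number=Sing Z99",
]
entries = [parse_entry(s) for s in samples]
```

Run over the full word list, `fist_coverage` would be one way to recompute a coverage figure comparable to the 28.68% reported above, provided the file really follows the assumed layout.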