ABSTRACT

Ambiguity is one of the fundamental properties of the lexical system of a language. Lexical ambiguity manifests itself in homonymy and polysemy, which are usually hard to differentiate. Word sense frequencies and their distributions are not easy to estimate. The corpus is the source of contexts, and its choice may influence the estimated sense frequencies, because word sense distributions, as well as predominant senses, vary from corpus to corpus. Distributed vector representations are a way of representing words as low-dimensional, dense, real-valued vectors. Sense frequency data for a large number of Russian nouns provide an interesting dataset for testing theories of sense frequency distributions. The ordering of senses in dictionaries in the Russian lexicographic tradition generally follows etymological principles; that is, the first sense of a polysemous word is usually the original, non-figurative meaning, which does not always correspond to the most common sense in the contemporary language.