ABSTRACT

Word sense disambiguation (WSD) is essential for natural language processing applications such as machine translation, question answering, and natural language interfaces, because the performance of such applications depends on the senses assigned to words. A new word sense disambiguation algorithm based on dynamic programming is proposed for simple semantic units, in which a semantic computation model based on semantic relevancy is given, simple semantic units are defined, and their characteristics are analyzed (Liu 2014). A machine learning method with specially designed features is proposed to recognize metaphors, with the purpose of making a comparison with a word sense disambiguation task; experimental results show that the method achieves much higher accuracy (Jia 2014). A composite kernel is presented, which is a linear combination of two types of kernels: a bag-of-words kernel and a sequence kernel for word sense disambiguation (Wang 2014). Its purpose is to integrate heterogeneous sources of information in a simple and effective way. Experiments show that the composite kernel can consistently improve the performance of WSD. The main existing methods for the word sense disambiguation problem are investigated, and a genetic algorithm is proposed to solve it. At the

same time, the algorithm is applied to Modern Standard Arabic, and its performance is evaluated on a large corpus; the prediction accuracy of the genetic algorithm is improved (Menai 2014). A novel approach for English-to-Thai word sense disambiguation is proposed. The approach generates a knowledge base that stores local context information, which is then used to estimate the probabilities of the possible meanings of a target word (Keandoungchun 2013). WSD methods based on machine learning techniques with lexical features can suffer from the problem of sparse data. A hybrid approach is given that copes separately with error-prone data due to sparsity, where an instance is regarded as error-prone if its nearest neighbors are relatively distant and their senses are uniformly distributed (Han 2013). The connection between lexical chains and word sense disambiguation is investigated. A system that extracts words from unstructured text and provides sets of lexical chains is given; the disambiguation process is implemented based on WordNet's synsets (Dumitrescu 2011). The coreference resolution technique is incorporated into a word sense disambiguation system to improve disambiguation precision. With the help of coreference resolution, the related candidate dependency graphs at the candidate level and the related instance graph patterns at the instance level in the instance knowledge network are

connected together (Hu 2011). A novel WSD model based on distances between words is proposed, built on the basis of the traditional graph WSD model (Yang 2012). The method can make full use of distance information. A distributional thesaurus computed from a large parsed corpus is used for lexical expansion of context and sense information; it bridges the lexical gap that is seen as the major obstacle for word overlap-based approaches (Miller 2012).
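To make the kernel combination of (Wang 2014) concrete, the sketch below shows a linear combination of a bag-of-words kernel and a contiguous n-gram sequence kernel over two context word lists. The specific feature representations, the n-gram length, and the mixing weight `alpha` are illustrative assumptions, not the authors' exact formulation.

```python
from collections import Counter

def bow_kernel(ctx1, ctx2):
    """Bag-of-words kernel: dot product of word-count vectors (order-insensitive)."""
    c1, c2 = Counter(ctx1), Counter(ctx2)
    return sum(c1[w] * c2[w] for w in c1.keys() & c2.keys())

def sequence_kernel(ctx1, ctx2, max_len=3):
    """Sequence kernel: counts shared contiguous n-grams up to max_len (order-sensitive)."""
    score = 0
    for n in range(1, max_len + 1):
        g1 = Counter(tuple(ctx1[i:i + n]) for i in range(len(ctx1) - n + 1))
        g2 = Counter(tuple(ctx2[i:i + n]) for i in range(len(ctx2) - n + 1))
        score += sum(g1[g] * g2[g] for g in g1.keys() & g2.keys())
    return score

def composite_kernel(ctx1, ctx2, alpha=0.5):
    """Composite kernel: linear combination of the two kernels; alpha is tunable."""
    return alpha * bow_kernel(ctx1, ctx2) + (1 - alpha) * sequence_kernel(ctx1, ctx2)
```

The linear combination lets an SVM-style learner exploit both unordered lexical overlap and local word order without merging them into a single feature space.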
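The error-prone criterion of (Han 2013) can be sketched as a test on an instance's k nearest labeled neighbors: flag the instance when the neighbors are relatively distant and their sense labels are close to uniformly distributed. The distance and entropy thresholds below are hypothetical parameters chosen for illustration.

```python
import math

def is_error_prone(distances, senses, dist_threshold=0.8, uniform_threshold=0.9):
    """Flag an instance as error-prone if its nearest neighbors are distant
    AND their sense labels are near-uniform (high normalized entropy).
    distances: distances to the k nearest neighbors; senses: their sense labels."""
    avg_dist = sum(distances) / len(distances)
    counts = {}
    for s in senses:
        counts[s] = counts.get(s, 0) + 1
    n = len(senses)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    # Normalize by the maximum entropy for this many distinct senses.
    max_entropy = math.log2(len(counts)) if len(counts) > 1 else 1.0
    uniformity = entropy / max_entropy
    return avg_dist > dist_threshold and uniformity > uniform_threshold
```

Instances flagged this way would be routed to a separate, sparsity-robust handler instead of the default nearest-neighbor classifier.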
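The lexical-expansion idea behind (Miller 2012) can be illustrated with a Lesk-style overlap score: both the context and each sense gloss are expanded with distributionally similar words before counting overlaps, so that near-synonyms still match. The toy thesaurus, glosses, and `top_n` cutoff below are invented for illustration only.

```python
def expand(words, thesaurus, top_n=2):
    """Expand a bag of words with up to top_n distributionally similar words each."""
    expanded = set(words)
    for w in words:
        expanded.update(thesaurus.get(w, [])[:top_n])
    return expanded

def overlap_score(context, gloss, thesaurus):
    """Lesk-style overlap computed after lexical expansion of both sides."""
    return len(expand(context, thesaurus) & expand(gloss, thesaurus))

def disambiguate(context, sense_glosses, thesaurus):
    """Pick the sense whose (expanded) gloss overlaps the (expanded) context most."""
    return max(sense_glosses,
               key=lambda s: overlap_score(context, sense_glosses[s], thesaurus))
```

Without expansion, a context word like "cash" never matches a gloss word like "deposit"; the thesaurus entries bridge exactly this lexical gap.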