ABSTRACT

Several commercial automated composition marking tools for English, such as Project Essay Grade (PEG), Intelligent Essay Assessor (IEA), Electronic Essay Rater (E-rater), Select-a-Kibitzer, and Automatic Essay Assessor (AEA), have been put into practice in recent decades [1]. However, these tools are not well suited to analyzing the semantic content of Chinese students' compositions, because their methods target native learners and may not be appropriate for compositions written by Chinese students. Among composition analysis tools for English, three well-known topic-level semantic methods are commonly used: Latent Semantic Analysis (LSA) and related statistical methods, Probabilistic LSA (PLSA), and Latent Dirichlet Allocation (LDA) [7]. LSA has produced promising results in the semantic analysis of compositions and has been successfully applied in IEA and Select-a-Kibitzer to assess compositions written in English [7]. However, LSA is an extension of the traditional Vector Space Model (VSM): it still works from a word-document matrix, obtained after singular value decomposition (SVD) and other processing. As a result, the more words two compositions share, the more similar they are judged to be. This does not handle polysemy and synonymy well; that is, the model built by LSA is not interpretable for semantic analysis. In addition, the computational overhead and efficiency of the SVD-based LSA approach cannot be guaranteed [5]. LSA is therefore not suitable for topic-level semantic analysis of Chinese students' compositions.
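
To make the LSA pipeline described above concrete, the following is a minimal sketch and not the implementation used by any of the cited tools. It assumes a toy English corpus, raw word counts with no weighting, and whitespace tokenization; the function names (term_document_matrix, lsa_document_vectors, cosine) and the choice k=2 are hypothetical and introduced only for illustration.

```python
import numpy as np

# Toy corpus (hypothetical); each string stands in for one composition.
docs = [
    "the cat sat on the mat",
    "the cat lay on the rug",
    "stock prices fell sharply today",
]

def term_document_matrix(texts):
    """Build a raw-count word-document matrix (rows: words, columns: documents)."""
    vocab = sorted({w for t in texts for w in t.split()})
    index = {w: i for i, w in enumerate(vocab)}
    mat = np.zeros((len(vocab), len(texts)))
    for j, t in enumerate(texts):
        for w in t.split():
            mat[index[w], j] += 1
    return vocab, mat

def lsa_document_vectors(mat, k):
    """Project documents into a k-dimensional latent space via truncated SVD."""
    u, s, vt = np.linalg.svd(mat, full_matrices=False)
    # Keep the k largest singular values; each row of the result is one document.
    return (np.diag(s[:k]) @ vt[:k]).T

def cosine(a, b):
    """Cosine similarity between two document vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

vocab, A = term_document_matrix(docs)
doc_vecs = lsa_document_vectors(A, k=2)
sim_01 = cosine(doc_vecs[0], doc_vecs[1])  # documents sharing several words
sim_02 = cosine(doc_vecs[0], doc_vecs[2])  # documents sharing no words
```

In this sketch the first two documents, which share several words, receive a much higher similarity than the first and third, which share none. This illustrates the point made above that LSA similarity is driven largely by word overlap.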