ABSTRACT

This chapter introduces the use of machine learning algorithms and classifiers. It presents one of the oldest, yet still widely and successfully used, algorithms, which is based on the theorem of the mathematician Thomas Bayes. The approach combines conditional (a posteriori) probabilities obtained by observing the analyzed event. For the Bayes classifier, these are the probabilities of occurrence of selected text elements, here words (terms) represented by their frequencies. The chapter describes the relations between classes and their members, including the derivation of the Bayes formula, which assigns the most probable class to the element being examined. Given the high computational complexity that is typical when working with text documents, a simpler version of Bayes' theorem, known as Naive Bayes, is then presented; it makes Bayes' theorem usable in many practical applications. The principle of the calculation is illustrated with a simple example. The next part demonstrates the use of a Naive Bayes implementation in R, this time using real text data.
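
As a brief orientation, the two formulas the chapter builds toward can be sketched as follows. The notation is an assumption made here for illustration and may differ from the symbols used in the chapter itself: c denotes a class from the set of classes C, d a document, and w_1, ..., w_n the terms occurring in d. The first line is Bayes' theorem for the probability of a class given a document. The second is the usual Naive Bayes decision rule, which additionally assumes that the terms are conditionally independent given the class and drops P(d) because it is the same for every class.

```latex
% Bayes' theorem for a class c and a document d (notation assumed)
P(c \mid d) = \frac{P(d \mid c)\, P(c)}{P(d)}

% Naive Bayes decision rule: terms w_1, ..., w_n are treated as
% conditionally independent given the class c
\hat{c} = \operatorname*{arg\,max}_{c \in C} \; P(c) \prod_{i=1}^{n} P(w_i \mid c)
```

The independence assumption is what replaces the joint probability P(d | c), which is expensive to estimate for text, with a product of per-term probabilities that can be estimated from term frequencies.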