Nearest Neighbors | 6 | Text Mining with Machine Learning

ABSTRACT

The Nearest Neighbors chapter demonstrates another possible classification algorithm that uses the principle that a classified element belongs to the same class as its most similar, that is, the nearest neighbor (or the predominant portion of the closest neighbors). It is based on a database of a suitable number of samples, whose membership in their relevant classes is known. For a classified element, the most similar sample (or more such samples) from the database is searched for. The similarity is calculated as the distance. The chapter presents some of the most widely used methods of calculating distance, focusing on the usually functional Euclidean distance. Using a simple example and then demonstrating the application of the Nearest Neighbor method implemented in R to real data, the reader can learn the basics of using the described algorithm and its advantages and possible disadvantages.