K-Nearest Neighbors (K-NN) | Hands on Data Science for Biologists U

ABSTRACT

The k-nearest neighbors (K-NN) algorithm is one of the simplest supervised Machine Learning models: it predicts the label of an instance from the data points nearest to it. The choice of “k”, the number of neighbors consulted, is crucial to the algorithm’s performance, and this chapter describes how to select it using the elbow method. Missing values are one of the biggest challenges in building a Machine Learning model, and the use of K-NN to impute missing values in datasets is well established. This chapter shows the implementation of a K-NN classification model on the heart disease dataset using Python’s sklearn library. A kidney disease dataset is also analyzed, first to show the imputation of missing values using KNNImputer and then to predict the classes.
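
The workflow outlined in this abstract can be sketched in a few lines. The example below is only an illustration under stated assumptions, not the chapter's own code: it uses synthetic data from make_classification as a stand-in for the heart and kidney disease datasets, removes about 10% of the values artificially, and tries k from 1 to 20. The variable names are illustrative. Missing entries are filled with sklearn's KNNImputer, and a KNeighborsClassifier is then fitted for each k so that the error rate can be inspected for an elbow.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.impute import KNNImputer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (an assumption; not the heart or kidney disease datasets).
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=5, random_state=0
)

# Blank out roughly 10% of the entries to mimic missing values.
rng = np.random.default_rng(0)
X_missing = X.copy()
X_missing[rng.random(X.shape) < 0.10] = np.nan

X_train, X_test, y_train, y_test = train_test_split(
    X_missing, y, test_size=0.25, random_state=0, stratify=y
)

# Fill each missing entry using the 5 most similar training rows.
imputer = KNNImputer(n_neighbors=5)
X_train_imp = imputer.fit_transform(X_train)
X_test_imp = imputer.transform(X_test)

# K-NN is distance based, so put the features on a common scale.
scaler = StandardScaler()
X_train_s = scaler.fit_transform(X_train_imp)
X_test_s = scaler.transform(X_test_imp)

# Record the held-out error rate for each candidate k.
k_values = list(range(1, 21))
error_rates = []
for k in k_values:
    model = KNeighborsClassifier(n_neighbors=k)
    model.fit(X_train_s, y_train)
    error_rates.append(1.0 - model.score(X_test_s, y_test))

# Simple proxy for the elbow: the k with the lowest error rate.
best_k = k_values[int(np.argmin(error_rates))]
print("Lowest error rate at k =", best_k)
```

In practice the elbow is usually read from a plot of error_rates against k_values, picking the point where the curve stops improving noticeably; taking the minimum, as done here, is a shortcut. Selecting k on a separate validation split or by cross-validation, rather than on the test set, gives a less optimistic estimate of performance.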