ABSTRACT

K-nearest neighbor (KNN) is a simple, intuitive algorithm in which each observation is predicted from its "similarity" to other observations. KNNs have been applied successfully to a wide range of business problems and are also useful for data preprocessing. The method is sensitive to noisy predictors, which inflate the magnitude and variability of the distances between otherwise similar samples. KNNs can provide moderate predictive power, especially when the response depends on the local structure of the features. Because KNN is a lazy learner, most of the computation is deferred to prediction time, which limits its use in real-time applications. Although KNNs rarely provide the best predictive performance, they offer many benefits, for example in feature engineering and in data cleaning and preprocessing.
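As a minimal sketch of the prediction scheme described above, the following standard-library Python function classifies a new point by majority vote among its k nearest training points under Euclidean distance (the function and variable names are illustrative, not from the text):

```python
from collections import Counter
import math

def knn_predict(X_train, y_train, x_new, k=3):
    """Predict the label of x_new by majority vote among its
    k nearest training points (Euclidean distance)."""
    # Lazy learning: all distance computation happens here, at prediction time.
    dists = [(math.dist(x, x_new), label) for x, label in zip(X_train, y_train)]
    dists.sort(key=lambda pair: pair[0])
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Toy 2-D example: two well-separated clusters labeled "a" and "b".
X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
y = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(X, y, (0.5, 0.5)))  # near the first cluster -> "a"
print(knn_predict(X, y, (5.5, 5.5)))  # near the second cluster -> "b"
```

Note that nothing is "fit" in advance; the training data itself is the model, which is why noisy or unscaled predictors directly distort the distances and the resulting vote.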