ABSTRACT

This chapter examines several types of supervised and unsupervised machine learning approaches to address the high utilizer problem in health care. Regression is the most widely used supervised method in statistical and predictive modeling, and it serves as the base risk-adjustment model for modeling risk-based payment systems in health care. Research and industry have consistently shown that machine learning approaches are effective at analyzing large amounts of data and using results to make predictions. Supervised learning attempts to “learn” a function to predict output given input based on existing input and output pairs. Regularized regression, also known as the least absolute shrinkage and selection operator, fits a regular linear regression model, but penalizes solutions with a large number of nonzero coefficients at the same time. Gradient boosting is another set of successful supervised machine learning techniques that can handle high-dimensional input variables.