Introduction to clustering and classification | ch17

ABSTRACT

This chapter covers clustering, dimension reduction, and classification with classification trees. Clustering refers to techniques that identify hidden groupings in data, while classification refers to statistical techniques that predict the category of a response variable from a set of observations. The chapter introduces cluster analysis through hierarchical clustering, using base R functions. It then introduces dimension reduction, the process of reducing a data set with many correlated variables to a smaller set of uncorrelated derived variables while retaining the essence of the original data, and shows how this technique can help analyse complex questionnaire data. The dimension reduction methods come from FactoMineR, with factoextra used to graph their outputs. Finally, the chapter uses the rpart package to introduce classification trees and rpart.plot to plot them.