ABSTRACT

Two common data mining techniques for finding hidden patterns in data are clustering and classification analysis. Although classification and clustering are often mentioned in the same breath, they are different analytical approaches. Imagine a database of customer records, where each record represents a customer’s attributes. These can include identifiers such as name and address, demographic information such as gender and age, and financial attributes such as income and amount spent. Clustering is an automated process that groups related records together on the basis of their having similar values for attributes. This approach of segmenting the database via clustering analysis is often used as an exploratory technique because the analyst does not need to specify ahead of time how records should be related. In fact, the objective of the analysis is often to discover clusters and then examine the attributes and values that define the clusters or segments. As such, interesting and surprising ways of grouping customers can become apparent, and these in turn can be used to drive marketing and promotion strategies that target specific types of customers.

Classification is a different technique from clustering. It is similar to clustering in that it also divides customer records into distinct segments, called classes. But unlike clustering, a classification analysis requires that the analyst know ahead of time how the classes are defined. For example, classes can be defined by whether a customer defaults on a loan (Yes/No). Each record in the dataset used to build the classifier must already have a value for the attribute used to define the classes. Because every record has a value for this attribute, and because the analyst decides which attribute to use, classification is much less exploratory than clustering.
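The contrast between the two approaches can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not a method prescribed by the text: k-means stands in for "clustering" and a nearest-centroid rule stands in for "a classifier", and the customer records, the attributes (age, income) and the `defaulted` label are all hypothetical toy values. The clustering step receives no labels and lets groups emerge from similar attribute values; the classification step relies on records that already carry the class attribute chosen by the analyst. It uses `math.dist`, which requires Python 3.8 or later.

```python
import math
import random

# Hypothetical, unlabelled customer records: (age, annual income in thousands).
customers = [
    (22, 25), (25, 30), (24, 28),
    (45, 90), (50, 95), (48, 85),
]

def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means: groups records by similarity, with no predefined classes."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    clusters = []
    for _ in range(iterations):
        # Assign each record to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of the records assigned to it.
        for i, members in enumerate(clusters):
            if members:
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, clusters

# Hypothetical labelled records: each already has a value for the class
# attribute "defaulted" (Yes/No), which the analyst chose in advance.
labelled = [
    ((23, 20), "Yes"), ((26, 22), "Yes"), ((30, 25), "Yes"),
    ((40, 80), "No"), ((52, 95), "No"), ((47, 88), "No"),
]

def train_nearest_centroid(records):
    """Compute one mean point per known class label."""
    by_class = {}
    for features, label in records:
        by_class.setdefault(label, []).append(features)
    return {
        label: tuple(sum(c) / len(pts) for c in zip(*pts))
        for label, pts in by_class.items()
    }

def classify(model, features):
    """Predict the class whose centroid lies closest to the new record."""
    return min(model, key=lambda label: math.dist(features, model[label]))

# Clustering: discover segments, then inspect what defines them.
centroids, clusters = kmeans(customers, k=2)
for group in clusters:
    print(sorted(group))

# Classification: assign a new customer to one of the predefined classes.
model = train_nearest_centroid(labelled)
print(classify(model, (28, 24)))  # prints "Yes"
```

Note that the clustering output is only a list of groups; deciding what each group means is left to the analyst, which is what makes it exploratory. In a real analysis the attributes would usually be rescaled first, since income and age span very different ranges and the distance measure would otherwise be dominated by income.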