ABSTRACT

One major task of machine learning, pattern recognition and data mining is to construct good models from data sets.

A “data set” generally consists of feature vectors, where each feature vector is a description of an object by using a set of features. For example, take a look at the synthetic three-Gaussians data set as shown in Figure 1.1. Here, each object is a data point described by the features x-coordinate, y-coordinate and shape, and a feature vector looks like (.5, .8, cross) or (.4, .5, circle). The number of features of a data set is called dimension or dimensionality; for example, the dimensionality of the above data set is three. Features are also called attributes, a feature vector is also called an instance, and sometimes a data set is called a sample.
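
The terminology above can be made concrete with a minimal Python sketch. Only the two feature vectors are taken from the text; the names data_set, feature_names and dimensionality are illustrative assumptions, and representing an instance as a tuple is just one convenient choice.

```python
# Each instance (feature vector) describes one data point by three features.
# The two instances below are the examples quoted in the text.
feature_names = ("x-coordinate", "y-coordinate", "shape")

data_set = [
    (0.5, 0.8, "cross"),
    (0.4, 0.5, "circle"),
]


def dimensionality(instances):
    """Return the number of features shared by every instance."""
    sizes = {len(instance) for instance in instances}
    if len(sizes) != 1:
        # Raised for an empty data set or for instances of unequal length.
        raise ValueError("expected a non-empty data set of equal-length instances")
    return sizes.pop()


print(dimensionality(data_set))  # 3
```

Here the dimensionality is three because every instance carries exactly the three features listed in feature_names.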