ABSTRACT

Clustering is useful in several machine learning and data mining tasks, including image segmentation, information retrieval, pattern recognition, pattern classification, and network analysis. However, finding clusters in high-dimensional space is computationally expensive and may degrade learning performance. When a data set contains a large number of features, learning models tend to overfit and their performance deteriorates; irrelevant features in particular degrade learning quality and consume memory and computational time that could be saved by removing them. Feature selection is one of the techniques most widely used by practitioners to reduce dimensionality. Dimensionality reduction techniques fall mainly into two categories: feature extraction and feature selection. In the feature extraction approach, the original features are projected into a new space of lower dimensionality, whereas feature selection retains a subset of the original features. Feature selection methods are broadly categorized into four models: the filter model, the wrapper model, the embedded model, and the hybrid model.
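
As a minimal sketch of the distinction drawn above between feature extraction and (filter-model) feature selection, the following Python snippet contrasts the two on a toy data set; the use of scikit-learn, the digits data, and the specific estimators and parameters are illustrative assumptions, not part of the paper's method.

```python
# Illustrative only: contrasts feature extraction with filter-model
# feature selection on a toy data set (assumed setup, not from the paper).
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, mutual_info_classif

X, y = load_digits(return_X_y=True)          # 64 pixel features per sample

# Feature extraction: project the original features into a new,
# lower-dimensional space (each component mixes the original pixels).
X_extracted = PCA(n_components=10).fit_transform(X)

# Feature selection (filter model): score each original feature
# independently of any learner and keep only the top-scoring subset.
selector = SelectKBest(score_func=mutual_info_classif, k=10)
X_selected = selector.fit_transform(X, y)

print(X.shape, X_extracted.shape, X_selected.shape)  # (n, 64) (n, 10) (n, 10)
```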