ABSTRACT

K-means-type clustering algorithms [13, 16] are widely used in real-world applications such as marketing research [12] and data mining because they process large datasets efficiently. One unavoidable task in applying k-means to real problems is choosing a set of features (or attributes). A common practice is to select features based on business domain knowledge and data exploration. This manual approach is difficult to use and time-consuming, and it frequently fails to select the right features. An automated method is therefore needed to solve the feature selection problem in k-means.