ABSTRACT

Many algorithms, such as those for classification or clustering, work with structured representations of texts. These representations, especially in the bag-of-words form, suffer from high dimensionality and sparsity, which generally degrades learning performance and the quality of the achieved results. It is therefore desirable to reduce the number of features, since some of them are expected to be irrelevant or redundant. The goal of feature selection is to select a subset of features according to corpus statistics or other criteria. Features can be ranked, and either the top k or a minimal sufficient set is selected. In supervised selection, the features should distinguish between the classes of data items; in unsupervised feature selection, the features should help reveal interesting groups in the data. Wrapper approaches evaluate a feature subset using the algorithm for which the features are being selected, filter approaches select a subset independently of any learning algorithm, and embedded approaches integrate feature selection directly into the learning algorithm. The chapter discusses some commonly used methods, namely chi-squared, mutual information, information gain, term elimination based on entropy, term strength, term contribution, entropy-based ranking, and term variance.
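
To make the supervised filter setting concrete, the following is a minimal sketch (not taken from the chapter) of ranking bag-of-words features by the chi-squared statistic and keeping the top k, using scikit-learn. The toy corpus, the labels, and the choice k=3 are illustrative assumptions.

```python
# A minimal sketch of a supervised filter method: rank bag-of-words
# features by the chi-squared statistic and keep the top k.
# The corpus, labels, and k=3 below are illustrative assumptions.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2

corpus = [
    "cheap pills buy now",        # spam
    "limited offer buy cheap",    # spam
    "meeting agenda for monday",  # ham
    "project report attached",    # ham
]
labels = [1, 1, 0, 0]  # 1 = spam, 0 = ham

# Build the (sparse, high-dimensional) bag-of-words representation.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(corpus)

# Score each term against the class labels and keep the top k terms.
selector = SelectKBest(score_func=chi2, k=3)
X_reduced = selector.fit_transform(X, labels)

# Inspect which terms survived the selection.
kept = vectorizer.get_feature_names_out()[selector.get_support()]
print(kept)
```

Swapping `chi2` for another scoring function (e.g., mutual information) changes the ranking criterion while leaving the filter pipeline itself unchanged, which is the practical appeal of filter approaches.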