ABSTRACT

This chapter introduces existing feature selection algorithms according to two frameworks, search-based and correlation-based, and shows how feature selection works in each framework and what its strengths and weaknesses are. Besides search-based feature selection, another important framework is based on the correlation analysis between features and classes. The correlation-based framework consists of two steps: relevance analysis determines the subset of relevant features, and redundancy analysis identifies and eliminates redundant features among the relevant ones to produce the final subset. The chapter also covers sparsity-based feature selection, an efficient tool for selecting features from high-dimensionality small sample size (HDSSS) data. It introduces some current applications of feature selection, such as bioinformatics, social media, and multimedia retrieval. It also discusses several challenges, namely those brought by Big Data, HDSSS problems, multi-label data, and privacy preservation, as well as general issues and future work for feature selection.
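To make the two-step correlation-based framework concrete, here is a minimal sketch. It assumes the data are given as a list of numeric feature columns plus numeric class labels, and it uses the absolute Pearson correlation as a stand-in correlation measure. The chapter's algorithms may use other measures, such as information-theoretic ones. The function names and the `relevance_threshold` parameter are hypothetical. Redundancy is judged by one simple rule: a feature is dropped if it correlates with an already selected, more relevant feature at least as strongly as it correlates with the class.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def correlation_based_selection(
    columns: List[Sequence[float]],
    labels: Sequence[float],
    relevance_threshold: float = 0.1,  # hypothetical cutoff for "relevant"
) -> List[int]:
    """Return indices of selected features (hypothetical helper, not the chapter's code)."""
    # Step 1: relevance analysis. Score each feature by |corr(feature, class)|
    # and keep those at or above the threshold, most relevant first.
    relevance = {j: abs(pearson(col, labels)) for j, col in enumerate(columns)}
    relevant = sorted(
        (j for j in relevance if relevance[j] >= relevance_threshold),
        key=lambda j: relevance[j],
        reverse=True,
    )

    # Step 2: redundancy analysis. Walk the relevant features in order and drop
    # any feature that correlates with an already selected feature at least as
    # strongly as it correlates with the class.
    selected: List[int] = []
    for j in relevant:
        redundant = any(
            abs(pearson(columns[j], columns[k])) >= relevance[j] for k in selected
        )
        if not redundant:
            selected.append(j)
    return selected


if __name__ == "__main__":
    labels = [0, 0, 0, 1, 1, 1]
    columns = [
        [1, 2, 1, 5, 6, 5],     # feature 0: strongly tracks the class
        [2, 4, 2, 10, 12, 10],  # feature 1: a scaled copy of feature 0
        [1, 2, 3, 3, 2, 1],     # feature 2: uncorrelated with the class
    ]
    print(correlation_based_selection(columns, labels))
```

In this toy example, relevance analysis discards feature 2 because its correlation with the labels is zero. Redundancy analysis then keeps only one of features 0 and 1, since each is perfectly correlated with the other. Ranking by relevance before the redundancy pass means that, among mutually redundant features, the more relevant one is the one retained.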