Preprocessing in Big Data: New challenges for discretization and feature selection

doi:10.1201/9781315156408-10

Chapter

ABSTRACT

This chapter focuses on two preprocessing topics: discretization and feature selection (FS). The section "The advent of Big Data" briefly introduces the concept of Big Data and the important challenges it opens for machine learning researchers. A considerable number of emerging challenges remain for researchers to address. Stability is an important measure when evaluating the adequacy of an FS algorithm. The section "Challenges" explains the open challenges that Big Data brings, centered on FS and discretization. The chapter then presents case studies: a parallel implementation of the minimum description length (MDL)-based discretizer, and a redesign of the minimum redundancy maximum relevance (mRMR) algorithm for use on different parallel platforms. The explosion in data dimensionality points to a number of hot spots where machine learning researchers can launch new lines of research.