ABSTRACT

In recent years, the number of available Big Data infrastructures has grown steadily. This growth has driven the adaptation of traditional machine learning techniques so that they can address large-scale problems and run in distributed environments. These platforms provide scalability, fault tolerance, and intuitive programming languages for developing software, but they require training algorithms that are efficient in both computational time and communication. For these reasons, linear models are among the most common predictive modeling techniques when working with Big Data. However, these methods perform poorly when the relationship between the features and the target variable is nonlinear. One way to overcome this limitation is to extract new features derived from the original ones, so that problems that are nonlinear in the original feature space can be solved with linear models in the new one. In this chapter, we explain several feature extraction techniques that can run on distributed Big Data platforms.