ABSTRACT

This chapter examines the advent of Big Data, the challenges it poses, and its effect on recommendation systems. Data-preprocessing techniques, including data cleaning, data integration, data reduction, and data transformation, should be applied to remove noise and inconsistencies. A latent factor model characterizes both users and items by a small number of factors inferred from rating patterns. Several algorithms compute the underlying matrix factorizations. Among the best known are alternating least squares (ALS) and stochastic gradient descent (SGD), alongside a variety of algorithms that build on these two approaches to improve efficiency, stability, and scalability, such as CCD++. MapReduce, introduced by Google in 2004, paved the way for Hadoop, which has played a significant role in the Big Data era. MLlib [62] covers the same range of learning categories as Mahout and adds regression models.
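To make the latent factor idea concrete, the following is a minimal sketch of matrix factorization trained with SGD, assuming a tiny hand-made set of (user, item, rating) triples. The function names (factorize_sgd, predict, rmse), the hyperparameters, and the sample data are all illustrative assumptions and are not taken from the chapter; a production system would rely on a library implementation instead.

```python
import random

def predict(P, Q, u, i):
    """Predicted rating: dot product of user u's and item i's factor vectors."""
    return sum(pu * qi for pu, qi in zip(P[u], Q[i]))

def rmse(ratings, P, Q):
    """Root-mean-square error over the observed ratings."""
    se = sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings)
    return (se / len(ratings)) ** 0.5

def factorize_sgd(ratings, n_users, n_items, n_factors=2,
                  lr=0.02, reg=0.02, n_epochs=500, seed=0):
    """Learn user factors P and item factors Q from observed ratings via SGD.

    All defaults are illustrative assumptions, not tuned values.
    """
    rng = random.Random(seed)
    # Small positive random initialisation of both factor matrices.
    P = [[rng.uniform(0.1, 0.5) for _ in range(n_factors)] for _ in range(n_users)]
    Q = [[rng.uniform(0.1, 0.5) for _ in range(n_factors)] for _ in range(n_items)]
    samples = list(ratings)  # copy so the caller's list is not reordered
    for _ in range(n_epochs):
        rng.shuffle(samples)
        for u, i, r in samples:
            err = r - predict(P, Q, u, i)
            for f in range(n_factors):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on the L2-regularised squared error.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q

# Hypothetical toy data: 4 users, 4 items, ratings on a 1-5 scale.
ratings = [
    (0, 0, 5), (0, 1, 4), (0, 3, 1),
    (1, 0, 4), (1, 2, 1), (1, 3, 1),
    (2, 1, 1), (2, 2, 5), (2, 3, 4),
    (3, 0, 1), (3, 2, 4), (3, 3, 5),
]

if __name__ == "__main__":
    P, Q = factorize_sgd(ratings, n_users=4, n_items=4)
    print("training RMSE:", round(rmse(ratings, P, Q), 3))
    print("predicted rating for user 0, item 2:", round(predict(P, Q, 0, 2), 2))
```

ALS addresses the same objective differently: it fixes one factor matrix, solves a least-squares problem for the other, and alternates, which tends to parallelize more naturally than the per-rating updates shown here.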