ABSTRACT

This chapter implements four algorithm cases in R: collaborative filtering, PageRank, the moving average (MA) model, and the genetic algorithm. It explains how Mahout implements these algorithms by reimplementing them in R, and the R code follows the design logic of Mahout's source code. In the collaborative filtering case, the function that creates the data model, FileDataModel, mainly reads data from a CSV file and loads it into memory as an R matrix. User similarity can be calculated with various algorithms, such as Euclidean distance similarity, Pearson correlation similarity, cosine similarity, Spearman rank correlation coefficient similarity, and log-likelihood similarity. PageRank is a link-analysis algorithm developed for Google's search engine. It measures the importance of a page relative to the other pages in the search engine's index. Larry Page and Sergey Brin devised the algorithm at Stanford University in the late 1990s.