ABSTRACT

In this chapter, the authors discuss distributed machine learning, using regularized regression as an example together with its optimization algorithms, such as proximal gradient descent and coordinate descent. In particular, under a parameter server architecture, they review various consistency models, including bulk-synchronous parallel, asynchronous parallel, and stale-synchronous parallel, as well as data-, model-, and data-model-parallel approaches to distributed optimization. The authors categorize large-scale machine learning problems into two types: big data and big models. Machine learning has been widely applied to network data; examples include latent Dirichlet allocation for social network analysis, and graphical lasso, neighborhood selection, and penalized logistic regression for estimating gene–gene interaction networks. In distributed machine learning, one should consider, among other issues, how to parallelize computations across multiple workers and how to synchronize parameters between different workers, where workers refer to processes or threads in a distributed setting. The authors assume a parameter server that stores the global parameters shared across multiple workers in a distributed fashion.
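
To make the optimization setting concrete, the minimal sketch below (not taken from the chapter; the objective scaling, step size, and data are illustrative assumptions) shows proximal gradient descent for the lasso, a canonical regularized regression problem: each iteration takes a gradient step on the smooth squared-error term and then applies the soft-thresholding proximal operator of the L1 penalty.

```python
# A minimal single-machine sketch (not the chapter's implementation) of
# proximal gradient descent (ISTA) for the lasso problem
#     min_beta  (1 / (2n)) * ||y - X beta||_2^2 + lam * ||beta||_1
# The step size, iteration count, and data below are illustrative assumptions.
import numpy as np


def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1 (elementwise soft-thresholding)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def lasso_proximal_gradient(X, y, lam, n_iters=500):
    n, p = X.shape
    beta = np.zeros(p)
    # Step size 1 / L, where L is the Lipschitz constant of the gradient
    # of the smooth squared-error term (spectral norm of X, squared, over n).
    L = np.linalg.norm(X, 2) ** 2 / n
    step = 1.0 / L
    for _ in range(n_iters):
        grad = X.T @ (X @ beta - y) / n                          # gradient step
        beta = soft_threshold(beta - step * grad, step * lam)    # proximal step
    return beta


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((100, 20))
    true_beta = np.zeros(20)
    true_beta[:3] = [2.0, -1.5, 1.0]                             # sparse ground truth
    y = X @ true_beta + 0.1 * rng.standard_normal(100)
    print(lasso_proximal_gradient(X, y, lam=0.1)[:5])
```

In a parameter-server setting, the same update is typically split so that workers compute gradients on their data partitions while the server holds and updates the shared coefficient vector under one of the consistency models discussed in the chapter.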