ABSTRACT

Suvrit Sra Max Planck Institute for Intelligent Systems and Carnegie Mellon University

4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83 4.2 The Nips Framework . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85

4.2.1 Convergence Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87 4.3 Scaling Up: Incremental Proximal Splitting . . . . . . . . . . . . . . . . . . . . . 91

4.3.1 Convergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92 4.4 Application to Matrix Factorization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95 4.5 Other Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98 4.6 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98

Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99

4.1 Introduction In this chapter we study large-scale nonconvex optimization problems with

composite objective functions that are composed of a differentiable possibly nonconvex cost and a nonsmooth but convex regularizer. More precisely, we consider optimization problems of the form

minimize Φ(x) := f(x) + r(x), s.t. x ∈ X , (4.1)

where X ⊂ Rn is a compact convex set, f : Rn → R is a differentiable cost function, and r : Rn → R is a closed convex function. Further, we assume that the gradient ∇f is Lipschitz continuous on X (denoted f ∈ C1L(X )), i.e.,

∃L > 0 s.t. ‖∇f(x)−∇f(y)‖ ≤ L ‖x− y‖ for all x, y ∈ X . (4.2)

Throughout this chapter, ‖·‖ denotes the standard Euclidean norm. Problem (4.1) generalizes the more thoroughly studied class of composite

convex optimization problems [30], a class that has witnessed huge interest in machine learning, signal processing, statistics, and other related areas. We refer the interested reader to [21, 2, 3, 37] for several convex examples and

Vector

recent references. A thread common to existing algorithms for solving composite problems is the remarkably fruitful idea of proximal-splitting [9]. Here, nonsmoothness is handled via proximity operators [29], which allows one to treat the nonsmooth objective f + r essentially as a smooth one.