## Distributed Optimization via the Alternating Direction Method of Multipliers

**Stephen Boyd**
Problems in areas such as machine learning and dynamic optimization on
a large network lead to extremely large convex optimization problems,
with problem data stored in a decentralized way, and processing
elements distributed across a network. We argue that the alternating
direction method of multipliers is well suited to such problems. The
method was developed in the 1970s, with roots in the 1950s, and is
equivalent or closely related to many other algorithms, such as dual
decomposition, the method of multipliers, Douglas-Rachford splitting,
Spingarn's method of partial inverses, Dykstra's alternating
projections, Bregman iterative algorithms for $\ell_1$ problems,
proximal methods, and others. After briefly surveying the theory and
history of the algorithm, we discuss applications to statistical and
machine learning problems such as the lasso and support vector
machines.
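
As background (a standard formulation, not part of the abstract), ADMM addresses problems of the form minimize $f(x) + g(z)$ subject to $Ax + Bz = c$, where $f$ and $g$ are convex. Writing $L_\rho(x,z,y) = f(x) + g(z) + y^T(Ax+Bz-c) + (\rho/2)\|Ax+Bz-c\|_2^2$ for the augmented Lagrangian with penalty parameter $\rho > 0$, the iterations are

$$
\begin{aligned}
x^{k+1} &:= \operatorname*{argmin}_x \; L_\rho(x, z^k, y^k) \\
z^{k+1} &:= \operatorname*{argmin}_z \; L_\rho(x^{k+1}, z, y^k) \\
y^{k+1} &:= y^k + \rho\,(A x^{k+1} + B z^{k+1} - c).
\end{aligned}
$$

The split between $f$ and $g$ is what makes the method attractive for distributed problems: the $x$- and $z$-updates can often be carried out separately, on separate data, with only the dual update requiring coordination.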

## Lock-Free Approaches to Parallelizing Stochastic Gradient Descent

**Ben Recht**
Stochastic Gradient Descent (SGD) is a very popular optimization algorithm for solving data-driven machine learning problems. SGD is well suited to processing large amounts of data due to its robustness against noise, rapid convergence rates, and predictable memory footprint. Nevertheless, SGD seems to be impeded by many of the classical barriers to scalability: (1) SGD appears to be inherently sequential, (2) SGD assumes uniform sampling from the underlying data set, resulting in poor locality, and (3) current approaches to parallelizing SGD require performance-destroying, fine-grained communication.

This talk aims to refute the conventional wisdom that SGD inherently suffers from these impediments. Specifically, I will show that SGD can be implemented in parallel with minimal communication, with no locking or synchronization, and with strong spatial locality. I will provide both theoretical and experimental evidence demonstrating linear speedups on multicore workstations across several benchmark optimization problems. Finally, I will close with a discussion of a challenging problem, raised by our implementations, relating arithmetic and geometric means of matrices.
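
To make the lock-free idea concrete, here is a minimal sketch (my own illustration, not code from the talk) of the scheme in Python with NumPy: several threads run SGD on a shared weight vector with no locks or synchronization, each sampling data points independently. Python's GIL limits true parallelism here, so this shows only the update structure, not the speedups; the problem, sizes, and step size are all assumptions chosen for the example.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

# Synthetic least-squares problem (illustrative choices, not from the talk):
# recover w_true from exact linear measurements y = X @ w_true.
rng = np.random.default_rng(0)
n, d = 1000, 20
X = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
y = X @ w_true

w = np.zeros(d)   # shared parameter vector, updated WITHOUT any lock
step = 0.01       # fixed step size (an assumption for this sketch)

def worker(seed, iters=2000):
    """Run plain SGD against the shared vector w, lock-free."""
    r = np.random.default_rng(seed)
    for _ in range(iters):
        i = r.integers(n)                  # sample one data point uniformly
        grad = (X[i] @ w - y[i]) * X[i]    # per-sample least-squares gradient
        w[:] -= step * grad                # in-place update; races are tolerated

# Four threads hammer on w concurrently with no synchronization.
with ThreadPoolExecutor(max_workers=4) as ex:
    for fut in [ex.submit(worker, s) for s in range(4)]:
        fut.result()

print("final error:", float(np.linalg.norm(w - w_true)))
```

Despite the unsynchronized, possibly interleaved writes, the iterates still converge on this problem; the talk's point is that such races are benign under suitable sparsity and step-size conditions.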

Joint work with Feng Niu, Christopher Re, and Stephen Wright.