In the talk, we focus on algorithms for solving well-structured large-scale convex programs in the case where the huge problem size prevents processing by polynomial-time algorithms and thus makes computationally cheap first-order optimization methods the methods of choice. We overview significant recent progress in exploiting problem structure within the first-order framework, with emphasis on algorithms whose iteration complexity $O(1/\epsilon)$, $\epsilon$ being the target accuracy, is dimension-independent (and optimal in the large-scale case). We then discuss the possibility of further accelerating first-order algorithms by randomization, specifically, by passing from precise deterministic first-order oracles, which are expensive in the extremely large-scale case, to their computationally cheap stochastic counterparts. Applications to be discussed include SVMs, $\ell_1$ minimization, testing sensing matrices for ``goodness'' in the Compressed Sensing context, low-dimensional approximation of high-dimensional samples, and some others.
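The deterministic-versus-stochastic oracle trade-off above can be illustrated on a toy problem. The sketch below (an assumed least-squares setup, not from the talk) contrasts an exact first-order oracle, which touches all $m$ data rows per call, with an unbiased stochastic oracle that samples a single row, making each call $O(n)$ instead of $O(mn)$:

```python
import numpy as np

# Hypothetical setup: minimize f(x) = ||Ax - b||^2 / (2m) over x in R^n.
rng = np.random.default_rng(0)
m, n = 1000, 50
A = rng.standard_normal((m, n))
x_true = rng.standard_normal(n)
b = A @ x_true

def full_gradient(x):
    # exact deterministic oracle: one full pass over all m rows, cost O(mn)
    return A.T @ (A @ x - b) / m

def stochastic_gradient(x):
    # cheap stochastic oracle: one random row, cost O(n);
    # unbiased, since E[g] = full_gradient(x)
    i = rng.integers(m)
    return A[i] * (A[i] @ x - b[i])

x = np.zeros(n)
for t in range(1, 20001):
    # classical diminishing step size; 0.02 is an illustrative constant
    x -= (0.02 / np.sqrt(t)) * stochastic_gradient(x)

# the stochastic iterates drive the error ||x - x_true|| well below its
# starting value ||x_true||, at a per-iteration cost m times cheaper
```

Each stochastic step costs as much as a single row of one deterministic gradient evaluation, which is the source of the speedup when $m$ is extremely large.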
Chordal graphs play a fundamental role in algorithms for sparse matrix factorization, graphical models, and matrix completion problems. In matrix optimization, chordal sparsity patterns can be exploited in fast algorithms for evaluating the logarithmic barrier function of the cone of positive definite matrices with a given sparsity pattern, and of the corresponding dual cone. We will give a survey of chordal sparse matrix methods and discuss two applications in more detail: linear optimization with sparse matrix cone constraints, and the approximate solution of dense quadratic programs arising in support vector machine training.
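A minimal sketch of why chordality helps with the barrier evaluation: a chordal pattern admits a perfect elimination ordering, so a Cholesky factorization in that ordering has zero fill-in, and $-\log\det X$ is read off the factor's diagonal. The tridiagonal example below is a hypothetical illustration (a tridiagonal pattern is chordal, with the natural ordering perfect):

```python
import numpy as np

# Hypothetical example: tridiagonal positive definite matrix with
# 2 on the diagonal and -1 on the off-diagonals
n = 6
X = (np.diag(np.full(n, 2.0))
     + np.diag(np.full(n - 1, -1.0), 1)
     + np.diag(np.full(n - 1, -1.0), -1))

L = np.linalg.cholesky(X)  # lower triangular, X = L L^T

# barrier value: -log det X = -2 * sum(log L_ii)
barrier = -2.0 * np.log(np.diag(L)).sum()

# zero fill-in: L is nonzero only where the lower triangle of X is,
# so the factorization cost is linear in the number of nonzeros
fill_in = np.count_nonzero(L) - np.count_nonzero(np.tril(X))
```

For this matrix $\det X = n + 1 = 7$, so the barrier equals $-\log 7$, and `fill_in` is zero; for a non-chordal pattern, no ordering avoids fill-in entirely.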
A common modern approach to learning is to phrase training as an empirical optimization problem, and then invoke an optimization procedure in order to carry out the training. But often, the best optimization approach for the problem, from a machine learning perspective, turns out to be a simple local update rule that was around long before the learning problem was cast as a global convex optimization problem. I will discuss two such cases in detail: stochastic gradient descent for regularized linear learning, and non-convex optimization of a factorization for trace-norm regularization. In such cases, what did we gain from casting the problem as a global optimization problem and making sure it is convex? I will argue that we still gain from the global optimization view, as long as we remember that ultimately our objective is a learning objective.
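The link between the non-convex factorization and the convex trace-norm problem rests on the variational identity $\|X\|_* = \min_{X = UV^\top} \tfrac{1}{2}(\|U\|_F^2 + \|V\|_F^2)$, attained by a balanced factorization built from the SVD. The sketch below (an assumed numerical check, not from the talk) verifies this on a random matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((8, 5))

# trace norm = sum of singular values
trace_norm = np.linalg.svd(X, compute_uv=False).sum()

# balanced factorization X = U V^T with U = U_svd diag(sqrt(s)),
# V = V_svd diag(sqrt(s)); it attains the variational minimum
U_svd, s, Vt = np.linalg.svd(X, full_matrices=False)
U = U_svd * np.sqrt(s)
V = Vt.T * np.sqrt(s)
balanced = 0.5 * (np.linalg.norm(U) ** 2 + np.linalg.norm(V) ** 2)

# any other factorization of the same X only gives an upper bound
W = rng.standard_normal((5, 5))
U2, V2 = U @ W, V @ np.linalg.inv(W).T  # still U2 @ V2.T == X
unbalanced = 0.5 * (np.linalg.norm(U2) ** 2 + np.linalg.norm(V2) ** 2)
```

Here `balanced` equals `trace_norm` while `unbalanced` exceeds it; optimizing over $(U, V)$ directly is non-convex, yet its global minimum coincides with the convex trace-norm-regularized objective.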