In this talk we consider a new class of convex optimization problems that admit faster black-box optimization schemes. To analyze their rate of convergence, we introduce a notion of mixed accuracy of an approximate solution, which is a convenient generalization of absolute and relative accuracy. We show that for our problem class, a natural quasi-Newton method is always faster than the standard gradient method. At the same time, after an appropriate normalization, our results extend to general convex unconstrained minimization problems.
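To fix ideas (the precise definition belongs to the talk; the form below is only an illustrative assumption), one way to interpolate between the absolute criterion $f(x) - f^* \le \epsilon$ and the relative criterion $f(x) - f^* \le \epsilon f^*$ is a mixed criterion of the following shape:

```latex
% Illustrative assumption only: a plausible "mixed accuracy" test that
% recovers absolute accuracy as \delta dominates f^* and relative
% accuracy as f^* dominates \delta. The talk's definition may differ.
f(x) - f^* \;\le\; \epsilon\,\bigl(\delta + f^*\bigr),
\qquad \delta > 0 \ \text{fixed},\quad f^* = \min_x f(x) \ge 0.
```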
We describe methods that allow us to quickly eliminate variables (features) in supervised learning problems involving a convex loss function and an L1-norm penalty, leading to a potentially substantial reduction in the number of variables prior to running the supervised learning algorithm. The methods are not heuristic: they only eliminate features that are guaranteed to be absent after solving the learning problem. Our framework applies to a large class of problems, including support vector machine classification, logistic regression, and least-squares. The complexity of the feature elimination step is negligible compared to the typical computational effort involved in the sparse supervised learning problem: it grows linearly with the number of features times the number of examples, and is much lower when the data is sparse. We apply our method to data sets arising in text classification and observe a dramatic reduction in dimensionality, and hence in the computational effort required to solve the learning problem, especially when very sparse classifiers are sought. Our method immediately extends the scope of existing algorithms, allowing us to run them on data sets of sizes that were previously out of their reach.
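As a concrete illustration (a minimal sketch for the least-squares case, not necessarily the exact test used in the talk), a SAFE-style screening rule for min_w 0.5*||y - Xw||^2 + lam*||w||_1 discards feature j whenever a cheap bound certifies that its optimal coefficient must be zero; the function name safe_screen and the example data are ours.

```python
import numpy as np

def safe_screen(X, y, lam):
    """SAFE-style screening for L1-penalized least squares:
        min_w 0.5*||y - X w||^2 + lam*||w||_1.
    Returns a boolean mask of features that can be safely discarded
    (their optimal coefficient is guaranteed to be zero).
    Illustrative sketch; the talk's rules may be sharper or more general.
    """
    corr = np.abs(X.T @ y)                 # |x_j^T y| for each feature
    lam_max = corr.max()                   # smallest lam for which w = 0
    col_norms = np.linalg.norm(X, axis=0)
    y_norm = np.linalg.norm(y)
    # Discard feature j if |x_j^T y| < lam - ||x_j|| ||y|| (lam_max - lam)/lam_max
    threshold = lam - col_norms * y_norm * (lam_max - lam) / lam_max
    return corr < threshold

# Usage: screen first, then solve the reduced problem on the kept features.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5000))
y = rng.standard_normal(200)
lam = 0.8 * np.abs(X.T @ y).max()
discard = safe_screen(X, y, lam)
print(f"eliminated {discard.sum()} of {X.shape[1]} features")
```

The screening cost is one matrix-vector product plus column norms, i.e. linear in (features x examples), which is why it is negligible next to the cost of the learning problem itself.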
Limited-memory quasi-Newton and Hessian-free Newton methods are two workhorses of unconstrained optimization of high-dimensional smooth objectives. However, in many cases we would like to optimize a high-dimensional unconstrained objective function that is non-smooth due to the presence of a `simple' non-smooth regularization term. Motivated by problems arising in estimating sparse graphical models, in this talk we focus on strategies for extending limited-memory quasi-Newton and Hessian-free Newton methods to this scenario. We first consider two-metric (sub-)gradient projection methods for problems where the regularizer is separable, and then consider proximal Newton-like methods for group-separable and non-separable regularizers. We will discuss several applications where sparsity-encouraging regularizers are used to estimate graphical model parameters and/or structure, including the estimation of sparse, blockwise-sparse, and structured-sparse models.
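For readers unfamiliar with the setting, the sketch below shows the basic building block such methods exploit when the regularizer is separable: the proximal operator of the L1 norm has a closed form (soft-thresholding), and a plain proximal-gradient iteration uses it directly. This is only an assumed baseline for illustration; the talk's methods replace the first-order step with limited-memory quasi-Newton or Hessian-free Newton models of the smooth part, and the names soft_threshold and proximal_gradient are ours.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t*||.||_1 (separable, closed form)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def proximal_gradient(grad_f, x0, lam, step, n_iter=500):
    """Baseline proximal-gradient method for min_x f(x) + lam*||x||_1.
    Shown only as the simple first-order scheme that quasi-Newton and
    Hessian-free extensions aim to accelerate."""
    x = x0.copy()
    for _ in range(n_iter):
        x = soft_threshold(x - step * grad_f(x), step * lam)
    return x

# Example: sparse least squares, f(x) = 0.5*||A x - b||^2.
rng = np.random.default_rng(1)
A = rng.standard_normal((100, 50))
b = rng.standard_normal(100)
grad_f = lambda x: A.T @ (A @ x - b)
step = 1.0 / np.linalg.norm(A, 2) ** 2      # 1/L with L = ||A||_2^2
x_hat = proximal_gradient(grad_f, np.zeros(50), lam=5.0, step=step)
print("nonzeros:", np.count_nonzero(x_hat))
```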