OPT 2010: 3rd International Workshop on Optimization for Machine Learning
Accepted Papers
-
Information-theoretic lower bounds on the oracle complexity of sparse convex optimization
Alekh Agarwal, Peter Bartlett, Pradeep Ravikumar, Martin Wainwright
Relative to the large literature on upper bounds on complexity of convex
optimization, lesser attention has been paid to the fundamental hardness of
these problems. Recent years have seen a surge in optimization methods
tailored to sparse optimization problems. In this paper, we study the
complexity of stochastic convex optimization in an oracle model of
computation, when the objective is optimized at a sparse vector in a high
dimensional space. Our lower bound is achieved by an appropriately tuned
mirror descent method, establishing its minimax optimality.
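The matching upper bound is attained by mirror descent. As a generic point of reference (not the paper's tuned variant), the sketch below shows entropic mirror descent (exponentiated gradient) on the probability simplex, the classical setting in which mirror descent adapts to the l1 geometry relevant for sparsity; the function name, step size, and linear test objective are illustrative assumptions.

```python
import numpy as np

def mirror_descent_simplex(grad, d, n_iter=200, step=0.1):
    """Entropic mirror descent (exponentiated gradient) on the
    probability simplex. A generic sketch, not the paper's tuned method.
    """
    x = np.full(d, 1.0 / d)  # uniform starting point
    for _ in range(n_iter):
        g = grad(x)
        x = x * np.exp(-step * g)   # multiplicative (mirror) step
        x /= x.sum()                # Bregman projection onto the simplex
    return x
```

On a linear objective the iterates concentrate on the best coordinate, which makes the multiplicative update easy to sanity-check.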
-
An efficient algorithm for sparse PCA
Yunlong He, Renato Monteiro, Haesun Park
Sparse principal component analysis (PCA) adds extra constraints or penalty
terms to standard PCA to achieve sparsity. In this paper, we introduce an
efficient algorithm for finding a single sparse principal component (PC) with
a specified cardinality. The algorithm consists of two stages. In the first
stage, it identifies an active index set with a desired cardinality
corresponding to the nonzero entries of the PC. In the second one, it uses the
power iteration method to find the best direction with respect to the active
index set. Experiments on randomly generated data and real-world datasets show
that our algorithm is very fast, especially on large and sparse data sets,
while the numerical quality of the solution is comparable to that of
state-of-the-art algorithms.
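The two-stage scheme described above can be sketched as follows; note that the active-set heuristic used here (largest column norms) is an assumption for illustration, not the paper's selection rule.

```python
import numpy as np

def sparse_pc(A, k, n_iter=100, seed=0):
    """Find one k-sparse principal component of data matrix A (n x d).

    Stage 1 picks an active index set of cardinality k; stage 2 runs
    power iteration on the covariance restricted to that set.
    """
    rng = np.random.default_rng(seed)
    # Stage 1: active index set (here: the k largest column norms,
    # a simple stand-in for the paper's identification stage).
    scores = np.linalg.norm(A, axis=0)
    active = np.argsort(scores)[-k:]
    # Stage 2: power iteration on the restricted covariance.
    C = A[:, active].T @ A[:, active]
    v = rng.standard_normal(k)
    for _ in range(n_iter):
        v = C @ v
        v /= np.linalg.norm(v)
    # Embed the result back into d dimensions.
    x = np.zeros(A.shape[1])
    x[active] = v
    return x
```

The restricted covariance is only k x k, which is why the approach stays fast on large, sparse data.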
-
Hierarchical Classification via Orthogonal Transfer
Dengyong Zhou, Lin Xiao, Mingrui Wu
We consider multiclass classification problems in which the set of labels is
organized hierarchically as a category tree, and the examples are classified
recursively from the root to the leaves. We propose a hierarchical
support vector machine that encourages the classifiers at each node of the
tree to be different from the classifiers at its ancestors. More specifically,
we introduce regularizations that force the normal vector of the classifying
hyperplane at each node of the tree to be orthogonal to those at its ancestors
as much as possible. We establish sufficient conditions under which such an
objective is a convex function of the normal vectors. We also present an
efficient dual-averaging method for solving the resulting nonsmooth convex
optimization problem. We evaluate the method on a number of real-world text
categorization tasks and obtain state-of-the-art performance.
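A minimal sketch of the kind of regularizer described above, assuming a penalty on the absolute inner products between each node's normal vector and those of its ancestors; the exact functional form is an assumption, and the paper gives the conditions under which such an objective remains convex.

```python
import numpy as np

def orthogonal_transfer_penalty(W, parent, lam=1.0):
    """Sum of squared norms of the per-node normal vectors, plus a
    penalty lam * |<w_v, w_u>| for every ancestor u of each node v.

    W: dict node -> weight vector; parent: dict node -> parent
    (the root maps to None). The form is illustrative, not the
    paper's exact regularizer.
    """
    def ancestors(v):
        u = parent[v]
        while u is not None:
            yield u
            u = parent[u]

    total = sum(np.dot(w, w) for w in W.values())
    for v, wv in W.items():
        for u in ancestors(v):
            total += lam * abs(np.dot(wv, W[u]))
    return total
```

When the normal vectors are mutually orthogonal, the ancestor terms vanish and only the norm terms remain.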
-
An Incremental Subgradient Algorithm for Approximate MAP Estimation in
Graphical Models
Jeremy Jancsary, Gerald Matz, Harald Trost
We present an incremental subgradient algorithm for approximate computation of
maximum-a-posteriori (MAP) states in cyclic graphical models. Its most
striking property is its immense simplicity: each iteration requires only the
solution of a sequence of trivial optimization problems. The algorithm can
equally be understood as a degenerate dual decomposition scheme or as
minimization of a degenerate tree-reweighted upper bound, and it assumes a
form reminiscent of message passing. Despite (or because of) its conceptual
simplicity, it comes with important theoretical guarantees and exhibits
strong empirical performance.
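The generic incremental subgradient scheme the abstract builds on can be sketched as follows; this is the textbook method on a sum of convex components with a diminishing step size, not the paper's message-passing specialization.

```python
import numpy as np

def incremental_subgradient(subgrads, x0, n_passes=50, step=0.1):
    """Minimize f(x) = sum_i f_i(x) by cycling through the components,
    taking one subgradient step per component f_i at a time.

    subgrads: list of callables g_i(x), each returning a subgradient
    of f_i at x. Step sizes step/t diminish over passes t.
    """
    x = np.asarray(x0, dtype=float)
    for t in range(1, n_passes + 1):
        for g in subgrads:
            x = x - (step / t) * g(x)  # one cheap step per component
    return x
```

Each inner step touches only one component, which is what makes every iteration "trivial" in the sense of the abstract. On a sum of absolute values, the iterates settle at the median of the kink points.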
-
Augmenting Dual Decomposition for MAP Inference
Andre Martins, Mario Figueiredo, Noah Smith, Pedro Aguiar, Eric Xing
In this paper, we propose combining augmented Lagrangian optimization with the
dual decomposition method to obtain a fast algorithm for approximate MAP
(maximum a posteriori) inference on factor graphs. We also show how the
proposed algorithm can efficiently handle problems with (possibly global)
structural constraints. The reported experimental results attest to the
state-of-the-art performance of the proposed approach.
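The core idea of combining dual decomposition with an augmented Lagrangian can be illustrated on a toy consensus problem; the sketch below is standard consensus ADMM on two scalar quadratics, an illustrative assumption rather than the paper's MAP inference algorithm on factor graphs.

```python
def consensus_admm(a, b, rho=1.0, n_iter=100):
    """Minimize (x-a)^2 + (x-b)^2 by dual decomposition with an
    augmented Lagrangian (consensus ADMM). Each subproblem keeps its
    own copy x_i of the variable; the quadratic penalty term drives
    the copies to agree on the consensus variable z.
    """
    x1 = x2 = z = 0.0
    u1 = u2 = 0.0  # scaled dual variables
    for _ in range(n_iter):
        # Local updates: argmin (x-a)^2 + (rho/2)(x - z + u)^2,
        # available in closed form for a quadratic.
        x1 = (2 * a + rho * (z - u1)) / (2 + rho)
        x2 = (2 * b + rho * (z - u2)) / (2 + rho)
        # Consensus update: average the copies plus duals.
        z = (x1 + u1 + x2 + u2) / 2
        # Dual updates enforce agreement over iterations.
        u1 += x1 - z
        u2 += x2 - z
    return z
```

The quadratic penalty is what distinguishes this from plain (subgradient) dual decomposition and typically speeds up agreement between the subproblems.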
-
Statistical Linearization for Value Function Approximation in Reinforcement
Learning
Matthieu Geist
Reinforcement learning (RL) is a machine learning approach to the optimal
control problem. It consists of learning an optimal control policy through
interactions with the system to be controlled, the quality of this policy
being quantified by the so-called value function. An important RL subtopic is
to approximate this function when the system is too large for an exact
representation. This paper presents statistical-linearization-based
approaches to estimate such functions. Compared to more classical approaches,
this allows considering nonlinear parameterizations as well as the Bellman
optimality operator, which induces some differentiability problems. Moreover,
the statistical point of view adopted here allows considering colored
observation noise models instead of the classical white one; in RL, this can
prove useful.
-
Gradient-type methods for primal SVM model selection
Gregory Moore, Charles Bergeron, Kristin Bennett
Selection of multiple SVM model hyperparameters by cross-validation can be
expressed as a bilevel optimization problem. Explicit methods optimize the
model parameters and hyperparameters simultaneously. In implicit methods, the
model parameters are considered to be implicit functions of the
hyperparameters. Recently, gradient-based methods have emerged as an efficient
way to optimize the hyperparameters. In this work, we examine both implicit
and explicit model selection algorithms for linear SVM-type machine learning
models expressed as nonsmooth unconstrained optimization problems. A key point
is that the underlying optimization problems are nonsmooth and nonconvex, so
set-valued subgradients must be used. Nonsmooth nonconvex optimization
techniques can lead to scalable model selection algorithms but appropriate
choice and use of subgradients is essential for good performance. A new
nonconvex implicit bundle method is developed and compared computationally to
recent nonsmooth implicit and explicit gradient methods and grid search. All
of the gradient methods outperform grid search. The subgradients calculated
using the simple implicit method may not yield good directions, leading to
algorithm failures. Smaller datasets can benefit from the implicit bundle
algorithms that have specialized strategies for calculating effective
subgradients. The well-founded explicit method consistently provides robust
solutions for all size problems but at greater computational expense for
larger problems. Problems with massive training data can frequently be solved
with the simplified implicit subgradient algorithm.
-
An Optimization Based Framework for Dynamic Batch Mode Active Learning
Shayok Chakraborty, Vineeth Balasubramanian, Sethuraman Panchanathan
Active learning techniques have gained popularity in reducing human effort to
annotate data instances for inducing a classifier. When faced with large
quantities of unlabeled data, such algorithms automatically select the salient
and representative samples for manual annotation. Batch mode active learning
schemes have been recently proposed to select a batch of data instances
simultaneously, rather than updating the classifier after every single query.
While numerical optimization strategies seem a natural choice to address this
problem (by selecting a batch of points to ensure that a given objective
criterion is optimized), many of the proposed approaches are based on greedy
heuristics. Also, all existing work on batch mode active learning assumes
that the batch size is given as an input to the problem. In this work, we
propose a novel optimization-based strategy to dynamically decide the batch
size as well as the specific points to be queried, based on the particular
data stream in question. Our results on the widely used VidTIMIT and the MBGC
biometric datasets corroborate the efficacy of the framework to adaptively
identify the batch size and the particular data points to be selected for
manual annotation, in any batch mode active learning application.