Minimal Complexity Machines: Learning Machines with Minimal VC dimension

The Vapnik-Chervonenkis (VC) dimension measures the capacity of a learning machine, and computational learning theory shows that a small VC dimension leads to good generalization and robust learning. It is well known that SVMs can have a large, even infinite, VC dimension. Despite being among the most popular of learning methods, SVMs therefore provide no guarantee of good generalization.

Classifiers
The relation between the VC dimension and the defining variables of a classifier is usually abstruse and rarely tractable. In recent works (http://www.sciencedirect.com/science/article/pii/S0925231214010194, http://arxiv.org/abs/1408.2803), we have shown how to obtain an exact bound on the VC dimension of a fat-margin hyperplane classifier. The proposed exact bound is a function that bounds the VC dimension from both above and below; it is convex, smooth, and differentiable. Minimizing this exact bound yields a classifier with a small VC dimension, and we show that the minimization can be performed by solving a linear programming problem. The resulting Minimal Complexity Machine (MCM) finds a hyperplane classifier that has a small VC dimension and also learns the training samples with small error. To the best of our knowledge, the MCM is the first, and so far the only, approach that provides a way of learning a classifier with a small VC dimension. On many benchmark datasets from the UCI repository, the MCM generalizes better than SVMs while using less than one-tenth the number of support vectors. The code can be downloaded here.
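The linear program can be sketched as follows. This is a minimal illustration using scipy, assuming the soft-margin linear formulation from the papers (minimize h + C·Σ q_i subject to y_i(w·x_i + b) + q_i ≤ h, y_i(w·x_i + b) + q_i ≥ 1, and q_i ≥ 0); the released code should be treated as authoritative.

```python
import numpy as np
from scipy.optimize import linprog

def mcm_fit(X, y, C=1.0):
    """Soft-margin linear MCM as a linear program (sketch).

    Decision variables, stacked as [h, w_1..w_d, b, q_1..q_n]:
    h bounds the ratio that controls the VC dimension, w and b define
    the hyperplane, and the q_i are slack variables for training error.
    """
    n, d = X.shape
    # Objective: h + C * sum(q)
    c = np.concatenate(([1.0], np.zeros(d + 1), C * np.ones(n)))
    Y = y[:, None] * X                       # rows are y_i * x_i
    I = np.eye(n)
    # y_i (w.x_i + b) + q_i - h <= 0
    A1 = np.hstack([-np.ones((n, 1)), Y, y[:, None], I])
    b1 = np.zeros(n)
    # -(y_i (w.x_i + b) + q_i) <= -1
    A2 = np.hstack([np.zeros((n, 1)), -Y, -y[:, None], -I])
    b2 = -np.ones(n)
    bounds = [(0, None)] + [(None, None)] * (d + 1) + [(0, None)] * n
    res = linprog(c, A_ub=np.vstack([A1, A2]),
                  b_ub=np.concatenate([b1, b2]),
                  bounds=bounds, method="highs")
    h, w, b = res.x[0], res.x[1:1 + d], res.x[1 + d]
    return w, b, h

# Usage on two well-separated point clouds (synthetic data):
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 0.5, (20, 2)), rng.normal(2, 0.5, (20, 2))])
y = np.concatenate([-np.ones(20), np.ones(20)])
w, b, h = mcm_fit(X, y)
```

Note that, unlike the quadratic program of an SVM, this is a plain LP, which is what makes the MCM inexpensive to train at scale.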

Function Approximation
Regression, or function approximation, is another common problem in machine learning. We have shown in the paper (http://www.sciencedirect.com/science/article/pii/S092523121500939X) how the MCM can be used to find regressors that have smaller mean squared errors on test samples, while using far fewer support vectors than SVMs. The code for the MCMR can be downloaded here.
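The two quantities compared in the paper, test mean squared error and support-vector count, can be measured as in the following sketch, which uses scikit-learn's SVR as the SVM baseline on synthetic data (the MCMR itself is in the linked code):

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error

# Noisy sine data: 150 training samples, 50 test samples
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, (200, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.1, 200)

svr = SVR(kernel="rbf", C=1.0, epsilon=0.1).fit(X[:150], y[:150])
mse = mean_squared_error(y[150:], svr.predict(X[150:]))
n_sv = len(svr.support_)   # the support-vector count the MCMR aims to shrink
```

Fewer support vectors directly translate into cheaper prediction, since the regressor evaluates one kernel term per support vector.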

Feature Selection
The number of features or input variables used by a linear hyperplane classifier is one less than an upper bound on its VC dimension. The primal version of the linear MCM therefore allows us to find a set of features that constitutes a minimal discriminative set, providing a way of selecting an optimal set of features. Results on benchmark datasets (http://arxiv.org/abs/1410.7372) indicate that the MCM provides a competitive feature selection approach.
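The selection step itself is simple: once the primal linear MCM has been solved, the discriminative features are those with non-zero weights. A sketch, using a made-up weight vector in place of a trained one:

```python
import numpy as np

# Hypothetical weight vector w from a trained primal linear MCM (values
# invented for illustration); in practice many components come out
# exactly, or numerically, zero.
w = np.array([0.0, 1.7, 0.0, -0.4, 1e-9, 2.3])

tol = 1e-6 * np.abs(w).max()      # treat tiny weights as numerical zeros
selected = np.flatnonzero(np.abs(w) > tol)
print(selected)                    # → [1 3 5]
```

The relative tolerance guards against solver round-off being mistaken for a genuinely selected feature.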

Fuzzy MCM Classifiers
Using fuzzy memberships facilitates robust learning by reducing the effect of outliers and noise in the data. An extension of the MCM to the fuzzy domain (http://arxiv.org/abs/1501.02432) demonstrates that the use of fuzzy memberships indeed improves generalization, and can also yield sparser solutions.
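One common way of assigning memberships, sketched below, weights each sample by its distance to its own class centroid, so that outliers receive small memberships and contribute less to the training error term. This is a standard heuristic for illustration; the paper's exact membership scheme may differ.

```python
import numpy as np

def centroid_memberships(X, y, delta=1e-6):
    """Fuzzy membership in (0, 1] for each sample, decreasing with its
    distance to its own class centroid (a common heuristic)."""
    s = np.empty(len(y))
    for label in np.unique(y):
        idx = np.flatnonzero(y == label)
        centroid = X[idx].mean(axis=0)
        dist = np.linalg.norm(X[idx] - centroid, axis=1)
        r = dist.max() + delta        # class radius; delta keeps s > 0
        s[idx] = 1.0 - dist / r
    return s

# The third point sits far from the rest of its class (an outlier)
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0],
              [3.0, 3.0], [3.1, 3.0], [3.0, 3.1]])
y = np.array([-1, -1, -1, 1, 1, 1])
s = centroid_memberships(X, y)
```

The memberships then multiply the per-sample slack terms in the objective, so a mislabeled or noisy point can no longer dominate the fit.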

Feature Selection for Hyperspectral Image data
We have successfully applied the recently proposed filter feature-selection algorithm, which minimizes a tight bound on the VC dimension, to widely used hyperspectral images. The method determines a reasonable subset of bands without any user-defined stopping criteria, and outperforms state-of-the-art methods in terms of both the sparsity of the selected feature set and classification accuracy (http://arxiv.org/abs/1509.08112).