Implicitly Constrained Semi-Supervised Linear Discriminant Analysis

In many machine learning tasks, apart from a set of labeled data, a large number of unlabeled observations is often available. The goal of semi-supervised learning is to use this unlabeled data to improve the supervised classification or regression model that was learned from the labeled data alone. For classification using linear discriminant analysis (LDA) specifically, several semi-supervised variants have been proposed. None of these methods, however, is guaranteed to outperform the supervised classifier, which does not take the additional unlabeled data into account. They may, in fact, reduce performance.

To counter this problem, [2] introduced moment-constrained LDA, which offers a more robust type of semi-supervised LDA. This approach requires the identification of specific constraints that link parameter estimates that rely on the labeled data to parameters that do not rely on the labels. Ideally, we would like these constraints to emerge implicitly from the choice of the supervised learning model and a given set of unlabeled objects.

Implicitly constrained semi-supervised learning, introduced in [3], attempts to do just that. The underlying intuition is that if we could enumerate all possible labelings of the unlabeled data and train the corresponding classifiers, the classifier based on the true but unknown labels would be in this set. This classifier would generally outperform the supervised classifier. In practice, however, we cannot enumerate all possible labelings, nor do we know which one corresponds to the true labeling. One way to judge how well any of these classifiers is going to perform is to estimate its performance using the supervised objective function evaluated on the labeled objects alone. Based on this objective, it turns out one can efficiently find the optimal classifier in this set of possible classifiers by allowing for soft label assignments to the unlabeled objects. This all leads to a convex optimization problem that can be solved using a simple bounded gradient descent procedure.
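
The idea of searching over soft labelings can be sketched for the simplest case, a univariate two-class LDA model. This is only a minimal illustration under stated assumptions, not the implementation from [1]: the function names (fit_lda_1d, labeled_nll, objective, ic_lda_1d) are hypothetical, the gradient is approximated by finite differences rather than derived analytically, and a backtracking step with clipping to [0, 1] stands in for the bounded gradient descent procedure. It also assumes that both classes occur among the labeled objects.

```python
import math


def fit_lda_1d(x, resp):
    """Weighted maximum likelihood estimates for 1-D two-class LDA.

    resp[i] is the (soft) membership of x[i] in class 1, between 0 and 1.
    Returns (prior of class 1, mean of class 0, mean of class 1, shared variance).
    """
    n = len(x)
    n1 = sum(resp)
    n0 = n - n1
    mean1 = sum(r * xi for r, xi in zip(resp, x)) / n1
    mean0 = sum((1.0 - r) * xi for r, xi in zip(resp, x)) / n0
    var = sum(r * (xi - mean1) ** 2 + (1.0 - r) * (xi - mean0) ** 2
              for r, xi in zip(resp, x)) / n
    return n1 / n, mean0, mean1, var


def labeled_nll(params, x_lab, y_lab):
    """Supervised objective: negative log-likelihood of the labeled data only."""
    prior1, mean0, mean1, var = params
    total = 0.0
    for xi, yi in zip(x_lab, y_lab):
        prior, mean = (prior1, mean1) if yi == 1 else (1.0 - prior1, mean0)
        total -= (math.log(prior) - 0.5 * math.log(2.0 * math.pi * var)
                  - (xi - mean) ** 2 / (2.0 * var))
    return total


def objective(q, x_lab, y_lab, x_unl):
    """Fit LDA on all data using soft labels q, then score it on labeled data."""
    resp = [float(y) for y in y_lab] + list(q)
    return labeled_nll(fit_lda_1d(list(x_lab) + list(x_unl), resp), x_lab, y_lab)


def ic_lda_1d(x_lab, y_lab, x_unl, steps=100, lr=0.1, eps=1e-6):
    """Search over soft labels q in [0, 1] with projected gradient steps."""
    q = [0.5] * len(x_unl)
    current = objective(q, x_lab, y_lab, x_unl)
    for _ in range(steps):
        # Forward finite-difference approximation of the gradient (an assumption
        # made for brevity; an analytic gradient would be used in practice).
        grad = []
        for j in range(len(q)):
            q_eps = list(q)
            q_eps[j] += eps
            grad.append((objective(q_eps, x_lab, y_lab, x_unl) - current) / eps)
        step = lr
        while step > 1e-8:
            # Take a step and clip back onto the box constraints 0 <= q <= 1.
            cand = [min(1.0, max(0.0, qj - step * gj)) for qj, gj in zip(q, grad)]
            value = objective(cand, x_lab, y_lab, x_unl)
            if value <= current:  # accept only steps that do not increase the loss
                q, current = cand, value
                break
            step /= 2.0
        else:
            break  # no acceptable step size was found; stop early
    return q, fit_lda_1d(list(x_lab) + list(x_unl),
                         [float(y) for y in y_lab] + q)


# Toy data, made up purely for illustration.
x_lab, y_lab = [-1.2, -0.8, 1.0, 1.4], [0, 0, 1, 1]
x_unl = [-1.5, -1.0, -0.7, 0.9, 1.1, 1.6]
q, params = ic_lda_1d(x_lab, y_lab, x_unl)
```

Because candidate steps are accepted only when the labeled-data loss does not increase, the returned soft labeling is never worse on this objective than the uninformative starting point q = 0.5.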

We compare the constraint-based approaches to other semi-supervised methods, in particular expectation maximization and self-learning. We also consider the question of whether, and in what sense, we can expect improvement in performance over the supervised procedure. The main conclusion from these analyses is that the constraint-based approaches are more robust to misspecification of the original supervised model, and may outperform alternatives that make more assumptions about the data, in particular when performance is measured in terms of the log-likelihood of unseen objects.

This work was presented in [1], while the idea of implicitly constrained learning, applied to the least squares classifier, is described in [3].

[1] Krijthe, Jesse H., and Marco Loog. "Implicitly Constrained Semi-Supervised Linear Discriminant Analysis." 22nd International Conference on Pattern Recognition (ICPR). IEEE, 2014.
[2] Loog, Marco. "Semi-supervised linear discriminant analysis through moment-constraint parameter estimation." Pattern Recognition Letters 37 (2014): 24-31.
[3] Krijthe, Jesse H., and Marco Loog. "Implicitly Constrained Semi-Supervised Least Squares Classification." Tech. Rep., 2013.