ABSTRACT

We have seen several iterative methods for semisupervised learning: self-training and co-training, McLachlan's algorithm, the EM algorithm, boosting. In section 7.6, we showed that the iterations in self-training and co-training can be viewed as propagating label information through a graph. EM provides a soft version, and the Gibbs sampler can be seen in a similar light: each update probabilistically adjusts the label of a node to make it like the labels of its neighbors.

Viewed abstractly enough, just about any classification algorithm can be seen as propagating labels based on similarity. The nearest-neighbor classifier asks directly which training instance a new instance is most similar to, and predicts the label of that most-similar training instance. Other classifiers create more abstract representations of classes and compare a given test instance to the abstract representations, predicting the class whose representation the new instance is most similar to. An abstract class representation, in turn, is based on the features that training instances belonging to the class have in common. For example, as we have seen, many classifiers construct a weight vector w and, given a new instance x, predict positive if x · w exceeds a threshold and negative otherwise. One can see w as an abstract representation of the positive class, and the dot product x · w as the measure of similarity between x and w.

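Both views can be made concrete in a few lines of code. The following is a minimal sketch, not drawn from the text; the function names and the zero threshold are our own choices, and the dot product stands in for the similarity measure:

    import numpy as np

    def nearest_neighbor_predict(X_train, y_train, x):
        # Similarity of x to every training instance (dot product),
        # then the label of the single most similar instance.
        sims = X_train @ x
        return y_train[np.argmax(sims)]

    def linear_predict(w, x, threshold=0.0):
        # Compare x to the abstract class representation w:
        # positive just in case the similarity x . w exceeds the threshold.
        return 1 if x @ w > threshold else 0
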
A similarity relation defines a weighted graph. The objects that participate in the relation represent nodes in the graph, and there is an edge between two objects just in case they have non-zero similarity. The weight on an edge represents the similarity between the two objects connected by the edge.

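As an illustration (a sketch under our own conventions; the dictionary-of-edges representation and the similarity function passed in are assumptions, not the book's notation), such a graph can be built directly from the definition:

    def similarity_graph(objects, sim):
        # One node per object; an edge between two objects just in case
        # their similarity is non-zero, weighted by that similarity.
        edges = {}
        for i in range(len(objects)):
            for j in range(i + 1, len(objects)):
                w = sim(objects[i], objects[j])
                if w != 0.0:
                    edges[(i, j)] = w
        return edges
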
We have mentioned two different sorts of graph. In one, nodes represent instances, and features play a role in determining the similarities that are represented as edge weights. In the other sort of graph, both instances and features are represented by nodes. An instance is “similar” to a feature if it possesses the feature.

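The second sort of graph is bipartite. A hedged sketch, assuming each instance is given as the set of features it possesses (the example features are invented):

    def instance_feature_graph(instances):
        # Nodes are instances and features; an instance is connected to
        # exactly the features it possesses, each edge with weight 1.
        edges = {}
        for i, feats in enumerate(instances):
            for f in feats:
                edges[(("inst", i), ("feat", f))] = 1.0
        return edges

    # e.g. two instances that share the feature "suffix=-ing"
    g = instance_feature_graph([{"suffix=-ing", "capitalized"}, {"suffix=-ing"}])
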
We are particularly interested in the semisupervised case in which some, but not all, of the nodes are labeled. If the seed consists of labeled data, then the labeled nodes are instances. If the seed consists of a seed classifier, the labeled nodes are features. The labels of nodes corresponding to the seed are fixed, and other labels are allowed to vary. The basic mechanism for assigning labels to unlabeled nodes is to propagate labels from neighboring nodes, in accordance with the weight of the edge connecting the nodes.

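The update for a single unlabeled node can be sketched as follows (illustrative code, not the book's algorithm verbatim): the node takes the average of its neighbors' labels, each neighbor counted in proportion to the weight of the connecting edge, while seed labels remain clamped.

    def propagate(u, labels, neighbors):
        # New label value for unlabeled node u: the weighted average of
        # its neighbors' current labels. `neighbors` is a list of
        # (node, edge_weight) pairs; seed nodes are simply never updated.
        total = sum(w for _, w in neighbors)
        return sum(w * labels[v] for v, w in neighbors) / total
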
In this chapter, we examine the idea of label propagation through a graph in more detail. The picture that emerges is of a graph as a fabric or net, in which instances and features are knots connected by cords to similar instances and features. The height of the fabric corresponds to the probability of having the positive label: an altitude of 1 indicates that the node is certainly labeled positive, an altitude of 0 indicates that the node is certainly labeled negative, and an altitude of 1/2 indicates that the two labels are equally likely. The labeled instances form the perimeter, at fixed heights of 1 or 0, and the unlabeled instances form the interior. The overall effect is a complex interpolation, with each unlabeled node being pulled to the average height of its neighbors. This yields a labeling of the entire graph, with nodes above altitude 1/2 labeled positive in the end and nodes below altitude 1/2 labeled negative.
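Read as code, the fabric picture is a relaxation to a fixed point: seed nodes are pinned at heights 1 and 0, every interior node is repeatedly pulled to the weighted average height of its neighbors, and the final labeling thresholds the heights at 1/2. A self-contained sketch under the same assumptions as above (our representation, not the book's):

    def harmonic_labels(edges, seed_values, nodes, iters=1000, tol=1e-6):
        # adjacency: node -> list of (neighbor, edge weight)
        adj = {u: [] for u in nodes}
        for (a, b), w in edges.items():
            adj[a].append((b, w))
            adj[b].append((a, w))
        # Seeds pinned at 0 or 1; interior nodes start at height 1/2.
        h = {u: seed_values.get(u, 0.5) for u in nodes}
        for _ in range(iters):
            delta = 0.0
            for u in nodes:
                if u in seed_values or not adj[u]:
                    continue  # the perimeter stays fixed
                new = sum(w * h[v] for v, w in adj[u]) / sum(w for _, w in adj[u])
                delta = max(delta, abs(new - h[u]))
                h[u] = new
            if delta < tol:
                break
        # Above altitude 1/2: positive; below: negative.
        return {u: "+" if h[u] > 0.5 else "-" for u in nodes}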