ABSTRACT

In the previous chapter we introduced self-training and co-training as the most widely used semisupervised learning algorithms in computational linguistics. We present them in more detail in this chapter. Both algorithms are “naive” in the sense that they did not derive from a theory of semisupervised learning; rather, they are embodiments of simple algorithmic ideas that can be grasped without much background. This is not to say that they lack theoretical justification. Co-training and self-training can both be justified on the basis of a conditional independence assumption that is closely related to the independence assumption underlying the Naive Bayes classifier. And forms of the algorithms have arisen on the basis of a theory of semisupervised learning – particular examples are McLachlan’s version of self-training (discussed in chapter 8) and de Sa’s version of co-training (discussed in chapter 9). These theoretically motivated algorithms even predate the versions that are well known in computational linguistics. But we focus here on the familiar versions, whose primary attraction is simplicity and intuitiveness rather than theoretical underpinnings.