Advanced Topics | 8.1.1 Usefulness of Unlabeled Data

ABSTRACT

Great advances in data collection and storage technology have enabled the accumulation of large amounts of data in many real-world applications. Assigning labels to these data, however, is expensive, because the labeling process requires human effort and expertise. For example, in computer-aided medical diagnosis, a large number of x-ray images can be obtained from routine examinations, yet it is difficult to ask physicians to mark every focus of infection in every image. If we use traditional supervised learning techniques to construct a diagnosis system, then only the small portion of the training data on which the foci have been marked is usable. With so few labeled training examples, it may be difficult to obtain a strong diagnosis system. A question thus arises naturally: can we leverage the abundant unlabeled data, together with a few labeled training examples, to construct a strong learning system?