ABSTRACT

Machine Learning Department, Carnegie Mellon University, Pittsburgh, Pennsylvania

Active learning (sometimes called “query learning”) is a subfield of machine learning concerned with minimizing annotation and training costs. More precisely, the goal of active learning is to minimize the cost of obtaining labels for data by interacting selectively with the labeling source. By contrast, the traditional “passive” approach to supervised learning is to acquire a large random sample of training instances to be labeled before any learning begins. The key hypothesis is that if the learning algorithm is allowed to choose the most informative data instances to be labeled for training (to be “curious,” so to speak), it will perform better with less data (and lower costs).