ABSTRACT

Discussion of a learning algorithm can begin only after texts or images have been reduced to lists of “features”. This chapter focuses on the plasticity and simplicity of the data models that learning algorithms use, because these qualities have been a major source of misunderstanding when humanists encounter algorithmic methods. Learning algorithms address the humanistic problem by permitting researchers to start with a relatively unstructured data model that may include hundreds or thousands of variables, each uninformative on its own. The goal of algorithmic modeling is to translate that uninformative representation, provisionally, into something more meaningful. In supervised learning, a researcher creates a set of examples that the algorithm treats as ground truth in its effort to model a pattern. Unsupervised learning algorithms, by contrast, do not begin with a set of target categories; they are designed to infer structure latent in the data itself.