Machine Learning Methods for Computational Social Science

doi:10.4324/9781003025245-21

ABSTRACT

Social science researchers often need to predict the behavior of an individual (or group) from a set of other measured variables that describe them. For example, given a person's age, gender, income, occupation, location, and education, which candidate are they most likely to vote for in the next election? When the outcome is a category, as it is here (one of the candidates they might vote for), we call the task a classification problem. We might also want to predict a quantitative outcome, such as the amount of money the same person would donate to their preferred candidate, using the same input variables. In that case we call the task a regression problem. The input variables (e.g., occupation and education) are referred to across disciplines as independent variables, covariates, predictors, or attributes; the outcome variable (e.g., candidate or donation amount) is referred to as the response, dependent variable, or label.¹ The cases that pair the input variables with an associated outcome are referred to as observations, individuals, instances, or examples.²
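The distinction between the two problem types can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the six observations are invented toy values, the names (`X`, `y_candidate`, `y_donation`, `predict_class`, `predict_value`) are hypothetical, and k-nearest neighbors is used only because it is short to write, not because the chapter prescribes it. The same inputs feed both tasks; only the type of outcome differs.

```python
from collections import Counter
from statistics import mean

# Hypothetical toy observations (invented values, not real data).
# Each row of inputs is (age, years_of_education, income_in_thousands).
X = [
    (25, 12, 30.0),
    (32, 16, 55.0),
    (47, 18, 90.0),
    (51, 12, 40.0),
    (63, 16, 70.0),
    (38, 14, 48.0),
]
# Categorical response -> classification problem.
y_candidate = ["A", "B", "B", "A", "B", "A"]
# Quantitative response -> regression problem.
y_donation = [10.0, 50.0, 120.0, 20.0, 80.0, 25.0]


def nearest_indices(X, query, k):
    """Return the indices of the k observations closest to `query`.

    Uses squared Euclidean distance on the raw inputs; a real analysis
    would normally rescale the variables first.
    """
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    order = sorted(range(len(X)), key=lambda i: sq_dist(X[i], query))
    return order[:k]


def predict_class(X, y, query, k=3):
    """Classification: majority vote among the k nearest observations."""
    votes = Counter(y[i] for i in nearest_indices(X, query, k))
    return votes.most_common(1)[0][0]


def predict_value(X, y, query, k=3):
    """Regression: average response of the k nearest observations."""
    return mean(y[i] for i in nearest_indices(X, query, k))


# A new, unlabeled individual described by the same input variables.
new_person = (45, 17, 85.0)
print("Predicted candidate:", predict_class(X, y_candidate, new_person))
print("Predicted donation:", round(predict_value(X, y_donation, new_person), 2))
```

In this sketch each tuple in `X` together with its entry in `y_candidate` or `y_donation` is one observation: the tuple holds the predictors and the paired value is the response.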