ABSTRACT

With agent-based models, computational social scientists seek to explain the emergence of social phenomena such as ethnic and social segregation, opinion polarization, the rise of mass protests, economic and political cycles, or the ups and downs in the epidemic spread of infectious diseases. In building a model, researchers experiment with theories and assumptions about individual behavior and link them to the societal level by simulating the repeated social interaction of many artificial individuals. Similarly, researchers vary institutional settings, modifying the context in which agents interact. Such agent-based models can be seen as a tool to explore multi-player games in which actors’ rationality is bounded. In both senses, agent-based modeling is intrinsically theory-driven.

The strength of agent-based modeling is a quantitative causal understanding of emergence through complex nonlinear dynamics at the macro level. As a consequence, methods of validation with empirical data are less easily specified and more diverse than for variable-based regression models. In particular, the nature of the data–theory link depends closely on the main research purpose and need not take the form of validation. Nevertheless, agent-based models are meant to explain real-world phenomena, so a useful model should be reflected in empirical data, and vice versa.

This chapter shows how agent-based modeling can be data-driven. This includes (1) taking the structure of the data into account during model building, (2) a parallel exploration of the data that searches for and quantifies “stylized facts” for later use in validation, (3) iterative model revision to increase the model's replicative validity, and (4) the calibration of model parameters with data for forecasting or counterfactual simulations.

These aspects of data-driven agent-based modeling are illustrated with two examples, one on segregation and one on polarization.