Handling Missing Data in Large Databases

doi:10.4324/9781003025245-7

ABSTRACT

Missing data often occur even in carefully planned studies. This chapter therefore deals with methods for handling missing values, with an emphasis on large data sets. We avoid the term ‘big data’ because its use is ambiguous: it appears in rather different contexts, ranging from more or less clearly defined data situations that are manageable with standard techniques and equipment to data sets with a huge number of units and/or variables that require multiple petabytes or more of storage space and specialized processing technology. Even more broadly, the term ‘big data’ often denotes not only the data set itself but also the whole process from data collection and editing to analysis and interpretation (e.g. Gandomi & Haider, 2015).