ABSTRACT

Alongside the many advantages of social networking sites (SNS) data, there are pitfalls well known to experts, the main one being selection bias. Social media accounts, users, and data cannot be considered statistically representative of a country's population: they represent only the users of each given platform. Therefore, conclusions drawn from the analysis of these data cannot be directly extended to the whole population.

Adjustment procedures can be applied to account for this bias and to extract more representative measures of well-being: for example, controlling for the penetration rate of each social media platform, applying re-sampling strategies to make the social media data more similar to a random sample, or combining social media data and survey data within the framework of Bayesian network analysis.

After briefly reviewing these methods, this chapter offers a systematic approach to the selection bias problem, which consists in anchoring the SNS indexes to official statistics through a weighted, space-time, small area estimation model. As a by-product, the proposed method also stabilizes the social media indicators, a welcome property that official statistics require.