ABSTRACT

This chapter addresses two common problems that arise when working with data from different sources. The first is joining datasets, a process that usually relies on one or more identification variables (for example, the name of a country, the official code of a subnational unit, or another code). When the identification variables are well standardized, the process is simple. For example, official governmental datasets typically assign each country a unique code: under ISO 3166-1 alpha-3, Brazil is always “BRA” and Chile is always “CHL”, which makes the work in R easy. Most coding schemes reduce country names to a handful of characters and/or numbers. The second problem is missing values. The naniar package contains visual tools that help us better understand possible patterns in our missing values. It can also be useful to explore the relationship between the missing values of a variable and a numeric column of the dataset, rather than a categorical one.
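As a minimal sketch of both ideas, the base R snippet below joins two small data frames on a shared country code and then checks where missing values appear. The data frames, column names, and numbers are invented for illustration only and do not come from the chapter. Base R is used so the example runs without extra packages; it is not necessarily the approach the chapter itself takes.

```r
# Toy data, invented for illustration only (not real figures).
population <- data.frame(
  code = c("BRA", "CHL", "ARG"),
  pop_millions = c(200, 20, 45),
  stringsAsFactors = FALSE
)
spending <- data.frame(
  code = c("BRA", "ARG"),
  spending_index = c(1.5, 2.5),
  stringsAsFactors = FALSE
)

# Left join on the shared identification variable: keep every row of
# `population` and fill with NA where `spending` has no matching code.
joined <- merge(population, spending, by = "code", all.x = TRUE)

# Count the missing values in each column of the joined data.
n_missing <- colSums(is.na(joined))

# Relate missingness in one variable to a numeric column: mean population
# for rows where spending_index is observed (FALSE) vs. missing (TRUE).
pop_by_missing <- tapply(
  joined$pop_millions,
  is.na(joined$spending_index),
  mean
)

print(joined)
print(n_missing)
print(pop_by_missing)
```

Because the codes are standardized, a single `by = "code"` argument is enough to match the rows. The unmatched country surfaces as an `NA`, which is where the second problem begins. With naniar installed, functions such as `vis_miss()` or `geom_miss_point()` would give visual counterparts to the numeric summaries computed here.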