ABSTRACT

This chapter discusses a range of tools and strategies for identifying and correcting misspellings, inconsistencies and other errors in data. Those tools include Excel (which has a special copy-down command to fill in values that are missing but implied); text editors (to see and delete hidden characters, such as spaces before or after numbers, that may interfere with data analysis); and OpenRefine (to standardize spellings and to reshape data by turning the values in a column into a row). This chapter gives a brief introduction to regular expressions, a coding syntax used in spreadsheets that also appears in programming languages like Python, R and Ruby. Moreover, we will troubleshoot spreadsheet problems – what to do when your spreadsheet treats a number as a character string, for instance. The examples in this chapter and the exercises at the end use data downloaded from the Federal Election Commission and other sources.