ABSTRACT

A data scientist has to import data into R from a file, a database, or another source. Currently, one of the most common ways of storing and sharing data for analysis is through electronic spreadsheets. A spreadsheet is essentially a file version of a data frame. When a spreadsheet is stored as a text file, like one created with a simple text editor, each new row is defined by a line break (return) and columns are separated by a predefined special character, such as a comma or a tab. This chapter describes the difference between text, Unicode, and binary files and how this affects the way we import them. It explains the concepts of file paths and working directories, which are essential to understanding how to import data effectively. The chapter introduces the readr and readxl packages and the functions they provide for importing spreadsheets into R. It provides some recommendations on how to store and organize data in files. The chapter also introduces the main tidyverse data importing functions.