ABSTRACT

This chapter examines how to transform data efficiently and how to extract and summarize insights from it, a vital step before later stages of the data science process such as visualization and modeling. The package “dplyr” is an R package designed to simplify tabular data wrangling through a small set of functions that can be combined to extract and summarize insights from the data. Many data analysis tasks can be tackled with the split-apply-combine paradigm: divide the data into groups, apply an analysis to each group separately, and then combine the results. R also offers several fast and sophisticated ways to join data frames on a common column, including at least three: base R's merge() function, the join family of functions from dplyr, and the bracket syntax of data.table.
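
The ideas summarized above can be illustrated with a minimal sketch. The data frames `sales` and `managers` and their columns are hypothetical and chosen only for illustration. The sketch assumes the dplyr (version 1.0 or later) and data.table packages are installed.

```r
# Illustrative sketch only: the data and column names are made up.
library(dplyr)
library(data.table)

sales <- data.frame(
  region = c("east", "east", "west", "west"),
  amount = c(10, 20, 5, 15)
)
managers <- data.frame(
  region  = c("east", "west"),
  manager = c("Ana", "Ben")
)

# Split-apply-combine with dplyr: split by region, sum each group,
# and combine the per-group results into one table.
totals <- sales %>%
  group_by(region) %>%
  summarise(total = sum(amount), .groups = "drop")

# Three ways to join on the shared "region" column.
joined_base  <- merge(totals, managers, by = "region")       # base R
joined_dplyr <- inner_join(totals, managers, by = "region")  # dplyr
joined_dt    <- as.data.table(totals)[                       # data.table
  as.data.table(managers), on = "region"
]
```

For this small input all three joins should return the same two rows, one per region. They differ mainly in syntax and in the class of the returned object.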