ABSTRACT

The most basic statistical summary of a list of objects or numbers is its distribution. Once a vector has been summarized as a distribution, several data visualization techniques can relay this information effectively. This chapter discusses the properties of a variety of distributions and how to visualize them, using student heights as a motivating case study, and covers the ggplot2 geometries for these visualizations. It focuses on two types of variables, categorical and numerical: categorical variables can be ordinal or not, whereas numerical variables can be discrete or continuous. In general, when data is not categorical, reporting the frequency of each entry is not an effective summary, since most entries are unique. The chapter then discusses how to code histograms and notes that smooth density plots are aesthetically more appealing than histograms.