ABSTRACT

There is a saying that a picture is worth a thousand words. Visualization reveals many underlying features of data that summary statistics and models may miss: patterns, changes over time, unusual observations, clustering, gaps, and relationships among variables. This chapter introduces the basic concepts and uses of the visualization functions in the “ggplot2” R package. It uses examples to examine the grammar of graphics that underlies the package. The primary focus is on how to apply the visualization techniques in “ggplot2” to real epidemiological data in various formats and for various purposes. After introducing the idea of individual and collective “geoms,” the chapter demonstrates two important collective plots: time series plots and maps. The “ggplot2” package generally prefers data in the “long” format, i.e., a column for every dimension and a row for every observation. The chapter concludes with a discussion of how to arrange plots and save the output.