ABSTRACT

Boxplots encode the five number summary of a numeric variable, and provide a decent way to compare many numeric distributions. The visual task of comparing multiple boxplots is relatively easy (i.e., compare position along a common scale) compared to some common alternatives (e.g., a trellis display of histograms, like Figure 5.1). However, the boxplot is sometimes inadequate for capturing complex (e.g., multi-modal) distributions; in this case, a frequency polygon, like Figure 2.9, provides a nice alternative. The add_boxplot() function requires one numeric variable, and guarantees boxplots are oriented correctly, regardless of whether the numeric variable is placed on the x or y scale. As Figure 6.1 shows, on the axis orthogonal to the numeric axis, you can provide a discrete variable (for conditioning) or supply a single value (to name the axis category).

Figure 6.1: Overall diamond price and price by cut.

```r
library(plotly)  # plotly attaches ggplot2, which supplies the diamonds data

# Shared base: price on the y axis, black boxes with alpha = 0.1,
# and outlying points shown via boxpoints = "suspectedoutliers"
p <- plot_ly(diamonds, y = ~price, color = I("black"),
             alpha = 0.1, boxpoints = "suspectedoutliers")
# A single box; the constant "Overall" names the x-axis category
p1 <- p %>% add_boxplot(x = "Overall")
# One box per level of the discrete variable cut
p2 <- p %>% add_boxplot(x = ~cut)
# Place both plots side by side on a shared price axis
subplot(
  p1, p2, shareY = TRUE,
  widths = c(0.2, 0.8), margin = 0
) %>% hide_legend()
```
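To make the "five number summary" idea concrete, here is a minimal base-R sketch. It assumes two hypothetical helpers, five_number_by() and flag_outliers() (neither is part of plotly), and uses made-up toy data. It computes Tukey's five-number summary per group with fivenum() and flags points lying beyond 1.5 times the interquartile range from the hinges, the conventional whisker limit. Only if plotly is installed does it also place the numeric variable on x and the discrete variable on y, which should yield horizontal boxes. Note that plotly.js computes quartiles itself, so its box edges may differ slightly from the hinges fivenum() returns.

```r
# Sketch under stated assumptions: five_number_by() and flag_outliers() are
# hypothetical helpers written for illustration, not plotly functions.

# Tukey's five-number summary (min, lower hinge, median, upper hinge, max)
# for each level of a grouping variable, one row per group.
five_number_by <- function(x, group) {
  out <- t(sapply(split(x, group), fivenum))
  colnames(out) <- c("min", "lower", "median", "upper", "max")
  out
}

# TRUE for points beyond k * IQR from the hinges (k = 1.5 by convention).
flag_outliers <- function(x, k = 1.5) {
  h <- fivenum(x)
  iqr <- h[4] - h[2]
  x < h[2] - k * iqr | x > h[4] + k * iqr
}

# Toy data, made up for illustration.
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 40)
g <- rep(c("a", "b"), each = 5)
print(five_number_by(x, g))
print(which(flag_outliers(x)))

# Optional: numeric variable on x, discrete variable on y, so add_boxplot()
# draws one horizontal box per level of cut. Skipped if plotly is missing.
if (requireNamespace("plotly", quietly = TRUE)) {
  p_horizontal <- plotly::add_boxplot(
    plotly::plot_ly(ggplot2::diamonds, x = ~price, y = ~cut)
  )
}
```

On the toy data, the hinges of the full vector are 3 and 8, so only the value 40 (position 10) falls outside the 1.5 IQR fences. Keeping the plotly call behind requireNamespace() lets the base-R part run on its own.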