ABSTRACT

This chapter introduces more advanced exploratory data analysis tools that address important practical issues not covered earlier. A typical discrete-valued variable represents a count of some quantity, such as the population of a city, the number of claims filed by an insurance policyholder, or the number of times a hospital patient has been readmitted. The concept of statistical significance is closely related to confidence intervals, and p-values are the standard numerical measure of statistical significance. The function rnbinom returns a vector of n negative binomial random samples, given n and the distribution parameters. The Poissonness plot displays a derived quantity called the distribution metameter against the range of observed counts; if the Poisson approximation is reasonable, the points should fall approximately on a straight line. The popularity of the Gaussian distribution as an approximate description of numerical variables has been noted repeatedly, and in many cases this approximation is quite reasonable.
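
As a rough illustration of two of the tools named above, the following R sketch draws negative binomial samples with rnbinom and computes a Poissonness-plot metameter by hand. It assumes the common form of the metameter, log(k! * n_k / N), where n_k is the number of observations equal to k and N is the sample size; the helper name poissonness, the seed, the sample sizes, and the distribution parameters are all hypothetical choices made for this example, not values taken from the chapter.

```r
set.seed(42)  # arbitrary seed, chosen only for reproducibility

# Simulated count data (illustrative parameters, not from the chapter)
pois_counts <- rpois(n = 500, lambda = 3)
nb_counts   <- rnbinom(n = 500, size = 2, mu = 3)  # vector of 500 samples

# Hypothetical helper: metameter log(k! * n_k / N) for each observed count k
poissonness <- function(x) {
  tab <- table(x)                  # frequency of each distinct count
  k   <- as.integer(names(tab))    # observed count values
  n_k <- as.vector(tab)            # how often each value occurs
  data.frame(k = k,
             metameter = lgamma(k + 1) + log(n_k / length(x)))
}

pm <- poissonness(pois_counts)

# For Poisson data the metameter is roughly linear in k
plot(pm$k, pm$metameter,
     xlab = "Count k", ylab = "Distribution metameter")
fit <- lm(metameter ~ k, data = pm)
abline(fit)
```

Under a Poisson model with rate lambda, this metameter equals k * log(lambda) - lambda, so the fitted slope should be near log(lambda). Applying poissonness to nb_counts instead would typically show curvature, since those data are overdispersed relative to the Poisson distribution.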