ABSTRACT

This chapter discusses regression with more than one predictor. A motivating example models house price as a function of lot size and house size in a specific ZIP code in King County. The data is visualized with emphasis on how the visualization relates back to the model. The summary of a linear model is examined, and we show how to compute and interpret each element of the output. Next, two ZIP codes are considered, and we examine price as a function of lot size, house size, and ZIP code. The predict function is introduced in the context of multiple regression. Variable selection is then considered, using a data set that contains the number of words spoken by each of many people, together with other measurements of those people. We show one way of using p-values for variable selection, emphasizing the exploratory nature of the process. ANOVA is also used in this context to check whether variables need to be added back in after they have been removed. The AIC is briefly discussed as an alternative method of variable selection. The chapter ends with a vignette on alternative data formats, which will be useful to readers applying this material to their own data sets.