ABSTRACT

Linear models seem rather restrictive, but because the predictors can be transformed and combined in any way, they are actually very flexible. In everyday speech, the term linear is often used almost as a synonym for simplicity. This gives the casual observer the impression that linear models can only handle small, simple datasets. Linear also suggests straight lines, yet linear models can describe curves, for example by adding quadratic terms: the model need only be linear in its parameters, not in the predictors. Truly nonlinear models are rarely absolutely necessary, and they most often arise from a theory about the relationships between the variables rather than from an empirical investigation. Models derived directly from physical theory are relatively uncommon, so a linear model is usually best regarded as an approximation to a complex reality. The popular scikit-learn machine learning package can also fit linear regression models. The important difference is that it does not compute the full range of subsidiary information, such as standard errors and hypothesis tests, that statsmodels provides.
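To make the point about curvature concrete, here is a minimal sketch that fits a curved relationship with an ordinary linear model. It assumes made-up, noise-free data following y = 1 + 2x - 0.5x², and it uses NumPy's general least-squares solver `np.linalg.lstsq` as a stand-in for a dedicated regression routine such as those in statsmodels or scikit-learn. The squared predictor is just one more column in the design matrix, so the model stays linear in its coefficients.

```python
import numpy as np

# Hypothetical, noise-free data from a curved relationship:
# y = 1 + 2x - 0.5x^2
x = np.linspace(-3.0, 3.0, 13)
y = 1.0 + 2.0 * x - 0.5 * x**2

# Design matrix: intercept column, the predictor, and its square.
# The model is still linear because y is a linear combination of these
# columns; only the predictor has been transformed.
X = np.column_stack([np.ones_like(x), x, x**2])

# Ordinary least squares using NumPy's least-squares solver.
beta, residuals, rank, singular_values = np.linalg.lstsq(X, y, rcond=None)

# The estimated coefficients should be close to (1, 2, -0.5).
print(np.round(beta, 6))
```

Because the data contain no noise, the fitted coefficients recover the generating values up to floating-point error; with real data, a package such as statsmodels would additionally report standard errors and tests for each coefficient.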