ABSTRACT

As electronic data has become widely available, a need has arisen for tools that help people gain insight from it. A variety of techniques from statistics, machine learning, and neural networks have been applied to databases in the hope of mining knowledge from data. Multiple regression is one such method: it models the relationship between a set of explanatory variables and a dependent variable by fitting a linear equation to observed data. Here, we investigate and discuss some factors that influence whether the resulting regression equation is a credible model of the data.