CONTENTS
2.1 Introduction
2.2 What is Information Complexity: ICOMP?
2.3 Information Criteria for Multiple Regression Models
2.4 A GA for the Regression Modeling
2.5 Numerical Examples
2.6 Conclusion and Discussion
Acknowledgments
References

ABSTRACT

This paper develops a computationally feasible intelligent data mining and knowledge discovery technique that addresses the potentially daunting statistical and combinatorial problems posed by subset selection in multiple regression models. Our approach integrates novel statistical modeling procedures based on an information-theoretic measure of complexity (ICOMP). We form a three-way hybrid of information measures of complexity, multiple regression models, and genetic algorithms (GAs). We demonstrate the versatility and utility of the new approach on a simulated example and on a real data set.
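To make the three-way hybrid concrete, here is a minimal sketch, not the chapter's implementation. With k candidate predictors there are 2^k possible subsets (k = 20 already gives 1,048,576), which is the combinatorial problem a GA sidesteps. The sketch encodes each subset as a bit string, always keeps an intercept, and scores the fitted model with an ICOMP-type criterion that the GA minimizes. Several pieces are assumptions of this sketch: the ICOMP(IFIM) form for Gaussian linear regression (lack of fit plus the C1 complexity of the estimated inverse Fisher information), the GA operators (binary tournament selection, single-point crossover, bit-flip mutation, elitism), the parameter values, and the simulated data. The names `icomp`, `design`, `solve_normal_equations` and `ga_subset_search` are hypothetical.

```python
import math
import random


def solve_normal_equations(X, y):
    """Gauss-Jordan on X'X; returns (beta, inv(X'X), log|X'X|), or None if singular."""
    p = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    b = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    M = [A[i] + [1.0 if i == j else 0.0 for j in range(p)] + [b[i]] for i in range(p)]
    logdet = 0.0
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(M[r][col]))  # partial pivoting
        if abs(M[piv][col]) < 1e-12:
            return None
        M[col], M[piv] = M[piv], M[col]
        pv = M[col][col]
        logdet += math.log(abs(pv))  # |det(X'X)| is the product of the pivots
        M[col] = [v / pv for v in M[col]]
        for r in range(p):
            if r != col:
                f = M[r][col]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[col])]
    return [row[2 * p] for row in M], [row[p:2 * p] for row in M], logdet


def icomp(X, y):
    """Assumed ICOMP(IFIM) for Gaussian linear regression; lower is better."""
    n, p = len(X), len(X[0])
    sol = solve_normal_equations(X, y)
    if sol is None:
        return math.inf
    beta, inv, logdet_xtx = sol
    rss = sum((yi - sum(bj * xj for bj, xj in zip(beta, r))) ** 2 for r, yi in zip(X, y))
    s2 = rss / n  # maximum likelihood estimate of the error variance
    if s2 <= 0:
        return math.inf
    lack_of_fit = n * math.log(2 * math.pi) + n * math.log(s2) + n  # -2 log L at the MLE
    # Estimated inverse Fisher information: blockdiag(s2 * inv(X'X), 2 s2^2 / n), size p + 1
    s = p + 1
    trace = s2 * sum(inv[i][i] for i in range(p)) + 2 * s2 ** 2 / n
    logdet = p * math.log(s2) - logdet_xtx + math.log(2 * s2 ** 2 / n)
    return lack_of_fit + s * math.log(trace / s) - logdet  # second part is 2 * C1(F^-1)


def design(data_x, mask):
    """Intercept column plus the predictors switched on in the binary mask."""
    return [[1.0] + [r[j] for j, bit in enumerate(mask) if bit] for r in data_x]


def ga_subset_search(data_x, y, pop_size=20, generations=30, p_mut=0.1, seed=0):
    """Simple GA over bit-string subsets, minimizing the assumed ICOMP score."""
    rng = random.Random(seed)
    k = len(data_x[0])
    cache = {}

    def fitness(mask):
        key = tuple(mask)
        if key not in cache:
            cache[key] = icomp(design(data_x, mask), y)
        return cache[key]

    pop = [[rng.randint(0, 1) for _ in range(k)] for _ in range(pop_size)]
    for _ in range(generations):
        children = [min(pop, key=fitness)[:]]  # elitism: keep the best subset so far
        while len(children) < pop_size:
            p1 = min(rng.sample(pop, 2), key=fitness)  # binary tournament selection
            p2 = min(rng.sample(pop, 2), key=fitness)
            cut = rng.randint(1, k - 1)  # single-point crossover
            child = [1 - b if rng.random() < p_mut else b for b in p1[:cut] + p2[cut:]]
            children.append(child)
        pop = children
    best = min(pop, key=fitness)
    return best, fitness(best)


# Simulated example (assumed): only predictors 0 and 2 enter the true model.
rng = random.Random(42)
data_x = [[rng.gauss(0, 1) for _ in range(6)] for _ in range(100)]
y = [1.0 + 3.0 * r[0] - 2.0 * r[2] + rng.gauss(0, 1) for r in data_x]
best, score = ga_subset_search(data_x, y)
print("selected predictors:", [j for j, b in enumerate(best) if b], "ICOMP =", round(score, 2))
```

Because the effects of predictors 0 and 2 are large relative to the noise, the lack-of-fit term should dominate and the selected subset should contain both; whether any spurious predictor also enters depends on the complexity penalty. With only six predictors, exhaustive search over 64 subsets would be trivial; the GA pays off only when k is large enough that enumeration becomes infeasible.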