Intelligent Statistical Data Mining with Information Complexity and Genetic Algorithms
Hamparsum Bozdogan
University of Tennessee, Knoxville, USA

doi:10.1201/9780203497159-6



CONTENTS

2.1 Introduction
2.2 What is Information Complexity: ICOMP?
2.3 Information Criteria for Multiple Regression Models
2.4 A GA for the Regression Modeling
2.5 Numerical Examples
2.6 Conclusion and Discussion
Acknowledgments
References

ABSTRACT

This paper develops a computationally feasible intelligent data mining and knowledge discovery technique that addresses the potentially daunting statistical and combinatorial problems presented by subset regression models. Our approach integrates novel statistical modelling procedures based on an information-theoretic measure of complexity. We form a three-way hybrid of information measures of complexity, multiple regression models, and genetic algorithms (GAs). We demonstrate the approach on a simulated example and on a real data set to illustrate its versatility and utility.
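
As a rough illustration of the kind of hybrid the abstract describes, the Python sketch below uses a genetic algorithm to search over subsets of regressors, scoring each subset with an information criterion. It is not the chapter's implementation. ICOMP is not defined in this section, so AIC is swapped in as a stand-in criterion. The function names (`aic_score`, `ga_subset_search`), the GA operators (binary tournament selection, single-point crossover, bit-flip mutation, elitism), the parameter values, and the simulated data are all assumptions made for the example.

```python
import numpy as np


def aic_score(X, y, mask):
    """AIC of a Gaussian linear model on the columns flagged in `mask`.

    Stand-in criterion (assumption): the chapter's ICOMP would be plugged
    in here instead. An intercept is always included.
    """
    n = len(y)
    cols = np.flatnonzero(mask)
    A = np.column_stack([np.ones(n), X[:, cols]])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    sigma2 = max(float(resid @ resid) / n, 1e-12)  # ML variance estimate
    k = A.shape[1] + 1  # regression coefficients plus the variance
    return n * np.log(2 * np.pi * sigma2) + n + 2 * k


def ga_subset_search(X, y, score=aic_score, pop_size=30, generations=40,
                     p_cross=0.8, p_mut=0.05, seed=0):
    """Search for the subset of columns of X that minimizes `score`."""
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    # Each chromosome is a boolean mask: True means "include this regressor".
    pop = rng.random((pop_size, p)) < 0.5
    best_mask, best_val = None, np.inf
    for _ in range(generations):
        vals = np.array([score(X, y, m) for m in pop])
        i = int(np.argmin(vals))
        if vals[i] < best_val:
            best_val, best_mask = float(vals[i]), pop[i].copy()
        # Binary tournament selection: the lower score of two random picks wins.
        a = rng.integers(pop_size, size=pop_size)
        b = rng.integers(pop_size, size=pop_size)
        parents = pop[np.where(vals[a] <= vals[b], a, b)]
        children = parents.copy()
        # Single-point crossover on consecutive pairs of parents.
        for j in range(0, pop_size - 1, 2):
            if p > 1 and rng.random() < p_cross:
                cut = int(rng.integers(1, p))
                children[j, cut:] = parents[j + 1, cut:]
                children[j + 1, cut:] = parents[j, cut:]
        # Bit-flip mutation, then elitism: carry the best mask forward.
        children ^= rng.random(children.shape) < p_mut
        children[0] = best_mask
        pop = children
    # Score the final generation as well.
    vals = np.array([score(X, y, m) for m in pop])
    i = int(np.argmin(vals))
    if vals[i] < best_val:
        best_val, best_mask = float(vals[i]), pop[i].copy()
    return best_mask, best_val


# Simulated example (assumed data): only columns 0 and 2 drive the response.
rng_demo = np.random.default_rng(1)
X_demo = rng_demo.normal(size=(200, 6))
y_demo = 2.0 * X_demo[:, 0] - 3.0 * X_demo[:, 2] + rng_demo.normal(scale=0.5, size=200)
mask, value = ga_subset_search(X_demo, y_demo)
print("selected columns:", np.flatnonzero(mask), "criterion:", round(value, 2))
```

Because the criterion is passed in as the `score` argument, a different information criterion can replace AIC without changing the search loop.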