ABSTRACT

Mechanizing hypothesis formation is an approach to exploratory data analysis. Its development started in the 1960s, inspired by the question “Can computers formulate and verify scientific hypotheses?” The development resulted in a general theory of the logic of discovery. The theory comprises theoretical calculi, which deal with theoretical statements, and observational calculi, which deal with observational statements concerning finite results of observation. The two kinds of calculi are related through statistical hypothesis tests. A GUHA method is a tool of the logic of discovery. It uses a one-to-one relation between theoretical and observational statements to obtain all interesting theoretical statements. A GUHA procedure generates all interesting observational statements and verifies them in the given observational data. The output of the procedure consists of all of these observational statements that are true in the given data. Several GUHA procedures, dealing with association rules, couples of association rules, action rules, histograms, couples of histograms, and patterns based on general contingency tables, are implemented in the LISp-Miner system developed at the Prague University of Economics and Business. Various results about observational calculi were achieved and applied together with the LISp-Miner system.
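
The generate-and-verify principle of a GUHA procedure can be illustrated with a minimal Python sketch. It is not the LISp-Miner implementation or any published API: the function names, the toy data, and the parameter defaults are hypothetical, and it assumes the simple (non-statistical) founded implication quantifier, under which a rule holds when a/(a+b) >= p and a >= base, where a and b are frequencies from the four-fold contingency table of the antecedent and the succedent.

```python
from itertools import combinations


def four_fold_table(rows, antecedent, succedent):
    """Count the four-fold table (a, b, c, d) of antecedent vs. succedent."""
    a = b = c = d = 0
    for row in rows:
        ant = all(row.get(attr) == val for attr, val in antecedent)
        suc = row.get(succedent[0]) == succedent[1]
        if ant and suc:
            a += 1
        elif ant:
            b += 1
        elif suc:
            c += 1
        else:
            d += 1
    return a, b, c, d


def founded_implication(a, b, p, base):
    """True when at least `base` rows support the rule with confidence >= p."""
    return a >= base and a / (a + b) >= p


def mine_rules(rows, antecedent_attrs, succedent_attr, p=0.9, base=2, max_len=2):
    """Generate every candidate rule and keep those true in the data."""
    literals = sorted({(k, row[k]) for row in rows for k in antecedent_attrs})
    succedents = sorted({(succedent_attr, row[succedent_attr]) for row in rows})
    found = []
    for length in range(1, max_len + 1):
        for antecedent in combinations(literals, length):
            # A conjunction may use each attribute at most once.
            if len({attr for attr, _ in antecedent}) < length:
                continue
            for succedent in succedents:
                a, b, c, d = four_fold_table(rows, antecedent, succedent)
                if founded_implication(a, b, p, base):
                    found.append((antecedent, succedent, (a, b, c, d)))
    return found


# Hypothetical toy data, invented purely for illustration.
rows = [
    {"smoker": "yes", "bmi": "high", "risk": "high"},
    {"smoker": "yes", "bmi": "high", "risk": "high"},
    {"smoker": "yes", "bmi": "low", "risk": "high"},
    {"smoker": "no", "bmi": "high", "risk": "low"},
    {"smoker": "no", "bmi": "low", "risk": "low"},
    {"smoker": "no", "bmi": "low", "risk": "low"},
]

rules = mine_rules(rows, ["smoker", "bmi"], "risk", p=1.0, base=2)
for antecedent, succedent, table in rules:
    print(antecedent, "=>", succedent, table)
```

The sketch enumerates the whole space of candidate statements and outputs only those true in the data, which mirrors the principle described above. The procedures covered in the book work with much richer pattern languages and with quantifiers based on statistical tests.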

The book gives a brief overview of the logic of discovery. It presents many examples of applying the GUHA procedures to real problems relevant to data mining and business intelligence. It also provides an overview of recent research results on dealing with domain knowledge in data mining and its automation. Firsthand experience with implementing the GUHA method in Python is presented as well.

Chapter 1 | 21 pages

Introduction

Chapter 2 | 18 pages

Datasets

Part I | 45 pages

Procedures

Chapter 3 | 11 pages

Principle and Simple Examples

Chapter 4 | 17 pages

Common Features

Chapter 5 | 15 pages

LISp-Miner System

Part II | 171 pages

Applying the GUHA Procedures

Chapter 6 | 15 pages

Examples Overview

Chapter 7 | 39 pages

4ft-Miner—GUHA Association Rules

Chapter 8 | 26 pages

CF-Miner—Histograms

Chapter 11 | 15 pages

SDCF-Miner—Couples of Histograms

Chapter 13 | 17 pages

Ac4ft-Miner Action Rules

Chapter 14 | 11 pages

GUHA Procedures and Business Intelligence

Chapter 15 | 11 pages

Clever Miner GUHA and Python

Part III | 63 pages

Research and Theory

Chapter 17 | 22 pages

Applying Domain Knowledge

Chapter 18 | 19 pages

Observational Calculi