ABSTRACT

Most discovery systems face difficulties when discovering knowledge from large databases. The traditional solution is to adapt the discovery system itself to cope with large databases. An alternative is to partition the representation space, apply the discovery system in parallel to the data in each subspace, and then combine the discovered knowledge if necessary. This paper introduces a new methodology for partitioning the representation space. The method selects an irrelevant attribute, using a utility function, and partitions the representation space on that attribute. Since irrelevant attributes are not needed to describe the concepts discovered from the data, the knowledge discovered from all subspaces should be identical; in such cases, discovery needs to be performed in only one subspace. If the representation space is instead partitioned on a relevant attribute, the knowledge discovered from the subspaces can be combined simply, using information about that attribute. The method is analyzed using two learning systems: AQ15c, which learns decision rules from examples, and C4.5, which learns decision trees from examples.
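
To make the partition-then-learn idea concrete, the following minimal sketch (not from the paper) partitions a dataset on its least relevant attribute and trains one model per subspace. Mutual information is used here as a stand-in utility function, and scikit-learn's DecisionTreeClassifier stands in for C4.5; the attribute names, the label column, and integer-coded attributes are illustrative assumptions only.

    # Sketch: partition on an apparently irrelevant attribute, learn per subspace.
    # Assumes a pandas DataFrame whose attributes are integer-coded and whose
    # class column is named `label`; these names are illustrative, not the paper's.
    import pandas as pd
    from sklearn.feature_selection import mutual_info_classif
    from sklearn.tree import DecisionTreeClassifier

    def least_relevant_attribute(data: pd.DataFrame, label: str) -> str:
        # Utility function stand-in: pick the attribute with the lowest
        # mutual information with the class.
        X = data.drop(columns=[label])
        scores = mutual_info_classif(X, data[label], discrete_features=True)
        return X.columns[scores.argmin()]

    def learn_per_subspace(data: pd.DataFrame, label: str):
        # Partition the data on the chosen attribute and learn one decision
        # tree per subspace (C4.5 stand-in). If the attribute is truly
        # irrelevant, the trees should describe the same concepts.
        attr = least_relevant_attribute(data, label)
        models = {}
        for value, subset in data.groupby(attr):
            X = subset.drop(columns=[label, attr])
            models[value] = DecisionTreeClassifier().fit(X, subset[label])
        return attr, models

In this sketch, identical models across subspaces would indicate that learning in a single subspace suffices; differing models would mean the partitioning attribute is relevant and the results must be combined using its values.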