ABSTRACT

Much of the discussion in the previous chapters assumed that a single database or warehouse is to be mined. Even in the case of multiple data sources, we assumed that the sources would first be integrated into a warehouse, which would then be mined. In many situations, however, one would want to leave the data in the heterogeneous data sources and mine those sources directly. Furthermore, for many applications the data could be distributed and managed by a distributed database system. For such applications, the data mining tools have to operate on the distributed databases. Finally, a lot of data still resides in legacy databases, and a major challenge is mining and extracting useful information from them. Because these legacy databases could also be migrated to new systems and architectures, the question arises: is it worth developing mining tools that operate on the legacy databases themselves?