ABSTRACT

Parallel processors can efficiently search very large databases and serve high-transaction-volume applications that hold gigabytes of information. There are two primary approaches to parallel processor independence: a shared-nothing environment and function shipping. In addition, some database management systems provide a rebalance utility that redistributes data across existing nodes to maximize throughput when processing queries. Businesses find that they can predict shipments and volumes and identify sales territories more accurately when they mine large databases rather than scanning random samples. For queries that already execute in extremely short times on a serial database, performance improvements are negligible. Data reorganization is necessary when disk space cannot be used effectively. Query execution is analogous to data flowing through trees of operators that are divided into tasks, with sends and receives used for intertask communication. The query optimizer determines the cost of a plan by choosing between system resources and response time.