ABSTRACT

A DW is a repository of subject-oriented, integrated, and nonvolatile information, aimed at supporting knowledge workers (executives, managers, and analysts) to make better and faster decisions (Chaudhuri and Dayal 1997; Immon 1993; Immon and Hackathorn 1994; Immon, Welch, and Glassey 1997; Kimball et al. 1998; Widom 1995). Data warehouses contain large amounts of information, which is collected from a variety of independent sources and is often maintained separately from the operational databases. Traditionally, operational databases are optimized for online transaction processing (OLTP), where consistency and recoverability are critical. Transactions are typically small and access a small number of individual records based on the primary key. Operational databases maintain current state information. In contrast, DWs maintain historical, summarized, and consolidated information, and are designed for online analytical processing (OLAP) (Codd 1995; Codd, Codd, and Salley 1993). The data in the warehouse are often modeled as a multidimensional space to facilitate the query engines for OLAP, where queries typically aggregate data across many dimensions in order to detect trends and anomalies (Mumick, Quass, and Mumick 1997). There is a set of numeric measures that are the subjects of analysis in a multidimensional data model. Each of the numeric measures is determined by a set of dimensions. In a trafc DW, for example, a fundamental trafc measure is vehicle ow where dimensions of interest are segment, freeway, day, and week. Given N dimensions, the measures can be aggregated in 2N different ways. The SQL aggregate functions and the group-by operator only produce one out of 2N aggregates at a time. A data cube (Gray et al. 1995) is an aggregate operator which computes all 2N aggregates in one shot.