ABSTRACT

A Data Warehouse (DW) (Chaudhuri and Dayal 1996, 1997; Inmon 1993; Inmon and Hackathorn 1994; Inmon et al. 1997; Kimball et al. 1998; Widom 1995) is a repository of subject-oriented, integrated, and non-volatile information, aimed at supporting knowledge workers (executives, managers, analysts) to make better and faster decisions. DWs contain large amounts of information which is collected from a variety of independent sources and is often maintained separately from the operational databases. Traditionally, operational databases are optimized for Online Transaction Processing (OLTP), where consistency and recoverability are critical. Transactions are typically small and access a small number of individual records based on the primary key. Operational databases maintain current state information. In contrast, DWs maintain historical, summarized and consolidated information, and are designed for Online Analytical Processing (OLAP) (Codd 1995; Codd et al. 1993). The data in the warehouse are often modelled as a multidimensional space to facilitate the query engines for OLAP, where queries typically aggregate data across many dimensions in order to detect trends and anomalies (Mumick et al. 1997). There is a set of numeric measures that are the subjects of analysis in a multidimensional data model. Each of the numeric measures is determined by a set of dimensions. In a census data warehouse, for example, the measure is population, and the dimensions of interest are age group, ethnicity, income type, time (year), and location (census tract). Given N dimensions, the measures can be aggregated in 2 N different ways. The Standard Query Language (SQL) aggregate functions and the group-by operator only produce one out of 2 N aggregates at a time. A data cube (Gray et al. 1996) is an aggregate operator which computes all 2 N aggregates in one shot.