Handling Materialized Views | 11 | Automated Physical Database Design

ABSTRACT

Similar to indexes, materialized views are redundant data structures that can be used to speed up query processing. A view is a virtual table whose content is derived from base tables by using a subset of Standard Query Language (SQL) (we discuss the language used to deﬁne views in Section 8.1). A view is materialized by deﬁning an appropriate clustered index. Then, the content of the view is persisted and maintained as if it were a regular table, which can be indexed and leveraged to answer queries eﬃciently. Consider again the query from Chapter 2 that returns the names of employees working in departments with budgets larger than $10 million:

Q1 = SELECT Emp.name FROM Emp, Dept WHERE Emp.DId = Dept.DId AND Dept.Budget>10M

Suppose that we create a materialized view MV1 that precomputes the join between employees and departments. This view, which we denote MV1, is deﬁned as follows:

MV1 = SELECT Emp.EId, Dept.DId, Emp.name, Dept.Budget FROM Emp, Dept WHERE Emp.DId = Dept.DId

Using MV1, we can rewrite the original Q1 into an equivalent form as follows:

Q2 = SELECT MV1.name FROM MV1 WHERE MV1.Budget>10M

In general, a view must subsume a query subexpression to be useful in answering the full query (i.e., the view deﬁnition must contain at least all the tuples required in the query subexpression as well as means to ﬁlter out the tuples that are not relevant). MV1 subsumes the whole Q1, since the results of MV1 are a superset of those of Q1. For Q1 and Q2 to be equivalent, we add the ﬁlter condition MV1.Budget > 10M in Q2, generally referred to as a compensating action. Just by syntactically looking at Q1 and Q2, it cannot be determined which alternative would lead to the most eﬃcient execution

Tuning

plan. If only a few departments satisfy the budget constraint, and there are appropriate indexes in the base tables but not in the materialized view, Q1 can be eﬃciently answered by seeking an index on Dept(Budget, DId) to obtain the departments with budgets over 10M and then performing an index-based join leveraging a clustered index on Emp(DId). If, instead, most departments satisfy the predicate on Budget, scanning MV1 and ﬁltering tuples with lowbudget departments might be much more eﬃcient because the join is avoided. If an index on MV1(Budget, Name) is available, then the plan that seeks such index is always better than the alternative that does not rely on MV1.