ABSTRACT

The Definition pipe focuses exclusively on the Definition table in each dimension. The Reference pipe has already taken care of all the other table constructs in the star design pattern. What remains is the complex logic of managing change in the definition entries themselves, that complexity taking two forms in the Definition pipe. The high-level sequence diagram in Figure 11.1 illustrates the main steps in the processing. Specifically, the processing covered in the Definition pipe includes

◾ Transforming source-only intake data into source-to-target staged data through a join to the metadata dimension, including codeset translation of nonstandard source values

◾ Pivoting Definition data columns by Master ID to obtain wide rows for insertion or update in each dimension

◾ Marrying the generic wide rows to their respective dimension identifiers from the Reference pipe

◾ Applying data to each dimension Definition table as new inserts (including orphans), auto-adoption of existing orphans, and data updates (including slowly-changing dimension [SCD] controls)

Definition processing in the dimensions is more complex than the Reference processing described in Chapter 10. Definition handling is more complex because each definition table can have an arbitrary number of dimension-specific columns defined. These columns require your ETL subsystem code to be specialized in ways that none of the other five dimension tables required. Definition processing is also more complex because a single execution of the ETL subsystem might need to process multiple transactions for the same entry in the dimension. An entry added to the source master might have been updated several times before the source data intake extract obtains the source extract. Applying all of those changes to the

entries in the Definition table, in the correct order, makes the Definition pipe much more complex than if only a single transaction per execution were possible.