ABSTRACT

Design is a series of compromises that people must make to balance functionality, performance, and cost. The challenge for a team designing a system is to define the problem in a scientific way and to keep an eye on the future, so that the compromises are made in the right places. By building a schema-on-read system, people trade understanding the data at the start of the project for understanding it at the point of consumption. Understanding the types of data being manipulated and how they are used within the data pipeline helps in choosing the best way to store them. Typically, facts are happy being stored in large, partitioned tables, whilst reference data is happier in relational or object-based storage. The key is to have a good definition of how the data is used and of the non-functional requirements, so that people can design the system to meet expectations and push back where those expectations are unreasonable.