ABSTRACT

The world of structured, semi-structured, and unstructured data comprises two data management use cases. The default expectation for a Data Lake is to acquire all of the data, retain it in its native formats, and apply data visualization tools such as Tableau and data virtualization tools such as Denodo at the time of analysis. The emphasis should be on ensuring that data is made fit for purpose and protected in an automated, flexible, and scalable repository such as Hive or HBase. Transformations are much better suited to a batch processing system such as Hadoop, which offers the agility to work with any data type and is highly scalable. Cloudera Hive is a data warehouse infrastructure built on top of Hadoop that provides data summarization, query, and analytics. Data management tools such as Informatica Big Data Management provide a fast and efficient means of automating Big Data feeds into repositories such as Hive or HBase.