ABSTRACT

Introduction
The Story as it is Told from the Business Perspective
The Story as it is Told from the Technology Perspective

Data Challenges
Volume
Variety, Combining Multiple Data Sets
Velocity
Veracity, Data Quality, Data Availability
Data Discovery
Quality and Relevance
Data Comprehensiveness
Personally Identifiable Information
Data Dogmatism
Scalability

Process Challenges
Management Challenges
Big Data Platforms Technology: Current State of the Art

Take the Analysis to the Data!
What Is Apache Hadoop?
Who Are the Hadoop Users?
An Example of an Advanced User: Amazon
Big Data in Data Warehouse or in Hadoop?
Big Data in the Database World (Early 1980s Till Now)
Big Data in the Systems World (Late 1990s Till Now)
Enterprise Search
Big Data “Dichotomy”
Hadoop and the Cloud
Hadoop Pros
Hadoop Cons

Technological Solutions for Big Data Analytics
Scalability and Performance at eBay
Unstructured Data
Cloud Computing and Open Source

Every day, 2.5 quintillion bytes (2.5 exabytes) of data are created. These data come from digital pictures, videos, posts to social media sites, intelligent sensors, purchase transaction records, and cell phone GPS signals, to name a few sources. Data of this scale and diversity are known as Big Data.