CONTENTS

Big Data Challenges
Big Data Management Systems
Distributed File System
Nonstructural and Semistructured Data Storage
Big Data Analytics
Data Mining
Image and Speech Data Recognition
Social Network Data Analysis
Data Fusion and Integration
Management of Big Data Distributed Systems
Hadoop Technologies
Hadoop Distributed File System (HDFS)
Hadoop MapReduce
NoSQL Database Management System (NoSQL DBMS)
Software as a Service (SaaS)–Based Business Analytics
Master Data Management
Conclusion
References

ABSTRACT

Big data deals with data at large scale and is characterized by three concepts: volume, variety, and velocity, known as the 3Vs of Big Data. Volume refers to the sheer size of the data. Conventional data sets are measured in gigabytes or terabytes of storage, whereas Big Data involves amounts beyond terabytes, such as petabytes or exabytes; one of the challenges of Big Data is therefore that it requires scalable storage. Data volume will continue to grow every day, regardless of how the data are organized, because of the natural tendency of companies to store all types of data, such as financial data, medical data, environmental data, and so on. Many of these companies' datasets are in the terabyte range today, but they could soon reach petabytes or even exabytes and more. Variety refers to the aggregation of many types of data, which may be structured or unstructured, including social media, multimedia, web server logs, and many other forms of information. With the explosion of sensors and smart devices as well as social networking, enterprise data has become complex because it includes not only traditional structured relational data but also semistructured and unstructured data, and velocity becomes more of a reality for these types of data [1].