ABSTRACT

Big data is defined as volumes of data of varying complexity and ambiguity, generated at different velocities, that cannot be processed using traditional technologies, processing methods, algorithms, or commercial off-the-shelf solutions. This chapter explores the challenges of big data computing: managing and processing exponentially growing data volumes, significantly shortening data analysis cycles to support practical, timely applications, and developing new algorithms that scale to search and process massive amounts of data. The answer to these challenges is a scalable, integrated hardware and software architecture designed for parallel processing in big data computing applications. Cloud computing gives organizations with limited internal resources the opportunity to implement large-scale big data computing applications cost-effectively. This chapter describes the characteristic features of big data systems, including big data architecture, row-oriented versus column-oriented data layouts, NoSQL data management, in-memory computing, and the development of big data applications.