ABSTRACT

In traditional database integration and storage efforts, there has been a clear demarcation between data storage and data processing. With advancements in efficiency and performance technologies, this line is beginning to blur. In this chapter, we discuss the elements of big data infrastructure from the perspective of existing approaches, their difficulties in adapting to high-volume and varied data types, and the benefits of distributed approaches that are more cost-effective. The open-source big data technologies (Hadoop, the Hadoop Distributed File System, and MapReduce) are described, as well as other database technologies that are beneficial within the utility ecosystem. We also address the fundamentals of different database concepts, their defining characteristics, and the best use of each.