ABSTRACT

This chapter outlines selected Hadoop-based software frameworks in a cloud-based ecosystem that help small and medium-sized businesses gain access to Big Data analytics. It provides reference models and architectures for operating a Hadoop ecosystem. Data can be collected and integrated into a Hadoop ecosystem in several ways, and it can be archived or classified there through several manual or automated processes. Hive stores its data in the Apache Hadoop Distributed File System (HDFS) and translates queries into MapReduce jobs that run within the Hadoop ecosystem. HDFS, which is inspired by the Google File System, is the distributed storage layer used by Hadoop applications. Many external software tools can access the Hadoop ecosystem through standardized interfaces such as Java Database Connectivity (JDBC) or Open Database Connectivity (ODBC). Among the best-known of these tools are Microsoft Excel and Tableau Desktop from Tableau Software.