ABSTRACT

Apache SQOOP stands out in the Big Data world when it comes to data migration. It was created in 2009 by Aaron Kimball as a means of moving data between SQL databases and Hadoop. It provided a generic implementation for moving data, along with a framework for implementing database-specific optimized connectors. Apache SQOOP is a powerful tool for exchanging data between data stores such as relational database management systems and enterprise data warehouses on the one hand, and HIVE on the other. In MapReduce processing, SQOOP uses primary-key ranges to divide the data among mappers; however, deletes tend to hit older key values harder, leaving those key ranges unbalanced. SQOOP parses the arguments provided on the command line and prepares the map job, which launches as many mappers as the user specifies. SQOOP distributes the input data equally among the mappers to achieve high performance.
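The key-range splitting described above can be illustrated with a minimal sketch. This is not SQOOP's actual implementation; it is a simplified Python model, under the assumption of an integer primary key, of how a `[min_key, max_key]` range might be divided into one contiguous interval per mapper. It also shows why equal key ranges do not imply equal row counts once deletes cluster in the older (lower) keys.

```python
def split_key_range(min_key: int, max_key: int, num_mappers: int):
    """Divide [min_key, max_key] into num_mappers contiguous intervals,
    one per mapper (a simplified model of SQOOP's split computation)."""
    size = (max_key - min_key + 1) / num_mappers
    splits = []
    for i in range(num_mappers):
        lo = min_key + round(i * size)
        hi = min_key + round((i + 1) * size) - 1
        splits.append((lo, hi))
    # Ensure the final interval always reaches max_key despite rounding.
    splits[-1] = (splits[-1][0], max_key)
    return splits

# Four mappers over keys 1..100 get equal-width key ranges:
print(split_key_range(1, 100, 4))
# → [(1, 25), (26, 50), (51, 75), (76, 100)]

# If many rows with keys 1..25 were deleted, the first mapper's range
# still spans 25 key values but now holds far fewer rows than the
# others, so the mappers' workloads become unbalanced.
```

In a real import the key bounds would come from a `SELECT MIN(id), MAX(id)` query against the source table, and each interval would become one mapper's `WHERE` clause.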