Overview
Sqoop is an open source software tool for transferring data between Hadoop and relational databases or mainframes. It can import data from a relational database management system (RDBMS) such as MySQL or Oracle, or from a mainframe, into the Hadoop Distributed File System (HDFS). The data can then be transformed with Hadoop MapReduce and exported back into an RDBMS.
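That round trip can be sketched with two Sqoop commands. This is a minimal illustration, not a definitive recipe: the connection string, credentials, database, table, and HDFS paths are all placeholders.

```shell
# Import a MySQL table into HDFS as text files
# (hostname, database, table, and paths are hypothetical)
sqoop import \
  --connect jdbc:mysql://db.example.com/sales \
  --username analyst --password-file /user/analyst/.db-password \
  --table orders \
  --target-dir /data/orders

# ...transform the files under /data/orders with MapReduce or Hive...

# Export the transformed results back into a MySQL table
sqoop export \
  --connect jdbc:mysql://db.example.com/sales \
  --username analyst --password-file /user/analyst/.db-password \
  --table orders_summary \
  --export-dir /data/orders_summary
```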
In this instructor-led, live training, participants will learn how to use Sqoop to import data from a traditional relational database into Hadoop storage such as HDFS or Hive, and vice versa.
By the end of this training, participants will be able to:
- Install and configure Sqoop
- Import data from MySQL to HDFS and Hive
- Export data from HDFS and Hive to MySQL
Audience
- System administrators
- Data engineers
Format of the Course
- Part lecture, part discussion, exercises and heavy hands-on practice
Note
- To request a customized training for this course, please contact us to arrange it.
Requirements
- An understanding of big data concepts (HDFS, Hive, etc.)
- An understanding of relational databases (MySQL, etc.)
- Experience with the Linux command line
Course Outline
Introduction
- Moving data from legacy data stores to Hadoop
Installing and Configuring Sqoop
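A minimal installation sketch, assuming a Linux host with Hadoop already installed; the release version, download filename, and directory paths below are illustrative and will differ per environment.

```shell
# Unpack a Sqoop 1.x release and point it at the local Hadoop installation
tar -xzf sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz -C /opt
export SQOOP_HOME=/opt/sqoop-1.4.7.bin__hadoop-2.6.0
export PATH=$PATH:$SQOOP_HOME/bin
export HADOOP_COMMON_HOME=/usr/local/hadoop   # assumed Hadoop install path
export HADOOP_MAPRED_HOME=/usr/local/hadoop

# Sqoop needs the JDBC driver jar for each database it talks to
cp mysql-connector-java-8.0.28.jar $SQOOP_HOME/lib/

# Verify the installation
sqoop version
```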
Overview of Sqoop Features and Architecture
Importing Data from MySQL to HDFS
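A basic import of one MySQL table into HDFS might look like the following sketch; Sqoop splits the work across parallel map tasks using the column given to --split-by (all names here are placeholders).

```shell
# Parallel import of a single table into HDFS text files
sqoop import \
  --connect jdbc:mysql://db.example.com/sales \
  --username analyst --password-file /user/analyst/.db-password \
  --table customers \
  --target-dir /data/customers \
  --split-by customer_id \
  --num-mappers 4
```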
Importing Data from MySQL to Hive
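Importing into Hive adds the --hive-import flag, which loads the data into the Hive warehouse and creates the table metadata. A hedged sketch with placeholder names:

```shell
# Import a MySQL table directly into a Hive table
sqoop import \
  --connect jdbc:mysql://db.example.com/sales \
  --username analyst --password-file /user/analyst/.db-password \
  --table customers \
  --hive-import \
  --create-hive-table \
  --hive-table sales.customers
```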
Transforming Data in Hadoop
Exporting Data from HDFS to MySQL
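Moving data out of HDFS uses the sqoop export tool; the target table must already exist in MySQL. A minimal sketch (table names, paths, and the comma delimiter are assumptions about the data layout):

```shell
# Export comma-delimited HDFS files into an existing MySQL table
sqoop export \
  --connect jdbc:mysql://db.example.com/sales \
  --username analyst --password-file /user/analyst/.db-password \
  --table customers_clean \
  --export-dir /data/customers_clean \
  --input-fields-terminated-by ','
```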
Exporting Data from Hive to MySQL
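Exporting a Hive table works the same way, pointing --export-dir at the table's warehouse directory. Hive's default field delimiter is Ctrl-A (\001), so the export must be told about it; the warehouse path and table names below are placeholders.

```shell
# Export a Hive-managed table's files into an existing MySQL table
sqoop export \
  --connect jdbc:mysql://db.example.com/sales \
  --username analyst --password-file /user/analyst/.db-password \
  --table daily_totals \
  --export-dir /user/hive/warehouse/daily_totals \
  --input-fields-terminated-by '\001'
```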
Importing Incrementally with Sqoop Jobs
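A saved Sqoop job makes incremental imports repeatable: Sqoop records the last value it saw in the check column and resumes from there on the next run. A sketch with hypothetical names:

```shell
# Save an incremental append import as a named, re-runnable job
sqoop job --create orders_delta -- import \
  --connect jdbc:mysql://db.example.com/sales \
  --username analyst --password-file /user/analyst/.db-password \
  --table orders \
  --target-dir /data/orders \
  --incremental append \
  --check-column order_id \
  --last-value 0

# Each run picks up only new rows; Sqoop updates the stored
# --last-value in its job metastore automatically
sqoop job --exec orders_delta
```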
Troubleshooting
Summary and Conclusion