Overview
Apache Sqoop is a command line interface for moving data between relational databases and Hadoop. Apache Flume is distributed software for collecting, aggregating, and moving large amounts of streaming data such as logs. Using Sqoop and Flume, users can transfer data between systems and ingest big data into storage architectures such as Hadoop.
This instructor-led, live training (online or onsite) is aimed at software engineers who wish to use Sqoop and Flume for transferring data between systems.
By the end of this training, participants will be able to:
- Ingest big data with Sqoop and Flume.
- Ingest data from multiple data sources.
- Move data from relational databases to HDFS and Hive.
- Export data from HDFS to a relational database (both directions are sketched after this list).
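As a rough illustration of the last two objectives, a Sqoop import and export pair might look like the following. This is a minimal sketch; the connection string, credentials, table, and directory names are placeholders, not part of the course material:

    # Import a MySQL table into HDFS (hypothetical names throughout)
    sqoop import \
      --connect jdbc:mysql://dbhost/sales \
      --username etl --password-file /user/etl/.db-pass \
      --table orders \
      --target-dir /data/orders

    # Export HDFS data back into a relational table
    sqoop export \
      --connect jdbc:mysql://dbhost/sales \
      --username etl --password-file /user/etl/.db-pass \
      --table orders_summary \
      --export-dir /data/orders_summary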
Format of the Course
- Interactive lecture and discussion.
- Lots of exercises and practice.
- Hands-on implementation in a live-lab environment.
Course Customization Options
- To request a customized training for this course, please contact us to arrange it.
Requirements
- Experience with SQL
Audience
- Software Engineers
Course Outline
Introduction
Sqoop and Flume Overview
- What is Sqoop?
- What is Flume?
- Sqoop and Flume features
Preparing the Development Environment
- Installing and configuring Apache Sqoop
- Installing and configuring Apache Flume (a setup sketch follows this list)
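Setup details vary by Hadoop distribution; the following is a minimal sketch assuming standalone binary tarballs, with illustrative version numbers and paths:

    # Unpack the tarballs (versions and paths are illustrative)
    tar -xzf sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz -C /opt
    tar -xzf apache-flume-1.9.0-bin.tar.gz -C /opt

    # Put both tools on the PATH
    export SQOOP_HOME=/opt/sqoop-1.4.7.bin__hadoop-2.6.0
    export FLUME_HOME=/opt/apache-flume-1.9.0-bin
    export PATH=$PATH:$SQOOP_HOME/bin:$FLUME_HOME/bin

    # Both tools locate an existing Hadoop installation via the environment
    export HADOOP_HOME=/opt/hadoop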
Apache Flume
- Creating an agent
- Using spool sources, file channels, and logger sinks
- Working with events
- Accessing data sources (see the example agent configuration below)
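A minimal agent configuration tying these pieces together might look like this. The agent and component names (a1, src1, ch1, sink1) are arbitrary and the directories are placeholders:

    # agent.conf — one agent with a spooling-directory source,
    # a durable file channel, and a logger sink for testing
    a1.sources  = src1
    a1.channels = ch1
    a1.sinks    = sink1

    # Spooling-directory source: ingests files dropped into a watched directory
    a1.sources.src1.type = spooldir
    a1.sources.src1.spoolDir = /var/spool/flume-in
    a1.sources.src1.channels = ch1

    # File channel: buffers events on disk so they survive agent restarts
    a1.channels.ch1.type = file
    a1.channels.ch1.checkpointDir = /var/flume/checkpoint
    a1.channels.ch1.dataDirs = /var/flume/data

    # Logger sink: prints events to the agent's log, handy while developing
    a1.sinks.sink1.type = logger
    a1.sinks.sink1.channel = ch1

The agent would then be started with flume-ng agent --conf conf --conf-file agent.conf --name a1.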
Apache Sqoop
- Importing MySQL to HDFS and Hive
- Using Sqoop jobs (examples below)
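For example, a Hive import and a reusable saved job could be sketched as follows (the database, table, and job names are hypothetical):

    # Import a MySQL table straight into a Hive table
    sqoop import \
      --connect jdbc:mysql://dbhost/sales \
      --username etl -P \
      --table customers \
      --hive-import --hive-table sales.customers

    # Save the same import as a named job, then execute it on demand
    sqoop job --create daily_customers -- import \
      --connect jdbc:mysql://dbhost/sales \
      --username etl --table customers \
      --hive-import --hive-table sales.customers
    sqoop job --exec daily_customers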
Data Ingestion Pipelines
- Building pipelines
- Fetching data
- Ingesting data to HDFS (see the sink sketch below)
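As a sketch of this final step, the logger sink from the earlier Flume example could be swapped for an HDFS sink so that events land in Hadoop (the path and roll settings are illustrative):

    # HDFS sink: writes buffered events into date-partitioned HDFS directories
    a1.sinks.sink1.type = hdfs
    a1.sinks.sink1.channel = ch1
    a1.sinks.sink1.hdfs.path = /data/incoming/%Y-%m-%d
    a1.sinks.sink1.hdfs.fileType = DataStream
    a1.sinks.sink1.hdfs.rollInterval = 300
    # Needed because the path uses time escapes and events may lack timestamps
    a1.sinks.sink1.hdfs.useLocalTimeStamp = true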
Summary and Conclusion