Overview
Apache NiFi (Hortonworks DataFlow) is a real-time integrated data logistics and simple event processing platform that enables the movement, tracking and automation of data between systems. It is based on flow-based programming concepts and provides a web-based user interface for managing dataflows in real time.
In this instructor-led, live training (onsite or remote), participants will learn how to deploy and manage Apache NiFi in a live lab environment.
By the end of this training, participants will be able to:
- Install and configure Apache NiFi.
- Source, transform and manage data from disparate, distributed data sources, including databases and big data lakes.
- Automate dataflows.
- Enable streaming analytics.
- Apply various approaches for data ingestion.
- Transform Big Data into business insights.
Format of the Course
- Interactive lecture and discussion.
- Lots of exercises and practice.
- Hands-on implementation in a live-lab environment.
Course Customization Options
- To request a customized training for this course, please contact us to arrange.
Requirements
- Experience with the Linux command line.
Audience
- System administrators
- Data engineers
- Developers
- DevOps engineers
Course Outline
Introduction to Apache NiFi
- Data at rest vs data in motion
Overview of Big Data and Apache Hadoop
- HDFS and MapReduce architecture
Setting up and Running a NiFi Cluster
- Cluster Integration
- Load Balancing/Redundancy
- Mass Orchestration of NiFi (via Ansible)
NiFi Operations
- Database Aggregating, Splitting and Transforming
- Data Extractions, Logging, etc.
- Integrating with Splunk (optional)
Monitoring and Recovery
- Recovering without Data Loss
- Autonomous Recovery
Optimizing NiFi
- Performance tuning
- Optimizing NiFi Setup
Best practices
Troubleshooting
Summary and Conclusion