Apache Flume Training Course

Overview

Apache Flume is a distributed service for collecting, aggregating, and moving event log data from multiple sources into a centralized data store.

In this instructor-led, live training, participants will gain an in-depth understanding of the fundamentals of Apache Flume.

By the end of this training, participants will be able to:

Enhance their knowledge of Apache Flume features
Understand the architecture and data flow in Apache Flume
Apply what they have learned to real-world use cases and scenarios
Use Apache Flume for collecting, combining, and transferring large amounts of log data to a centralized data store

Audience

Developers
Engineers

Format of the course

Part lecture, part discussion, exercises and heavy hands-on practice

Requirements

Programming experience

Course Outline

Introduction

Understanding the Fundamentals of Apache Flume

About Apache Flume
Understanding How Flume Works
Overview of the Important Components of Apache Flume
Architecture of Apache Flume
Data Flow Model
Reliability
Recoverability

Setting Up Apache Flume

Setting Up and Configuring an Agent
Starting an Agent
Using Environment Variables
Logging Raw Stream of Data
Installing Third-Party Plugins

Ingesting Data from External Resources

Using Avro RPC Mechanism
Executing Commands
Exploring Network Streams

Setting Up a Multi-Agent Flow

Consolidating Events into a Single Channel

Defining a Flow Multiplexer

Flow Configuration

Defining the Flow
Setting Up Individual Components
Adding Multiple Flows in an Agent
Setting Up a Multi-Tier Flow
Fanning Out the Flow from a Single Source to Multiple Channels

Implementing a Flume Source

Using Avro Source
Using Thrift Source
Using Exec Source
Using JMS Source
Using Spooling Directory Source
Using Taildir Source
Using Twitter 1% Firehose Source
Using Kafka Source
Using NetCat TCP Source
Using NetCat UDP Source
Using Sequence Generator Source
Using Syslog TCP Source
Using Multiport Syslog TCP Source
Using Syslog UDP Source
Using HTTP Source
Using Stress Source
Using Legacy Sources
Using Custom Source
Using Scribe Source

Implementing a Flume Sink

Using HDFS Sink
Using Hive Sink
Using Logger Sink
Using Avro Sink
Using Thrift Sink
Using IRC Sink
Using File Roll Sink
Using Null Sink
Using HBaseSinks
Using MorphlineSolrSink
Using ElasticSearchSink
Using Kite Dataset Sink
Using Kafka Sink
Using HTTP Sink
Using Custom Sink

Implementing a Flume Channel Interface

Using Memory Channel
Using JDBC Channel
Using Kafka Channel
Using File Channel
Using Spillable Memory Channel
Using Pseudo Transaction Channel
Using a Custom Channel

Using Flume Channel Selectors

Using the Replicating Channel Selector
Using the Multiplexing Channel Selector
Using a Custom Channel Selector

Implementing Flume Sink Processors

Using the Default Sink Processor
Using the Failover Sink Processor
Using the Load Balancing Sink Processor
Using a Custom Sink Processor

Using Event Serializers

Using Flume Interceptors

Using the Timestamp Interceptor
Using the Host Interceptor
Using the Static Interceptor
Using the Remove Header Interceptor
Using the UUID Interceptor
Using the Morphline Interceptor
Using the Search and Replace Interceptor
Using the Regex Filtering Interceptor
Using the Regex Extractor Interceptor

Understanding Flume Properties

Security Configurations on Apache Flume

Monitoring and Reporting in Apache Flume

Using Tools in Apache Flume

Using the File Channel Integrity Tool
Using the Event Validator Tool

Understanding Topology Design Considerations

Handling Agent Failures

Handling Compatibility

Troubleshooting

Summary and Conclusion
