
Overview
Apache Flume is a distributed, reliable service for efficiently collecting, aggregating, and moving large amounts of event log data from multiple sources into a centralized data store.
In this instructor-led, live training, participants will gain an in-depth understanding of the fundamentals of Apache Flume.
By the end of this training, participants will be able to:
- Enhance their knowledge of Apache Flume features
- Understand the architecture and data flow in Apache Flume
- Apply what they learn to real-world use cases and scenarios
- Use Apache Flume for collecting, combining, and transferring large amounts of log data to a centralized data store
Audience
- Developers
- Engineers
Format of the course
- Part lecture, part discussion, exercises and heavy hands-on practice
Requirements
- Programming experience
Course Outline
Introduction
Understanding the Fundamentals of Apache Flume
- About Apache Flume
- Understanding How Flume Works
- Overview of the Important Components of Apache Flume
- Architecture of Apache Flume
- Data Flow Model
- Reliability
- Recoverability
Setting Up Apache Flume
- Setting up and Configuring an Agent
- Starting an Agent
- Using Environment Variables
- Logging Raw Stream of Data
- Installing Third-Party Plugins
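A minimal single-node setup of the kind configured in this module can be sketched as follows; the file name example.conf and the component names a1, r1, c1, k1 are illustrative choices, not fixed by Flume:

    # example.conf: one source, one channel, one sink
    a1.sources = r1
    a1.sinks = k1
    a1.channels = c1

    # A netcat source listening on a local port
    a1.sources.r1.type = netcat
    a1.sources.r1.bind = localhost
    a1.sources.r1.port = 44444

    # A logger sink that writes events to the agent's log
    a1.sinks.k1.type = logger

    # An in-memory channel buffering events between source and sink
    a1.channels.c1.type = memory
    a1.channels.c1.capacity = 1000
    a1.channels.c1.transactionCapacity = 100

    # Bind the source and sink to the channel
    a1.sources.r1.channels = c1
    a1.sinks.k1.channel = c1

The agent is then started with the flume-ng launcher, naming the agent and pointing at the configuration file:

    $ bin/flume-ng agent --conf conf --conf-file example.conf --name a1 -Dflume.root.logger=INFO,console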
Ingesting Data from External Sources
- Using Avro RPC Mechanism
- Executing Commands
- Exploring Network Streams
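For instance, the Avro RPC mechanism above can be exercised with Flume's bundled avro-client, which ships a file's contents to a listening Avro source; the host, port, and file path below are placeholders:

    $ bin/flume-ng avro-client -H localhost -p 41414 -F /usr/logs/log.10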
Setting Up a Multi-Agent Flow
Consolidating Events into a Single Channel
Defining a Flow Multiplexer
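A multi-agent flow is built by pairing an Avro sink on the upstream agent with an Avro source on the downstream one; a minimal sketch, assuming agent names and a hostname of our own choosing (channel bindings omitted for brevity):

    # Upstream agent: forward events over Avro RPC
    weblog-agent.sinks.avro-forward.type = avro
    weblog-agent.sinks.avro-forward.hostname = collector.example.com
    weblog-agent.sinks.avro-forward.port = 10000

    # Downstream agent: receive the forwarded events
    hdfs-agent.sources.avro-collect.type = avro
    hdfs-agent.sources.avro-collect.bind = collector.example.com
    hdfs-agent.sources.avro-collect.port = 10000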
Flow Configuration
- Defining the Flow
- Setting Up Individual Components
- Adding Multiple Flows in an Agent
- Setting Up a Multi-Tier Flow
- Fanning Out the Flow from a Single Source to Multiple Channels
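Fanning out, the last item above, only requires listing more than one channel on the source; a minimal sketch with illustrative names:

    # One source replicated into two channels, each drained by its own sink
    a1.sources = r1
    a1.channels = c1 c2
    a1.sinks = k1 k2

    a1.sources.r1.channels = c1 c2   # the replicating selector is the default
    a1.sinks.k1.channel = c1
    a1.sinks.k2.channel = c2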
Implementing a Flume Source
- Using Avro Source
- Using Thrift Source
- Using Exec Source
- Using JMS Source
- Using Spooling Directory Source
- Using Taildir Source
- Using Twitter 1% firehose Source
- Using Kafka Source
- Using NetCat TCP Source
- Using NetCat UDP Source
- Using Sequence Generator Source
- Using Syslog TCP Source
- Using Multiport Syslog TCP Source
- Using Syslog UDP Source
- Using HTTP Source
- Using Stress Source
- Using Legacy Sources
- Using Custom Source
- Using Scribe Source
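Each of the source types above is selected via the source's type property; for example, a Spooling Directory Source watching a drop-off directory (the path is a placeholder):

    a1.sources = r1
    a1.channels = c1
    a1.sources.r1.type = spooldir
    a1.sources.r1.spoolDir = /var/log/apache/flumeSpool
    a1.sources.r1.fileHeader = true
    a1.sources.r1.channels = c1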
Implementing a Flume Sink
- Using HDFS Sink
- Using Hive Sink
- Using Logger Sink
- Using Avro Sink
- Using Thrift Sink
- Using IRC Sink
- Using File Roll Sink
- Using Null Sink
- Using HBase Sinks
- Using MorphlineSolrSink
- Using ElasticSearchSink
- Using Kite Dataset Sink
- Using Kafka Sink
- Using HTTP Sink
- Using Custom Sink
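Sinks are configured the same way; a sketch of an HDFS sink bucketing events into ten-minute directories (the path is illustrative, and the date escapes require a timestamp header, e.g. from a timestamp interceptor or hdfs.useLocalTimeStamp = true):

    a1.sinks = k1
    a1.sinks.k1.type = hdfs
    a1.sinks.k1.channel = c1
    a1.sinks.k1.hdfs.path = /flume/events/%y-%m-%d/%H%M
    a1.sinks.k1.hdfs.filePrefix = events-
    a1.sinks.k1.hdfs.round = true
    a1.sinks.k1.hdfs.roundValue = 10
    a1.sinks.k1.hdfs.roundUnit = minute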
Implementing a Flume Channel Interface
- Using Memory Channel
- Using JDBC Channel
- Using Kafka Channel
- Using File Channel
- Using Spillable Memory Channel
- Using Pseudo Transaction Channel
- Using a Custom Channel
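As one example from the list above, a durable File Channel needs little more than its checkpoint and data directories (the paths are placeholders):

    a1.channels = c1
    a1.channels.c1.type = file
    a1.channels.c1.checkpointDir = /mnt/flume/checkpoint
    a1.channels.c1.dataDirs = /mnt/flume/data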
Using Flume Channel Selectors
- Using the Replicating Channel Selector
- Using the Multiplexing Channel Selector
- Using a Custom Channel Selector
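A sketch of the multiplexing selector, routing events by a hypothetical state header to the channels mapped below; events with no matching header value go to the default channel:

    a1.sources.r1.selector.type = multiplexing
    a1.sources.r1.selector.header = state
    a1.sources.r1.selector.mapping.CZ = c1
    a1.sources.r1.selector.mapping.US = c2 c3
    a1.sources.r1.selector.default = c4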
Implementing Flume Sink Processors
- Using the Default Sink Processor
- Using the Failover Sink Processor
- Using the Load Balancing Sink Processor
- Using a Custom Sink Processor
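Sink processors act on sink groups; a minimal failover sketch (group and sink names are illustrative), where the higher-priority sink is tried first and a failed sink is penalized for up to maxpenalty milliseconds:

    a1.sinkgroups = g1
    a1.sinkgroups.g1.sinks = k1 k2
    a1.sinkgroups.g1.processor.type = failover
    a1.sinkgroups.g1.processor.priority.k1 = 5
    a1.sinkgroups.g1.processor.priority.k2 = 10
    a1.sinkgroups.g1.processor.maxpenalty = 10000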
Using Event Serializers
Using Flume Interceptors
- Using the Timestamp Interceptor
- Using the Host Interceptor
- Using the Static Interceptor
- Using the Remove Header Interceptor
- Using the UUID Interceptor
- Using the Morphline Interceptor
- Using the Search and Replace Interceptor
- Using the Regex Filtering Interceptor
- Using the Regex Extractor Interceptor
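Interceptors are chained on a source and run in the order listed; for example, stamping each event with a timestamp and the agent's hostname (component names are illustrative):

    a1.sources.r1.interceptors = i1 i2
    a1.sources.r1.interceptors.i1.type = timestamp
    a1.sources.r1.interceptors.i2.type = host
    a1.sources.r1.interceptors.i2.hostHeader = hostname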
Understanding Flume Properties
Security Configuration in Apache Flume
Monitoring and Reporting in Apache Flume
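For instance, Flume's built-in HTTP reporting can be switched on with system properties at start-up (the port is a placeholder); the agent then serves its metrics as JSON on that port:

    $ bin/flume-ng agent --conf-file example.conf --name a1 -Dflume.monitoring.type=http -Dflume.monitoring.port=34545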
Using Tools in Apache Flume
- Using the File Channel Integrity Tool
- Using the Event Validator Tool
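For example, the File Channel Integrity Tool is run through the flume-ng tool launcher against a channel's data directory (the ./datadir path is a placeholder):

    $ bin/flume-ng tool --conf ./conf FCINTEGRITYTOOL -l ./datadir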
Understanding Topology Design Considerations
Handling Agent Failures
Handling Compatibility
Troubleshooting
Summary and Conclusion
