Apache Flume Training Course

Overview

Apache Flume is a distributed service for collecting, aggregating, and moving event log data from multiple sources into a centralized data store.

In this instructor-led, live training, participants will gain an in-depth understanding of the fundamentals of Apache Flume.

By the end of this training, participants will be able to:

  • Enhance their knowledge of Apache Flume features
  • Understand the architecture and data flow in Apache Flume
  • Apply what they learn to real-world use cases and scenarios
  • Use Apache Flume for collecting, combining, and transferring large amounts of log data to a centralized data store

Audience

  • Developers
  • Engineers

Format of the course

  • Part lecture, part discussion, exercises and heavy hands-on practice

Requirements

  • Programming experience

Course Outline

Introduction

Understanding the Fundamentals of Apache Flume

  • About Apache Flume
  • Understanding How Flume Works
  • Overview of the Important Components of Apache Flume
  • Architecture of Apache Flume
  • Data Flow Model
  • Reliability
  • Recoverability
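
As a rough illustration of the data flow and reliability topics above, the following is a toy Python model of events passing from a channel to a sink. It is not Flume's API (Flume agents are Java processes driven by a properties file); the names MemoryChannel, peek, commit, and drain are hypothetical and chosen only for this sketch. The one behaviour it tries to show is that an event leaves the channel only after the sink has accepted it.

```python
from collections import deque


class MemoryChannel:
    """Toy stand-in for a channel: a bounded FIFO buffer of events."""

    def __init__(self, capacity=100):
        self.capacity = capacity
        self.events = deque()

    def put(self, event):
        # A source would call this to hand an event to the channel.
        if len(self.events) >= self.capacity:
            raise RuntimeError("channel full")
        self.events.append(event)

    def peek(self):
        # Look at the oldest event without removing it.
        return self.events[0] if self.events else None

    def commit(self):
        # Drop the oldest event; called only after delivery succeeded.
        self.events.popleft()


def drain(channel, deliver):
    """Move events from the channel to a sink callable.

    An event is removed only if deliver(event) returns without raising.
    On failure it stays buffered so a later call can retry it.
    Returns the number of events delivered in this call.
    """
    delivered = 0
    while channel.peek() is not None:
        try:
            deliver(channel.peek())
        except Exception:
            break  # leave the event in the channel
        channel.commit()
        delivered += 1
    return delivered
```

If the sink fails partway through, a second call to drain picks up at the event that failed, so nothing buffered in the channel is skipped.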

Setting Up Apache Flume

  • Setting up and Configuring an Agent
  • Starting an Agent
  • Using Environment Variables
  • Logging Raw Stream of Data
  • Installing Third-Party Plugins

Ingesting Data from External Resources

  • Using Avro RPC Mechanism
  • Executing Commands
  • Exploring Network Streams

Setting Up a Multi-Agent Flow

Consolidating Events into a Single Channel

Defining a Flow Multiplexer

Flow Configuration

  • Defining the Flow
  • Setting Up Individual Components
  • Adding Multiple Flows in an Agent
  • Setting Up a Multi-Tier Flow
  • Fanning Out the Flow from a Single Source to Multiple Channels

Implementing a Flume Source

  • Using Avro Source
  • Using Thrift Source
  • Using Exec Source
  • Using JMS Source
  • Using Spooling Directory Source
  • Using Taildir Source
  • Using Twitter 1% firehose Source
  • Using Kafka Source
  • Using NetCat TCP Source
  • Using NetCat UDP Source
  • Using Sequence Generator Source
  • Using Syslog TCP Source
  • Using Multiport Syslog TCP Source
  • Using Syslog UDP Source
  • Using HTTP Source
  • Using Stress Source
  • Using Legacy Sources
  • Using Custom Source
  • Using Scribe Source

Implementing a Flume Sink

  • Using HDFS Sink
  • Using Hive Sink
  • Using Logger Sink
  • Using Avro Sink
  • Using Thrift Sink
  • Using IRC Sink
  • Using File Roll Sink
  • Using Null Sink
  • Using HBaseSinks
  • Using MorphlineSolrSink
  • Using ElasticSearchSink
  • Using Kite Dataset Sink
  • Using Kafka Sink
  • Using HTTP Sink
  • Using Custom Sink

Implementing a Flume Channel Interface

  • Using Memory Channel
  • Using JDBC Channel
  • Using Kafka Channel
  • Using File Channel
  • Using Spillable Memory Channel
  • Using Pseudo Transaction Channel
  • Using a Custom Channel

Using Flume Channel Selectors

  • Using the Replicating Channel Selector
  • Using the Multiplexing Channel Selector
  • Using a Custom Channel Selector
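
The difference between the two built-in selectors can be sketched in a few lines of Python. This is a simplified model, not Flume code: events are plain dicts with "headers" and "body" keys, and the function names and parameters are assumptions made for illustration. A replicating selector sends each event to every channel, while a multiplexing selector picks channels from the value of one header and falls back to a default.

```python
def replicating(event, channels):
    """Replicating behaviour: every event goes to all configured channels."""
    return list(channels)


def multiplexing(event, header, mapping, default):
    """Multiplexing behaviour: choose channels by the value of one header.

    mapping: dict from header value to a list of channel names.
    default: list of channel names used when the header is missing
             or its value has no entry in mapping.
    """
    value = event["headers"].get(header)
    return mapping.get(value, default)
```

For example, with header "state" and a mapping of "CA" to channel c1, an event whose state header is "CA" is routed to c1, and an event without that header goes to the default channels.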

Implementing Flume Sink Processors

  • Using the Default Sink Processor
  • Using the Failover Sink Processor
  • Using the Load Balancing Sink Processor
  • Using a Custom Sink Processor
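
The idea behind failover can be shown with a short Python sketch. It is a simplified model under stated assumptions, not Flume's implementation: the function failover_send and the (name, priority, send) tuples are hypothetical, and the sketch leaves out the cool-down period that Flume applies to a failed sink. As in Flume, a larger priority value means the sink is tried first.

```python
def failover_send(event, sinks):
    """Try sinks from highest to lowest priority.

    sinks: list of (name, priority, send) tuples, where send(event)
           raises an exception when the sink is unavailable.
    Returns the name of the sink that accepted the event.
    """
    ordered = sorted(sinks, key=lambda s: s[1], reverse=True)
    for name, _priority, send in ordered:
        try:
            send(event)
            return name
        except Exception:
            continue  # fall through to the next sink in priority order
    raise RuntimeError("all sinks failed")
```

A load balancing processor would differ mainly in how it orders the candidates, for example round-robin or random instead of by fixed priority.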

Using Event Serializers

Using Flume Interceptors

  • Using the Timestamp Interceptor
  • Using the Host Interceptor
  • Using the Static Interceptor
  • Using the Remove Header Interceptor
  • Using the UUID Interceptor
  • Using the Morphline Interceptor
  • Using the Search and Replace Interceptor
  • Using the Regex Filtering Interceptor
  • Using the Regex Extractor Interceptor
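
Interceptors modify or drop events in flight, and they are applied as an ordered chain. The Python sketch below models three of the interceptors listed above on plain dict events. It is an illustration only: the function names are hypothetical and the behaviour is reduced to the defaults as described in the Flume User Guide (the timestamp interceptor writes a "timestamp" header in milliseconds, the static interceptor keeps an existing header value, and the regex filter keeps matching events unless told to exclude them).

```python
import re
import time


def timestamp_interceptor(event):
    # Stamp the event with the current time in milliseconds.
    event["headers"]["timestamp"] = str(int(time.time() * 1000))
    return event


def static_interceptor(key, value):
    # Add a fixed header; an existing value for the key is preserved.
    def intercept(event):
        event["headers"].setdefault(key, value)
        return event
    return intercept


def regex_filter(pattern, exclude=False):
    # Keep events whose body matches the pattern, or drop them if exclude=True.
    compiled = re.compile(pattern)

    def intercept(event):
        matched = compiled.search(event["body"]) is not None
        return None if matched == exclude else event
    return intercept


def run_chain(event, interceptors):
    """Apply interceptors in order; None means the event was dropped."""
    for intercept in interceptors:
        event = intercept(event)
        if event is None:
            return None
    return event
```

A chain such as [timestamp_interceptor, static_interceptor("datacenter", "NEW_YORK"), regex_filter("ERROR")] would tag each event and pass on only those whose body contains ERROR.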

Understanding Flume Properties

Security Configurations on Apache Flume

Monitoring and Reporting in Apache Flume

Using Tools in Apache Flume

  • Using the File Channel Integrity Tool
  • Using the Event Validator Tool

Understanding Topology Design Considerations

Handling Agent Failures

Handling Compatibility

Troubleshooting

Summary and Conclusion
