Spark Streaming with Python and Kafka Training Course

Overview

Apache Spark Streaming is an open-source extension of the core Spark API for scalable, fault-tolerant processing of live data streams. It lets users process real-time data from supported sources such as Apache Kafka.

This instructor-led, live training (online or onsite) is aimed at data engineers, data scientists, and programmers who wish to use Spark Streaming features in processing and analyzing real-time data.

By the end of this training, participants will be able to use Spark Streaming to process live data streams and push the results to databases, filesystems, and live dashboards.

Format of the Course

Interactive lecture and discussion.
Lots of exercises and practice.
Hands-on implementation in a live-lab environment.

Course Customization Options

To request customized training for this course, please contact us.

Requirements

Experience with Python and Apache Kafka
Familiarity with stream-processing platforms

Audience

Data engineers
Data scientists
Programmers

Course Outline

Introduction

Overview of Spark Streaming Features and Architecture

Supported data sources
Core APIs

Preparing the Environment

Dependencies
Spark and streaming context
Connecting to Kafka

Processing Messages

Parsing inbound messages as JSON
ETL processes
Starting the streaming context
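As a rough illustration of the parsing and ETL step, the sketch below processes one micro-batch of raw Kafka message values in plain Python. It does not use the Spark API; in a Spark Streaming job, a per-record function like this would typically be applied with a DStream transformation such as `map` or `flatMap`. The field names `user` and `amount` and all function names are hypothetical.

```python
import json
from typing import Iterable, List, Optional


def parse_message(raw: bytes) -> Optional[dict]:
    """Decode one raw message value as JSON; return None if it is malformed."""
    try:
        record = json.loads(raw.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return None
    # Only JSON objects are accepted; arrays and scalars count as malformed.
    return record if isinstance(record, dict) else None


def transform(record: dict) -> Optional[dict]:
    """Hypothetical ETL step: keep records with the expected fields and normalize them."""
    if "user" not in record or "amount" not in record:
        return None
    try:
        amount = float(record["amount"])
    except (TypeError, ValueError):
        return None
    return {"user": str(record["user"]).strip().lower(), "amount": amount}


def process_batch(raw_values: Iterable[bytes]) -> List[dict]:
    """Extract (parse), transform, and collect the valid records of one micro-batch."""
    results = []
    for raw in raw_values:
        record = parse_message(raw)
        if record is None:
            continue  # drop messages that are not valid JSON objects
        cleaned = transform(record)
        if cleaned is not None:
            results.append(cleaned)
    return results


batch = [b'{"user": " Alice ", "amount": "12.5"}', b"not json", b'{"user": "bob"}']
print(process_batch(batch))  # [{'user': 'alice', 'amount': 12.5}]
```

Malformed messages are dropped silently here; a production job might instead route them to a separate error topic.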

Performing Windowed Stream Processing

Slide interval
Checkpoint delivery configuration
Launching the environment
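To make the window length and slide interval concrete, the plain-Python sketch below simulates a sliding-window count over micro-batches. It is not Spark code. In PySpark's DStream API the corresponding operations are `window(windowDuration, slideDuration)` and `reduceByKeyAndWindow`. When `reduceByKeyAndWindow` is given an inverse reduce function, checkpointing must be enabled with `StreamingContext.checkpoint(directory)`. The function name `windowed_counts` and its parameters are hypothetical.

```python
from collections import Counter
from typing import List


def windowed_counts(batches: List[List[str]], batch_interval: int,
                    window_length: int, slide_interval: int) -> List[Counter]:
    """Count keys over a sliding window of micro-batches.

    batches[i] holds the keys that arrived during the i-th batch interval.
    A result is emitted every `slide_interval` seconds and covers the most
    recent `window_length` seconds of data.
    """
    # As in Spark Streaming, both durations must be multiples of the batch interval.
    if window_length % batch_interval or slide_interval % batch_interval:
        raise ValueError("window and slide must be multiples of the batch interval")
    batches_per_window = window_length // batch_interval
    batches_per_slide = slide_interval // batch_interval
    results = []
    # A window closes after every `batches_per_slide` batches.
    for end in range(batches_per_slide, len(batches) + 1, batches_per_slide):
        start = max(0, end - batches_per_window)  # early windows may be partial
        window = Counter()
        for batch in batches[start:end]:
            window.update(batch)
        results.append(window)
    return results


# 1-second batches, 2-second window, sliding every second.
print(windowed_counts([["a"], ["a", "b"], ["b"], ["c"]], 1, 2, 1))
```

Because the window is longer than the slide interval, each batch contributes to more than one result. That overlap is what distinguishes a sliding window from a tumbling one, where the window length equals the slide interval.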

Prototyping the Processing Code

Connecting to a Kafka topic
Retrieving JSON from the data source using Paw
Variations and additional processing

Streaming the Code

Job control variables
Defining values to match
Functions and conditions

Acquiring Stream Output

Counters
Kafka output (matched and non-matched)
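The matching, counting, and matched/non-matched output steps can be sketched in plain Python as follows. This is an illustration only, under these assumptions: the job control variables `MATCH_FIELD` and `MATCH_VALUES`, the output names `matched` and `non_matched`, and the function names are all hypothetical. In a real Spark Streaming job, each output list would typically be written to its own Kafka topic by a producer inside `foreachRDD`, and the counters might be Spark accumulators.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical job control variables; a real job might read these from
# command-line arguments or a configuration file.
MATCH_FIELD = "country"
MATCH_VALUES: Set[str] = {"DE", "FR"}


def is_match(record: dict, field: str = MATCH_FIELD,
             values: Set[str] = MATCH_VALUES) -> bool:
    """Condition: the record's field value is one of the values to match."""
    return record.get(field) in values


def route(records: Iterable[dict]) -> Tuple[Dict[str, List[dict]], Counter]:
    """Split records into matched and non-matched outputs and count each outcome."""
    outputs: Dict[str, List[dict]] = {"matched": [], "non_matched": []}
    counters: Counter = Counter()
    for record in records:
        key = "matched" if is_match(record) else "non_matched"
        outputs[key].append(record)  # stands in for sending to a Kafka topic
        counters[key] += 1
    return outputs, counters


out, counts = route([{"country": "DE"}, {"country": "US"}, {"id": 3}])
print(counts)  # Counter({'non_matched': 2, 'matched': 1})
```

Records that lack the match field fall into the non-matched output rather than being dropped, so the two counters always add up to the number of records processed.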

Troubleshooting

Summary and Conclusion
