Apache Druid for Real-Time Data Analysis Training Course

Overview

Apache Druid is an open-source, column-oriented, distributed data store written in Java. It was designed to quickly ingest massive quantities of event data and execute low-latency OLAP queries on that data. Druid is commonly used in business intelligence applications to analyze high volumes of real-time and historical data. It is also well suited for powering fast, interactive analytic dashboards for end users. Druid is used by companies such as Alibaba, Airbnb, Cisco, eBay, Netflix, PayPal, and Yahoo.

In this instructor-led, live course, we explore some of the limitations of data warehouse solutions and discuss how Druid can complement those technologies to form a flexible and scalable streaming analytics stack. We walk through many examples, offering participants the chance to implement and test Druid-based solutions in a lab environment.

Format of the Course

Part lecture, part discussion, with heavy hands-on practice and occasional tests to gauge understanding.

Requirements

A basic understanding of data infrastructure.
A general knowledge of distributed systems.
Basic familiarity with the Linux command line.

Audience

Application developers
Software engineers
Technical consultants
DevOps professionals
Architecture engineers

Course Outline

Introduction

Installing and Starting Apache Druid

Druid Architecture and Design

Real-Time Ingestion of Event Data

Sharding and Indexing

Loading Data

Querying Data

Visualizing Data

Running a Distributed Cluster

Druid + Apache Hive

Druid + Apache Kafka

Druid + Others

Troubleshooting

Administrative Tasks

Summary and Conclusion
