Apache Drill Training Course

Overview

Apache Drill is a schema-free, distributed, in-memory columnar SQL query engine for Hadoop, NoSQL, and other cloud and file storage systems. Its main strength is the ability to join data from multiple data stores in a single query. Apache Drill supports numerous NoSQL databases and file systems, including HBase, MongoDB, MapR-DB, HDFS, MapR-FS, Amazon S3, Azure Blob Storage, Google Cloud Storage, Swift, NAS, and local files. Apache Drill is an open source project inspired by Google’s Dremel, the system behind the Google BigQuery infrastructure service.
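The following sketch shows roughly what a single query spanning two data stores can look like. It uses Python to submit a join between a MongoDB collection and a Parquet file to Drill's REST query endpoint. It assumes a Drillbit listening on the default web port 8047, with the `mongo` and `dfs` storage plugins enabled. The database, collection, file path and column names are hypothetical placeholders, not part of any real dataset.

```python
import json
import urllib.request

# Assumption: a Drillbit runs locally with its web/REST interface on the default port 8047.
DRILL_URL = "http://localhost:8047/query.json"

# Hypothetical names: a MongoDB collection (via the `mongo` storage plugin) joined with
# a Parquet file on the file system (via the `dfs` storage plugin). Adjust to your setup.
CROSS_SOURCE_SQL = """
SELECT c.name, SUM(o.amount) AS total_spent
FROM mongo.shop.customers AS c
JOIN dfs.`/data/orders.parquet` AS o
  ON c.customer_id = o.customer_id
GROUP BY c.name
ORDER BY total_spent DESC
"""


def build_query_request(sql: str, url: str = DRILL_URL) -> urllib.request.Request:
    """Wrap a SQL string in the JSON body that Drill's REST query endpoint accepts."""
    body = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(sql: str, url: str = DRILL_URL) -> list:
    """POST the query and return the "rows" field of the JSON response (one dict per row)."""
    with urllib.request.urlopen(build_query_request(sql, url)) as resp:
        return json.loads(resp.read().decode("utf-8"))["rows"]
```

Calling `run_query(CROSS_SOURCE_SQL)` against a running Drillbit should return one dictionary per customer. The join itself is planned and executed by Drill, not by the client, which only sends SQL text over HTTP.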

In this instructor-led, live training, participants will learn the fundamentals of Apache Drill. They will then use standard SQL to interactively query big data across multiple data sources without writing application code. Participants will also learn how to optimize their Drill queries for distributed SQL execution.

By the end of this training, participants will be able to:

  • Perform “self-service” exploration on structured and semi-structured data on Hadoop
  • Query data with known as well as unknown schemas using SQL
  • Understand how Apache Drill receives and executes queries
  • Write SQL queries to analyze different types of data, including structured data in Hive, semi-structured data in HBase or MapR-DB tables, and data saved in files such as Parquet and JSON
  • Use Apache Drill to perform on-the-fly schema discovery, bypassing the need for complex ETL and schema operations
  • Integrate Apache Drill with BI (Business Intelligence) tools such as Tableau, QlikView, MicroStrategy and Excel

Audience

  • Data analysts
  • Data scientists
  • SQL programmers

Format of the course

  • Part lecture, part discussion, exercises and heavy hands-on practice

Requirements

  • An understanding of Hadoop, NoSQL, and other data storage concepts
  • Experience with writing SQL queries
  • Experience with the Linux command line

Course Outline

Introduction to Apache Drill

How does Apache Drill compare to Spark SQL, Hive and Impala?

Overview of Apache Drill Features and Architecture

  • Apache Drill Components

Performing SQL Queries in Apache Drill

Understanding Data Types and Formats

Working with Schemas

Case Study and Exercise: Querying Sales Data for the Year

Performing Queries on JSON Data

Combining Data Types in SQL Queries

Creating and Dropping Tables and Views

Using Nested Data and Window Functions

Performing Data Analysis with Apache Drill

Case Study and Exercise: Analyzing the Results of a Marketing Campaign

Designing a Query Plan in Apache Drill

Optimizing Queries in Apache Drill

Integrating Apache Drill with MS Excel

Using the Apache Drill ODBC/JDBC drivers to connect to Tableau, MicroStrategy, QlikView, etc.

Case Study and Exercise: Visualizing the Data and the Power of a Good Story

Understanding Apache Drill’s Decentralized Security Model

Apache Drill Performance and Debugging

Summary and Conclusion
