Overview
Apache Drill is a schema-free, distributed, in-memory columnar SQL query engine for Hadoop, NoSQL, and other cloud and file storage systems. Its power lies in its ability to join data from multiple data stores in a single query. Apache Drill supports numerous NoSQL databases and file systems, including HBase, MongoDB, MapR-DB, HDFS, MapR-FS, Amazon S3, Azure Blob Storage, Google Cloud Storage, Swift, NAS, and local files. Drill is the open-source version of Google’s Dremel system, which is available as an infrastructure service called Google BigQuery.
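As a rough illustration of a single query that joins data across stores, the sketch below builds a request for Drill's REST query endpoint (`/query.json`, on the default web port 8047). The storage plugin names (`mongo`, `dfs`), the database, file path, and column names are hypothetical placeholders; a real deployment would substitute its own configured plugins and data.

```python
import json
import urllib.request

# Hypothetical example: joins a MongoDB collection with a JSON file on a
# file system. Plugin names, paths, and columns are assumptions.
CROSS_STORE_JOIN = """
SELECT c.name, o.total
FROM mongo.shop.customers AS c
JOIN dfs.`/data/orders.json` AS o
  ON c.customer_id = o.customer_id
"""


def build_drill_request(sql, host="localhost", port=8047):
    """Build (without sending) an HTTP POST request for Drill's REST API."""
    payload = json.dumps({"queryType": "SQL", "query": sql.strip()})
    return urllib.request.Request(
        url=f"http://{host}:{port}/query.json",
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(sql, host="localhost", port=8047):
    """Send the query to a running Drillbit and return the result rows."""
    request = build_drill_request(sql, host, port)
    with urllib.request.urlopen(request) as response:
        return json.load(response).get("rows", [])
```

Calling `run_query(CROSS_STORE_JOIN)` requires a running Drill instance with both storage plugins enabled; `build_drill_request` can be inspected on its own without one.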
In this instructor-led, live training, participants will learn how to optimize and debug Apache Drill to improve the performance of queries on very large data sets. The course begins with an architectural overview and feature comparison between Apache Drill and other interactive data analysis tools. Participants then step through a series of interactive, hands-on practice sessions that include installation, configuration, performance evaluation, tuning, and debugging of an Apache Drill instance in a live lab environment.
By the end of this training, participants will be able to:
- Install and configure Apache Drill
- Understand Apache Drill’s architecture and features
- Understand the services involved during query execution
- Optimize Drill queries for distributed SQL execution
- Debug Apache Drill
Audience
- Developers
- Systems administrators
- Data analysts
Format of the course
- Part lecture, part discussion, exercises and heavy hands-on practice
Notes
- To request customized training for this course, please contact us.
Requirements
- A general understanding of Hadoop
- Experience with the Linux command line
Course Outline
Introduction to Apache Drill
How does Apache Drill compare to Spark SQL, Hive and Impala?
Overview of Apache Drill Features and Architecture
- Apache Drill Components
Understanding Apache Drill Queries
- Query Execution Process
Performing SQL Queries
- Connecting to the data source
- Querying the data
Using the Drill Web Console
- Query, Profiles, Storage, Metrics, Threads, and Options
Performance Optimization Strategy
- Identifying the source of performance issues
- Analyzing Query Plans and Profiles
Apache Drill Query Optimization
- Optimizing a Query
Limiting the Data that Drill Reads
- Partitioning the data (partition pruning)
Apache Drill Logging and Debugging
- Analyzing Drill Error Messages
- Configuring Log File Options
Troubleshooting Apache Drill
Summary and Conclusion