Overview
Dremio is an open-source “self-service data platform” that accelerates queries across many types of data sources. Dremio integrates with relational databases, Apache Hadoop, MongoDB, Amazon S3, Elasticsearch, and other data sources. It supports SQL and provides a web UI for building queries.
In this instructor-led, live training, participants will learn how to install, configure, and use Dremio as a unifying layer between data analysis tools and the underlying data repositories.
By the end of this training, participants will be able to:
- Install and configure Dremio
- Execute queries against multiple data sources, regardless of location, size, or structure
- Integrate Dremio with BI tools such as Tableau and with data sources such as Elasticsearch
Audience
- Data scientists
- Business analysts
- Data engineers
Format of the course
- Part lecture, part discussion, exercises and heavy hands-on practice
Notes
- To request a customized version of this training, please contact us.
Requirements
- An understanding of Hadoop, NoSQL, and other data storage concepts
- Experience with writing SQL queries
- Experience with the Linux command line
Course Outline
Introduction
- How Dremio addresses the problems of data staging, data warehousing, aggregation, extracts, etc.
Installing and Configuring Dremio
Overview of Dremio Features and Architecture
- Data Acceleration
- Data Reflections (on HDFS, MapR-FS, cloud storage such as S3, local storage, etc.)
Query Execution Life Cycle
- Planning, coordination, and execution
Navigating the Dremio Web UI
Discovering Data
- The unified data catalog
Curating Data
- Creating virtual datasets
Using SQL to Define Transformations
- Joins and data type conversions
- Connecting through ODBC, JDBC and REST
Sharing Data with Your Team
- Uploading data, collaboration, and access rights
Integrating Dremio with BI (Business Intelligence) Tools
- Serving up data for Tableau
Integrating Dremio with an Elasticsearch Cluster
Summary and Conclusion