Impala for Business Intelligence Training Course

Overview

Cloudera Impala is an open-source, massively parallel processing (MPP) SQL query engine for Apache Hadoop clusters.

Impala enables users to issue low-latency SQL queries against data stored in the Hadoop Distributed File System (HDFS) and Apache HBase without requiring data movement or transformation.

Audience

This course is aimed at analysts and data scientists performing analysis on data stored in Hadoop via Business Intelligence or SQL tools.

After this course, delegates will be able to:

Extract meaningful information from Hadoop clusters with Impala.
Write programs in the Impala SQL dialect to support Business Intelligence.
Troubleshoot Impala.

Requirements

Knowledge of SQL

Course Outline

Introduction to Impala

What is Impala?
How Impala Differs from Relational Databases
Limitations and Future Directions
Using the Impala Shell
The Impala Daemon, Statestore, and Catalog Service

Loading Impala

Explore a New Impala Instance
Load CSV Data from Local Files
Point an Impala Table at Existing Data Files

Analyzing Data with Impala

Describe the Impala Table
Basic Syntax and Querying
Data Types
Filtering, Sorting, and Limiting Results
Joining and Grouping Data
Data Loading and Querying Examples
Improving Impala Performance
How Impala Works with Hadoop File Formats
Hands-On Exercise: Interactive Analysis with Impala

Programming Impala Applications

Overview of the Impala SQL Dialect
Overview of Impala Programming Interfaces

Troubleshooting Impala

Troubleshooting Impala SQL Syntax Issues
Troubleshooting I/O Capacity Problems
Impala Web User Interface for Debugging
