Apache Spark SQL Training Course

Overview

Spark SQL is Apache Spark’s module for working with structured and semi-structured data. It gives Spark information about the structure of both the data and the computation being performed, and Spark uses this information to optimize execution. Two common uses for Spark SQL are:
– to execute SQL queries.
– to read data from an existing Hive installation.

In this instructor-led, live training (onsite or remote), participants will learn how to analyze various types of data sets using Spark SQL.

By the end of this training, participants will be able to:

Install and configure Spark SQL.
Perform data analysis using Spark SQL.
Query data sets in different formats.
Visualize data and query results.

Format of the Course

Interactive lecture and discussion.
Lots of exercises and practice.
Hands-on implementation in a live-lab environment.

Course Customization Options

To request customized training for this course, please contact us.

Requirements

Experience with SQL queries
Programming experience in any language

Audience

Data analysts
Data scientists
Data engineers

Course Outline

Introduction

Overview of Data Access Approaches (Hive, databases, etc.)

Overview of Spark Features and Architecture

Installing and Configuring Spark

Understanding DataFrames in Spark

Defining Tables and Importing Datasets

Querying DataFrames using SQL

Carrying out Aggregations, JOINs and Nested Queries

Uploading and Accessing Data

Querying Different Types of Data (JSON, Parquet, etc.)

Querying Data Lakes with SQL

Troubleshooting

Summary and Conclusion
