Zeppelin for Interactive Data Analytics Training Course

Overview

Apache Zeppelin is a web-based notebook for capturing, exploring, visualizing and sharing Hadoop- and Spark-based data.

This instructor-led, live training introduces the concepts behind interactive data analytics and walks participants through the deployment and usage of Zeppelin in a single-user or multi-user environment.

By the end of this training, participants will be able to:

  • Install and configure Zeppelin
  • Develop, organize, execute and share data analyses in a browser-based interface
  • Visualize results without referring to the command line or cluster details
  • Execute and collaborate on long workflows
  • Work with a range of plug-in language and data-processing backends, such as Scala (with Apache Spark), Python (with Apache Spark), Spark SQL, JDBC, Markdown and Shell
  • Integrate Zeppelin with Spark, Flink and MapReduce
  • Secure multi-user instances of Zeppelin with Apache Shiro

Audience

  • Data engineers
  • Data analysts
  • Data scientists
  • Software developers

Format of the course

  • Part lecture, part discussion, exercises and heavy hands-on practice

Requirements

  • An understanding of big data concepts
  • Experience with Spark and Hadoop
  • Experience with the command line

Course Outline

Introduction

Installing and Configuring Zeppelin

Overview of Zeppelin Features and Architecture

Navigating the Browser Interface

Understanding the Data Analysis Workflow

Organizing Data for Analysis

Visualizing Data

Sharing Data and Collaborating with Other Analysts

Working with Plug-ins

Backend Data Processing

Working with Scala and Apache Spark

Working with Python and Apache Spark

Working with Spark SQL

Working with JDBC

Using Markdown and Shell

Integrating Zeppelin with Spark, Flink and MapReduce

Setting up and Securing Multi-user Instances

Troubleshooting

Summary and Conclusion
