Overview
Apache Zeppelin is a web-based notebook for capturing, exploring, visualizing and sharing Hadoop- and Spark-based data.
This instructor-led, live training introduces the concepts behind interactive data analytics and walks participants through the deployment and usage of Zeppelin in a single-user or multi-user environment.
By the end of this training, participants will be able to:
- Install and configure Zeppelin
- Develop, organize, execute and share data analyses in a browser-based interface
- Visualize results without referring to the command line or cluster details
- Execute and collaborate on long workflows
- Work with a range of pluggable language and data-processing backends (interpreters), such as Scala (with Apache Spark), Python (with Apache Spark), Spark SQL, JDBC, Markdown and Shell
- Integrate Zeppelin with Spark, Flink and MapReduce
- Secure multi-user instances of Zeppelin with Apache Shiro
Audience
- Data engineers
- Data analysts
- Data scientists
- Software developers
Format of the course
- Part lecture, part discussion, exercises and heavy hands-on practice
Requirements
- An understanding of big data concepts
- Experience with Spark and Hadoop
- Experience with the command line
Course Outline
Introduction
Installing and Configuring Zeppelin
Overview of Zeppelin Features and Architecture
Navigating the Browser Interface
Understanding the Data Analysis Workflow
Organizing Data for Analysis
Visualizing Data
Sharing Data and Collaborating with Other Analysts
Working with Plug-ins
Backend Data Processing
Working with Scala and Apache Spark
Working with Python and Apache Spark
Working with Spark SQL
Working with JDBC
Using Markdown and Shell
Integrating Zeppelin with Spark, Flink and MapReduce
Setting up and Securing Multi-user Instances
Troubleshooting
Summary and Conclusion