Overview
Pentaho Data Integration is an open-source data integration tool for defining jobs and data transformations.
In this instructor-led, live training, participants will learn how to use Pentaho Data Integration’s powerful ETL capabilities and rich GUI to manage an entire big data lifecycle and maximize the value of data within their organization.
By the end of this training, participants will be able to:
- Create, preview, and run basic data transformations containing steps and hops
- Configure and secure the Pentaho Enterprise Repository
- Harness disparate sources of data and generate a single, unified version of the truth in an analytics-ready format
- Provide results to third-party applications for further processing
Audience
- Data analysts
- ETL developers
Format of the course
- Part lecture, part discussion, with exercises and heavy hands-on practice
Requirements
- An understanding of relational databases
- An understanding of data warehousing
- An understanding of ETL (Extract, Transform, Load) concepts
Course Outline
Introduction
Installing and Configuring Pentaho
Overview of Pentaho Features and Architecture
Understanding Pentaho’s In-Memory Caching
Navigating the User Interface
Connecting to a Data Source
Configuring the Pentaho Enterprise Repository
Transforming Data
Viewing the Transformation Results
Resolving Transformation Errors
Processing a Data Stream
Reusing Transformations
Scheduling Transformations
Securing Pentaho
Integrating with Third-Party Applications (Hadoop, NoSQL, etc.)
Analytics and Reporting
Pentaho Design Patterns and Best Practices
Troubleshooting
Summary and Conclusion