Accelerating Python Pandas Workflows with Modin Training Course

Overview

Modin is a parallel DataFrame library designed to speed up Pandas workflows. It handles large datasets by distributing work across CPU cores or machines, using Ray or Dask as the execution backend.

This instructor-led, live training (online or onsite) is aimed at data scientists and developers who wish to use Modin to build and implement parallel computations with Pandas for faster data analysis.

By the end of this training, participants will be able to:

  • Set up the necessary environment to start developing Pandas workflows at scale with Modin.
  • Understand the features, architecture, and advantages of Modin.
  • Know the differences between Modin, Dask, and Ray.
  • Perform Pandas operations faster with Modin.
  • Use the Pandas API through Modin, including operations that default to Pandas.

Format of the Course

  • Interactive lecture and discussion.
  • Lots of exercises and practice.
  • Hands-on implementation in a live-lab environment.

Course Customization Options

  • To request customized training for this course, please contact us.

Requirements

  • Familiarity with Pandas
  • Python programming experience

Audience

  • Data scientists
  • Developers

Course Outline

Introduction

  • Modin vs Dask vs Ray
  • Overview of Modin features and architecture
  • Pandas fundamentals

Getting Started

  • Installing Modin
  • Importing Pandas from Modin
  • Defaulting to Pandas in Modin
  • Supported APIs
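
As a rough sketch of the import swap covered in this module: Modin is typically installed with pip (for example `pip install "modin[ray]"` or `pip install "modin[dask]"`), and existing code usually changes only its import line from `import pandas as pd` to `import modin.pandas as pd`. The example below assumes that either Modin or plain Pandas is available and falls back to Pandas when Modin is missing; the sample data and column names are made up for illustration.

```python
# Try Modin first; fall back to plain pandas so the sketch still runs
# on a machine without Modin (a fallback added here for illustration).
try:
    import modin.pandas as pd
    backend = "modin"
except ImportError:
    import pandas as pd
    backend = "pandas"

# Hypothetical sample data; everything below is ordinary pandas-style code.
df = pd.DataFrame({"city": ["Oslo", "Lima", "Oslo"], "sales": [10, 20, 30]})

# Operations that Modin has not parallelized default to pandas internally,
# usually with a warning; the caller sees the same kind of result either way.
totals = df.groupby("city")["sales"].sum()

print(backend, totals.to_dict())
```

Because only the import line differs, the same script can be run against both libraries to compare behavior and timing.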

Managing Pandas Workflows Using Modin

  • Using Modin on a single node
  • Using Modin on a cluster
  • Connecting to a database (read_sql)
  • Optimizing resources for Modin
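
One way resource tuning on a single node might look, as a minimal sketch: Modin reads environment variables such as `MODIN_ENGINE` (to choose Ray or Dask) and `MODIN_CPUS` (to cap the number of cores) when it starts. The value `2` and the sample data below are arbitrary assumptions, and the fallback to plain Pandas is included only so the example runs without Modin installed.

```python
import os

# MODIN_CPUS caps how many cores Modin uses on one node. It is read when
# Modin starts, so set it before the first import of modin.pandas.
os.environ.setdefault("MODIN_CPUS", "2")  # 2 is an arbitrary example value

try:
    import modin.pandas as pd
except ImportError:
    import pandas as pd  # fallback for illustration; ignores MODIN_CPUS

# Hypothetical workload: the mean of the integers 0..999.
df = pd.DataFrame({"x": range(1000)})
mean_x = float(df["x"].mean())

print(mean_x)
```

On a cluster the setup differs: with the Ray backend, for instance, the usual approach is to connect to the running cluster before Modin is first used, and the details depend on how the cluster was launched.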

Interacting with Datasets

  • Reading data, dropping columns, and finding values
  • Executing advanced Pandas operations
  • Common issues and examples
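
The basic operations in this module (reading data, dropping columns, finding values) can be sketched as follows. The CSV content, column names, and the threshold of 80 are hypothetical, and an in-memory buffer stands in for a real file path. The fallback import is an assumption that lets the example run without Modin.

```python
import io

try:
    import modin.pandas as pd
except ImportError:
    import pandas as pd  # fallback for illustration

# Hypothetical CSV content; a real workflow would pass a file path instead.
csv_text = "id,name,score,notes\n1,Ana,82,a\n2,Ben,95,b\n3,Cy,67,c\n"

# Read the data into a DataFrame.
df = pd.read_csv(io.StringIO(csv_text))

# Drop a column that is not needed.
df = df.drop(columns=["notes"])

# Find rows whose score is above the threshold, then collect the names.
high = df[df["score"] > 80]
names = high["name"].tolist()

print(names)
```

Reading from an in-memory buffer is unlikely to be parallelized, so any speedup would only be expected when reading larger files from disk.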

Troubleshooting

Summary and Next Steps
