Overview
RAPIDS is a suite of open-source software libraries built to accelerate data science and analytics pipelines on GPUs. It is Python-based and includes a DataFrame API that integrates with a variety of machine learning algorithms.
This instructor-led, live training (online or onsite) is aimed at data scientists and developers who wish to use RAPIDS to build GPU-accelerated data pipelines, workflows, and visualizations, and to apply machine learning libraries such as XGBoost and cuML.
By the end of this training, participants will be able to:
- Set up the necessary development environment to build data models with NVIDIA RAPIDS.
- Understand the features, components, and advantages of RAPIDS.
- Leverage GPUs to accelerate end-to-end data and analytics pipelines.
- Implement GPU-accelerated data preparation and ETL with cuDF and Apache Arrow.
- Perform machine learning tasks with XGBoost and cuML.
- Build data visualizations and execute graph analysis with cuXfilter and cuGraph.
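As an illustration of the DataFrame-style data preparation named in the objectives above, the sketch below filters and aggregates a small table. It assumes cuDF accepts the same pandas-style calls used here (`DataFrame`, boolean filtering, `groupby`, `sum`) and falls back to pandas when cuDF is not installed. The column names and values are made up for illustration and are not course material.

```python
# Use cuDF on a GPU machine; fall back to pandas so the sketch still runs on CPU.
try:
    import cudf as xdf
except ImportError:
    import pandas as xdf

# Hypothetical sample data: sales amounts per region.
df = xdf.DataFrame({
    "region": ["east", "west", "east", "west"],
    "amount": [10.0, 20.0, 30.0, 40.0],
})

# A typical ETL step: filter rows, then aggregate per group.
filtered = df[df["amount"] > 10.0]
totals = filtered.groupby("region")["amount"].sum().sort_index()

# Bring the result to host memory if it lives on the GPU
# (to_pandas exists only on cuDF objects), then convert to a dict.
if hasattr(totals, "to_pandas"):
    totals = totals.to_pandas()
result = totals.to_dict()
print(result)
```

Because only the import differs, the same code can be prototyped on a laptop and later run on a GPU, although not every pandas feature is guaranteed to be available in cuDF.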
Format of the Course
- Interactive lecture and discussion.
- Lots of exercises and practice.
- Hands-on implementation in a live-lab environment.
Course Customization Options
- To request customized training for this course, please contact us.
Requirements
- Familiarity with CUDA
- Python programming experience
Audience
- Data scientists
- Developers
Course Outline
Introduction
- Overview of RAPIDS features and components
- GPU computing concepts
Getting Started
- Installing RAPIDS
- cuDF, cuML, and Dask
- Primitives, algorithms, and APIs
Managing and Training Data
- Data preparation and ETL
- Creating a training set using XGBoost
- Testing the trained model
- Working with CuPy arrays
- Using Apache Arrow data frames
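To give a flavor of the array work listed above, here is a minimal sketch that standardizes feature columns before training. It assumes CuPy provides the NumPy-compatible calls used here (`asarray`, `mean`, `std`, `tolist`) and falls back to NumPy when CuPy is not installed. The feature values are hypothetical.

```python
# Use CuPy on a GPU machine; fall back to NumPy so the sketch still runs on CPU.
try:
    import cupy as xp
except ImportError:
    import numpy as xp

# Hypothetical feature matrix: 3 samples, 2 features on different scales.
features = xp.asarray([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])

# Standardize each column to zero mean and unit variance.
mean = features.mean(axis=0)
std = features.std(axis=0)
scaled = (features - mean) / std

# tolist() copies the values into plain Python floats for inspection.
col_means = scaled.mean(axis=0).tolist()
print(col_means)
```

After scaling, each column mean should be zero up to floating-point error, which is an easy sanity check before handing the array to a model.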
Visualizing and Deploying Models
- Graph analysis with cuGraph
- Implementing multi-GPU workflows with Dask
- Creating an interactive dashboard with cuXfilter
- Inference and prediction examples
Troubleshooting
Summary and Next Steps