Kubeflow on Azure Training Course

Overview

Kubeflow is a framework for running machine learning workloads on Kubernetes, an orchestration platform for managing containerized applications. TensorFlow, one of the most popular machine learning libraries, is the library used for model training in this course.

This instructor-led, live training (online or onsite) is aimed at engineers who wish to deploy machine learning workloads to the Azure cloud.

By the end of this training, participants will be able to:

  • Install and configure Kubernetes, Kubeflow and other needed software on Azure.
  • Use Azure Kubernetes Service (AKS) to simplify the work of initializing a Kubernetes cluster on Azure.
  • Create and deploy a Kubeflow pipeline for automating and managing ML models in production.
  • Train and deploy TensorFlow ML models across multiple GPUs and machines running in parallel.
  • Leverage other Azure managed services to extend an ML application.

Format of the Course

  • Interactive lecture and discussion.
  • Lots of exercises and practice.
  • Hands-on implementation in a live-lab environment.

Course Customization Options

  • To request customized training for this course, please contact us.

Requirements

  • An understanding of machine learning concepts.
  • Knowledge of cloud computing concepts.
  • A general understanding of containers (Docker) and orchestration (Kubernetes).
  • Some Python programming experience is helpful.
  • Experience working with a command line.

Audience

  • Data science engineers.
  • DevOps engineers interested in machine learning model deployment.
  • Infrastructure engineers interested in machine learning model deployment.
  • Software engineers wishing to automate the integration and deployment of machine learning features with their application.

Course Outline

Introduction

  • Kubeflow on Azure vs on-premises vs on other public cloud providers

Overview of Kubeflow Features and Architecture

Overview of the Deployment Process

Activating an Azure Account

Preparing and Launching GPU-enabled Virtual Machines

Setting up User Roles and Permissions

Preparing the Build Environment

Selecting a TensorFlow Model and Dataset

Packaging Code and Frameworks into a Docker Image

Setting up a Kubernetes Cluster Using AKS

Staging the Training and Validation Data

Configuring Kubeflow Pipelines

Launching a Training Job

Visualizing the Training Job at Runtime

Cleaning up After the Job Completes

Troubleshooting

Summary and Conclusion
