On July 28th we hosted the “Intro to Kubeflow: Pipelines” Training and Certification prep course. In this blog post we’ll recap some highlights from the class, plus give a summary of the Q&A. Ok, let’s dig in!
First, thanks for voting for your favorite charity!
Arrikto is looking for even more ways to contribute. With this in mind, we thought that in lieu of swag we could give course attendees the opportunity to vote for their favorite charity and help guide our monthly donation to charitable causes. The charity that won this workshop’s voting was the Global Fund for Women. The Global Fund for Women’s mission is to be a global champion for the human rights of women and girls. They use their powerful networks to find, fund, and amplify the courageous work of women who are building social movements and challenging the status quo. We are pleased to be making a donation of $100 to them on behalf of the Kubeflow community. Again, thanks to all of you who attended and voted!
What topics were covered in the course?
This initial course was designed to give data scientists and DevOps engineers with little or no prior experience a grounding in the fundamentals of how Kubeflow works.
- Kubeflow Fundamentals Review
- Pipeline Basics and Concepts
- Pipelines Architecture
- Pipelines SDK and CLI
- Navigating the Pipelines UI
- Advanced Pipelines Topics
- Getting Pipelines Up and Running
- Pipelines Example: Kaggle’s Digit Recognizer
- Pipelines Example: Kaggle’s NLP Disaster Tweets
What did I miss?
Here’s a short teaser from the 90-minute training. In this video we show you how to navigate the various Pipelines-related views inside the Kubeflow UI after uploading a computer vision pipeline.
Missed the Jul 28 Kubeflow Pipelines training?
If you were unable to join us last week, you can sign up for upcoming Fundamentals, Notebooks, Pipelines and Kale/Katib courses here.
Alternatively, you can sign up for the next session on Sep 8 directly here.
NEW: Advanced Kubeflow, Kubernetes Basics, Notebooks and Pipelines Workshops
We are excited to announce a new series of FREE workshops focused on taking popular Kaggle and Udacity machine learning examples from “Notebook to Pipeline.” Registration is now open for the following workshops:
- Aug 3: The Kaggle JPX Tokyo Stock Exchange Prediction Competition
- Aug 4: MLOps Meetup
- Aug 10: Advanced Kubeflow & MLOps Workshop
- Aug 11: Introduction to Kubeflow: Katib Fundamentals
- Aug 18: Introduction to Kubeflow: Fundamentals
- Aug 24: The Kaggle Facial Keypoints Detection Competition
Ready to put what you’ve learned into practice with hands-on labs? Then check out Arrikto Academy! On this site you’ll find a variety of FREE skills-building labs and tutorials including:
- Kubeflow Use Cases: Kaggle OpenVaccine, Kaggle Titanic Disaster, Kaggle Blue Book for Bulldozers, Dog Breed Classification, Distributed Training, Kaggle Digit Recognizer Competition
- Kubeflow Functionality: Kale, Katib
- Enterprise Kubeflow Skills: Kale SDK, Rok Registry
Q&A from the training
Below is a summary of some of the questions that popped into the Q&A box during the course. [Edited for readability and brevity.]
Can we set up recurring Pipeline runs via Cron?
Yes. Use a run trigger as a flag to tell Kubeflow when a recurring run configuration should spawn a new run. The following types of run triggers are available:
- Periodic: for an interval-based scheduling of runs
- Cron: for specifying cron semantics for scheduling runs
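As a sketch of what this looks like in practice, the snippet below builds the settings for a recurring run and shows (in comments) how they would be passed to the Kubeflow Pipelines SDK. The job name, package path, and endpoint are hypothetical; the `kfp.Client` call itself requires a live cluster, so it is left commented out.

```python
# Sketch: scheduling a recurring run with the Kubeflow Pipelines SDK (kfp v1).
# The settings are plain data so they are easy to inspect; only one trigger
# (cron OR periodic) would be passed to the client call.
recurring_run_config = {
    "job_name": "nightly-train",               # hypothetical job name
    "pipeline_package_path": "pipeline.yaml",  # compiled pipeline file (assumed)
    # Cron trigger: run every day at 02:00 UTC. KFP cron strings have six
    # fields, the first being seconds.
    "cron_expression": "0 0 2 * * *",
    # Periodic trigger alternative: run every hour (interval in seconds).
    "interval_second": 3600,
}

# With a reachable KFP endpoint you would create the recurring run like so:
# import kfp
# client = kfp.Client(host="http://<kfp-endpoint>")  # hypothetical endpoint
# client.create_recurring_run(
#     experiment_id=client.create_experiment("nightly").id,
#     job_name=recurring_run_config["job_name"],
#     pipeline_package_path=recurring_run_config["pipeline_package_path"],
#     cron_expression=recurring_run_config["cron_expression"],
# )
```

Passing `interval_second` instead of `cron_expression` gives you the periodic trigger described above.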
What is the interplay between Docker and Kubeflow?
Pipeline components (steps) need to be packaged up as Docker images. You can learn more in the official Kubeflow docs about building components: https://www.kubeflow.org/docs/components/pipelines/sdk/component-development/
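To make the relationship concrete, here is a minimal sketch: a step is just ordinary code, and the SDK packages it to run inside a Docker image. The `base_image` value and the kfp call are assumptions (kfp v1's `create_component_from_func`), shown as comments since they need kfp installed to run.

```python
# Sketch: a pipeline step is a containerized program. The step logic is a
# plain Python function; kfp (assumed) wraps it into a component that runs
# inside a Docker image of your choice.

def normalize(x: float, lo: float, hi: float) -> float:
    """Toy step logic: scale a value into the [0, 1] range."""
    return (x - lo) / (hi - lo)

# With kfp installed, the function becomes a component backed by an image:
# import kfp.components as comp
# normalize_op = comp.create_component_from_func(
#     normalize,
#     base_image="python:3.9",  # the Docker image the step executes in
# )
```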
What is the mechanism for passing data between Pipeline steps?
Data moves between components via inputs and outputs; more info here: https://www.kubeflow.org/docs/components/pipelines/sdk/component-development/#designing-a-pipeline-component. All the data passed between the components must be serialized (to strings or files) so it can travel over a distributed network. The data must then be deserialized for use in the downstream components. A practical example would be using a pickle file to move data between components.
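The pickle round trip mentioned above can be sketched with the standard library alone: the upstream step serializes its output, and the downstream step deserializes it before doing any work.

```python
import pickle

# Upstream component: produce some data and serialize it (to bytes here;
# in a real pipeline this would typically be written to a file artifact).
features = {"mean": 0.12, "std": 0.98}
blob = pickle.dumps(features)

# Downstream component: deserialize before use.
restored = pickle.loads(blob)
```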
How can we use a single image for multiple steps in Kubeflow?
You can use a single image to encapsulate multiple steps. Note that if you do, Kubeflow will see it as a single step. For example, if you combine a feature engineering step with a data loading step in a single image, all of that work will happen in a single step. Put another way: image = component = step.
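The “image = component = step” idea can be sketched as follows: two pieces of work baked into one entrypoint run as one pipeline step. The functions and the commented kfp call (kfp v1, `python:3.9` base image) are illustrative assumptions.

```python
# Sketch: packaging data loading and feature engineering into one entrypoint.
# Kubeflow would run the combined entrypoint as a single step.

def load_data() -> list:
    """Would normally read from storage; toy data here."""
    return [1.0, 2.0, 3.0]

def engineer_features(rows: list) -> list:
    """Toy feature engineering: mean-center the values."""
    mean = sum(rows) / len(rows)
    return [r - mean for r in rows]

def combined_step() -> list:
    """Single entrypoint for the single image: both stages, one step."""
    return engineer_features(load_data())

# With kfp (assumed), the combined entrypoint becomes one component/step:
# import kfp.components as comp
# combined_op = comp.create_component_from_func(
#     combined_step, base_image="python:3.9",
# )
```

If you wanted loading and feature engineering to appear as separate steps in the pipeline graph, each would need its own component, even if both components reference the same image.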