Intro to Kubeflow: Pipelines Training & Certification Recap – Jun 29, 2022

July 1, 2022

On Jun 29th we hosted the “Intro to Kubeflow: Pipelines Training and Certification” prep course. In this blog post we’ll recap some highlights from the class, plus give a summary of the Q&A. Ok, let’s dig in!

Congratulations to Elena Ramos Varas!

The first attendee to earn the “Pipelines” certificate at the conclusion of the course was Elena Ramos Varas, who works at BioLizard. A coupon code for the Arrikto Community Store is yours. Well done!

First, thanks for voting for your favorite charity!

With the unprecedented circumstances facing our global community, Arrikto is looking for even more ways to contribute. With this in mind, we thought that in lieu of swag we could give workshop attendees the opportunity to vote for their favorite charity and help guide our monthly donation to charitable causes. The charity that won this workshop’s voting was the National Pediatric Cancer Foundation (NPCF). The NPCF is a nonprofit organization dedicated to funding research to eliminate childhood cancer. We are pleased to be making a donation of $100 to them on behalf of the workshop attendees. Again, thanks to all of you who attended and voted!

What topics were covered in the course?

This initial course aimed to get data scientists and DevOps engineers with little or no experience familiar with the fundamentals of how Kubeflow works.

Kubeflow Fundamentals Review
Pipeline Basics and Concepts
Pipelines Architecture
Pipelines SDK and CLI
Navigating the Pipelines UI
Advanced Pipelines Topics
Getting Pipelines Up and Running
Pipelines Example: Kaggle’s Titanic Disaster Example
Pipelines Example: Udacity’s Dog Breed Computer Vision Example

What did I miss?

Here’s a short teaser from the 90-minute training. In this video we show you how to navigate the various Pipeline-related views inside the Kubeflow UI after uploading a computer vision pipeline.

Missed the Jun 29 Kubeflow Pipelines training?

If you were unable to join us last week, you can sign up for upcoming Fundamentals, Notebooks, Pipelines and Kale/Katib courses here.

Alternatively, you can sign up for the next session on July 28 directly here.

NEW: Advanced Kubeflow, Kubernetes Basics, Notebooks and Pipelines Workshops

We are excited to announce a new series of FREE workshops focused on taking popular Kaggle and Udacity machine learning examples from “Notebook to Pipeline.” Registration is now open for the following workshops:

Jul 6: Introduction to Kubeflow: Katib and Kale Fundamentals
Jul 7: MLOps Meetup
Jul 13: The Kaggle Covid-19 OpenVaccine Machine Learning Example
Jul 14: Introduction to Kubeflow: Kubeflow Fundamentals
Jul 20: The Kaggle Facial Keypoints Detection Competition
Jul 21: Introduction to Kubeflow: Jupyter Notebooks Fundamentals

Arrikto Academy

Are you ready to put what you’ve learned into practice with hands-on labs? Then check out Arrikto Academy! On this site you’ll find a variety of FREE skills-building labs and tutorials including:

Kubeflow Use Cases: Kaggle OpenVaccine, Kaggle Titanic Disaster, Kaggle Blue Book for Bulldozers, Dog Breed Classification, Distributed Training, Kaggle Digit Recognizer Competition
Kubeflow Functionality – Kale, Katib
Enterprise Kubeflow Skills – Kale SDK, Rok Registry

Q&A from the training

Below is a summary of some of the questions that popped into the Q&A box during the course. [Edited for readability and brevity.]

Where can we find courses that give a good introduction to Kubeflow?

We recommend checking out one of our free, 90-minute, instructor-led or on-demand “Introduction to Kubeflow” courses.

Do we need an orchestration tool like Airflow if we use Kubeflow?

No. Pipelines are built into the Kubeflow platform. Kubeflow Pipelines runs on top of Argo Workflows, which gives you pipeline orchestration, experimentation and reusability.

What is the recommended way to build a pipeline?

Depending on your needs and skill level, you can build your pipeline using the KFP SDK and either hand-code the necessary Docker images or use lightweight components. An option that eliminates a lot of boilerplate coding and increases automation is to use the Kale JupyterLab extension.

What are lightweight components?

Lightweight components are Python function-based components that make it easier to iterate quickly: you write your component code as a Python function, and the SDK generates the component specification for you. More info in the Docs.
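As a rough illustration of the idea, here is a minimal sketch of a function-based component. It assumes the KFP SDK v1 (the `kfp.components.create_component_from_func` helper); the function name `normalize`, the helper `build_component`, and the `python:3.9` base image are made up for this example and are not from the course. The kfp import is done lazily so the plain function can be run and tested without the SDK installed.

```python
def normalize(value: float, minimum: float, maximum: float) -> float:
    """Scale `value` into the range [0, 1].

    A function-based component should be self-contained: any imports it
    needs go inside the function body, and inputs/outputs are annotated
    so the SDK can derive the component interface from the signature.
    """
    if maximum == minimum:
        raise ValueError("maximum and minimum must differ")
    return (value - minimum) / (maximum - minimum)


def build_component():
    """Wrap `normalize` as a pipeline component (assumes KFP SDK v1)."""
    # Lazy import: only needed when actually building the component.
    from kfp.components import create_component_from_func

    # The base image is an illustrative assumption; pick one that has
    # the Python version and packages your function needs.
    return create_component_from_func(normalize, base_image="python:3.9")


if __name__ == "__main__":
    # The function is still ordinary Python, so it can be checked locally
    # before it ever runs inside a pipeline step.
    print(normalize(5.0, 0.0, 10.0))
```

Because the component is just a Python function, you can unit test the logic locally and only pay the cost of a pipeline run once it behaves as expected.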

How can we create a pipeline from scratch?

The best place to start is the Pipelines SDK section of the Docs.

When you pass data between steps, do you incur the financial and time costs of writing to and reading from storage (e.g. a database or S3)? This could be very high for big data. Or can data be held in memory?

With Arrikto’s Kubeflow as a Service and EKF distribution, you can mount the same Kubernetes PVCs across pipeline steps and thus avoid the financial and time costs of uploading and downloading data every time.

Can you disable step caching for a specified step? E.g. if you want to generate randomness (draw from a distribution, for example).

Yes. Check out the caching section of the Docs.
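For example, a step that draws a random sample should not reuse a cached result. The sketch below assumes the KFP SDK v1, where caching for a single task can be turned off by setting its maximum cache staleness to zero (`"P0D"`); newer SDK versions expose this differently, so check the Docs for the version you run. The names `draw_sample`, `build_pipeline` and `no-cache-sketch` are invented for illustration.

```python
def draw_sample(mean: float, stddev: float) -> float:
    """Draw one value from a normal distribution (non-deterministic step)."""
    # Import inside the body so the function stays self-contained
    # when it is packaged as a function-based component.
    import random

    return random.gauss(mean, stddev)


def build_pipeline():
    """Return a pipeline function whose sampling step is never cached.

    Assumes KFP SDK v1; kfp is imported lazily so the sampling function
    above can be exercised without the SDK installed.
    """
    import kfp.dsl as dsl
    from kfp.components import create_component_from_func

    draw_op = create_component_from_func(draw_sample)

    @dsl.pipeline(name="no-cache-sketch")
    def pipeline(mean: float = 0.0, stddev: float = 1.0):
        task = draw_op(mean, stddev)
        # "P0D" is an ISO 8601 duration of zero days: any cached result
        # is considered stale, so this step is re-executed on every run.
        task.execution_options.caching_strategy.max_cache_staleness = "P0D"

    return pipeline


if __name__ == "__main__":
    print(draw_sample(0.0, 1.0))
```

Only the sampling step opts out here; other steps in the same pipeline would keep their default caching behavior.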

Any resources about the KFP DSL language?

Yes. Check out the Pipelines SDK section of the Docs.

Will Kale create a new custom image if you import custom packages in your components? I.e. packages which are not pip-installable?

If the packages are not pip-installable, you will have to manually install them under the home directory, if this is possible. If they are installed under the home directory, Kale will still be able to operate without having to create new custom images. In the future, Kale will support creating new images on the fly to cover the generic case of installing any package under any directory.