Last week we hosted a free Kubeflow and MLOps workshop presented by Kubeflow Community Product Manager Josh Bottum. In this blog post we’ll recap some highlights from the workshop and summarize the Q&A. OK, let’s dig in.
First, thanks for voting for your favorite charity!
With the unprecedented circumstances facing our global community, Arrikto is looking for even more ways to contribute. With this in mind, we thought that in lieu of swag we could give attendees the opportunity to vote for their favorite charity and help guide our monthly donation to charitable causes. The charity that won this month’s workshop voting was Action Against Hunger. We are pleased to be making a donation of $250 to them on behalf of the Kubeflow community. Again, thanks to all of you who attended and voted!
What topics were covered in the workshop?
- Install Kubeflow via MiniKF locally or on a public cloud
- Take a snapshot of your notebook
- Clone the snapshot to recreate the exact same environment
- Create a pipeline starting from a Jupyter notebook
- Go back in time using Rok, reproduce a step of the pipeline, and view it from inside your notebook
- Create a hyperparameter tuning (Katib) experiment starting from your notebook
- Serve a model from inside your notebook by creating a KFServing server
- Review what’s new in Kubeflow 1.4
What did I miss?
Here’s a short teaser from the 45-minute workshop in which Josh walks us through a Kubeflow Pipelines execution graph, with extra emphasis on how to create and work with data and artifact snapshots at every step.
In the workshop, Josh discussed how MiniKF is the easiest way to get started with Kubeflow on the platform of your choice (AWS, GCP, or locally). He also talked about the basic mechanics of installing MiniKF.
Here are the links:
Although during the workshop Josh focused primarily on the examples shown in tutorial #3 (which makes heavy use of the Open Vaccine COVID-19 example), we highly recommend also trying out tutorial #4, which does a great job of walking you through all the steps you’ll need to master when bringing together all the Kubeflow components to turn your models into pipelines. You can get started with these hands-on, practical tutorials by following these links:
- Tutorial 1: An End-to-End ML Workflow: From Notebook to Kubeflow Pipelines with MiniKF & Kale
- Tutorial 2: Build An End-to-End ML Workflow: From Notebook to HP Tuning to Kubeflow Pipelines with Kale
- Tutorial 3: Build an ML pipeline with hyperparameter tuning and serve the model starting from a notebook
- Tutorial 4: Build an AutoML workflow starting from a notebook
Join the Kubeflow Community on Slack and make sure to add the #minikf channel to your workspace. The #minikf channel is your best resource for immediate technical assistance regarding all things MiniKF!
Missed the Dec 23 workshop?
If you were unable to join us last week but would still like to attend a workshop in the future, you can sign up for the next workshop happening on Jan 20.
Links to Resources
For those who attended the workshop, here are the resource links you need to replicate the exercises:
- Kubeflow Community Resources all in one place
- Install MiniKF
- Kubeflow Tutorials
- Find and join a local Kubeflow Meetup
- Upcoming training and certification preparation courses
Q&A from the workshop
Below is a summary of some of the questions that popped into the Q&A box during the workshop. [Edited for readability and brevity.]
Regarding scalability during the prediction process, if I have a huge data set, will Kubeflow be able to manage the auto-scaling part including splitting the data?
Auto-scaling typically happens at the infrastructure level, within the context of Kubernetes. Splitting the data set would be handled by the user, as different situations call for different methods. When thinking about how to scale your deployment, you’ll also want to apply parallelism to every step possible by leveraging Pipelines’ ability to make use of public cloud infrastructure and GPUs, and to run dozens of instances of TensorFlow or a similar ML framework in parallel. Kubeflow Pipelines lets you create scalable Directed Acyclic Graphs (DAGs) with multiple steps that run in parallel.
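To make the split-then-parallelize idea concrete, here is a minimal, stdlib-only Python analogy (not actual Kubeflow Pipelines code): the user splits the data into shards, independent "steps" process the shards in parallel, and a fan-in step joins the results. In a real pipeline each branch would be a containerized step scheduled by Kubernetes; threads stand in for them here.

```python
from concurrent.futures import ThreadPoolExecutor

def split(data, n):
    """User-side step: split the data set into n shards."""
    return [data[i::n] for i in range(n)]

def process(shard):
    """One parallel branch of the DAG (e.g. per-shard prediction)."""
    return [x * 2 for x in shard]

def join(results):
    """Fan-in step: merge the outputs of the parallel branches."""
    return sorted(x for shard in results for x in shard)

def run_dag(data, n_workers=4):
    shards = split(data, n_workers)
    # In Kubeflow Pipelines the parallel branches would run as separate
    # pods scheduled by Kubernetes; here threads stand in for them.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(process, shards))
    return join(results)

print(run_dag(list(range(10))))  # each value doubled, order restored by join
```

The key point from the answer above holds in the sketch too: the framework handles scheduling the parallel branches, but deciding how to split the data is up to you.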
Does Kubeflow only run on AWS?
Kubeflow is open source and can run on AWS, GCP, Azure, OpenShift, on-prem and even locally.
Do we have to wrap a REST API on models for microservices to invoke these models or do we have some automation done there as well?
You’ll need to expose your model via a REST API or web service in order for users, applications, or other services to consume it. Kubeflow’s KFServing component, covered in the workshop, automates much of this by creating a model server with a standard prediction endpoint for you.
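To illustrate what such a wrapper looks like, here is a minimal sketch using only the Python standard library. The `predict` function and the endpoint path are hypothetical placeholders; the request/response shape (`{"instances": ...}` in, `{"predictions": ...}` out) follows the KFServing V1 style, and in practice you would let KFServing generate a server like this rather than writing one by hand.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def predict(instances):
    """Stand-in model: replace with a real model's predict() call."""
    return [sum(x) for x in instances]

class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Expect a KFServing V1-style payload: {"instances": [[...], ...]}
        length = int(self.headers.get("Content-Length", 0))
        body = json.loads(self.rfile.read(length) or b"{}")
        preds = predict(body.get("instances", []))
        payload = json.dumps({"predictions": preds}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

def run(port=8080):
    """Serve predictions on the given port until interrupted."""
    HTTPServer(("", port), PredictHandler).serve_forever()
```

Calling `run()` starts the server; a client can then POST `{"instances": [[1, 2], [3]]}` and receive `{"predictions": [3, 3]}` back.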