Last week we hosted our second “Data Science, Machine Learning and Kubeflow” Meetup. Special thanks to our awesome speakers Salman Iqbal, Stefano Fioravanzo and Alexander Aidun. In this blog post we’ll recap some highlights from the Meetup and preview what’s next. Ok, let’s dig in.
Join a Meetup near you
Missed last week’s Meetup? No need to suffer from FOMO. Here’s a list of the Meetups that are part of the “Data Science, Machine Learning and Kubeflow” Meetup network. Please join the one whose schedule best fits your time zone.
Get involved in the Kubeflow community
- Are you interested in speaking at a future Meetup?
- Is your company interested in sponsoring a Meetup?
- Would you like to be a co-organizer of a local Meetup?
If you answered yes to any of the above, send one of the organizers/hosts a message on Meetup.com or jump onto the Kubeflow Community Slack and DM @Jimmy Guerrero.
Thanks for voting for your favorite charity!
With the unprecedented circumstances facing our global community, Arrikto is looking for even more ways to contribute. With this in mind, we thought that in lieu of swag we could give Meetup attendees the opportunity to vote for their favorite charity and help guide our monthly donation to charitable causes. The charity that won this month’s Meetup voting was the Cancer Research Institute. CRI is a US non-profit organization based in New York City that funds cancer research. It was founded in 1953 to develop immunologically-based treatments for cancer. We are pleased to be making a donation of $250 to CRI on behalf of the Kubeflow community. Again, thanks to all of you who attended and voted!
Comparing Apache Spark on Kubernetes to Kubeflow
We all know that Kubeflow is great for a lot of data science and machine learning problems, but is it always the best choice? In this talk, Salman Iqbal, who works as an MLOps Engineer at Appvia and a Kubernetes Instructor at Learnk8s, takes a contrarian stance and looks at situations where alternatives like Apache Spark might be a better fit. Spark can perform machine learning tasks very quickly on large data sets. Kubernetes is an extensible platform for managing and orchestrating containers and services across a cluster of multiple machines. In this talk you will see why running Spark on Kubernetes can be a winning combination for certain use cases.
You will learn what Spark is and how it utilises the capabilities of Kubernetes to perform its tasks in a scalable and efficient manner. With the help of live demos you will also learn how to install and manage Spark on a Kubernetes cluster and how to submit and monitor Spark applications on Kubernetes.
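To give a rough idea of what submitting a Spark application to Kubernetes looks like, here is a minimal sketch that assembles a `spark-submit` command in Python. It is not taken from the talk. The API server URL, container image and jar path are placeholders we made up for illustration, and the helper name `build_spark_submit` is hypothetical. The flags themselves (`--master k8s://...`, `--deploy-mode cluster` and the `spark.kubernetes.container.image` setting) follow the Spark documentation for running on Kubernetes.

```python
from typing import List


def build_spark_submit(api_server: str, image: str, app_path: str,
                       main_class: str, executors: int = 2,
                       name: str = "spark-on-k8s-demo") -> List[str]:
    """Assemble a spark-submit argument list that targets a Kubernetes cluster."""
    return [
        "spark-submit",
        # The k8s:// prefix tells Spark to use the Kubernetes scheduler backend.
        "--master", f"k8s://{api_server}",
        # In cluster mode the driver itself runs in a pod on the cluster.
        "--deploy-mode", "cluster",
        "--name", name,
        "--class", main_class,
        "--conf", f"spark.executor.instances={executors}",
        # Container image used for the driver and executor pods.
        "--conf", f"spark.kubernetes.container.image={image}",
        app_path,
    ]


# All values below are placeholders, not real endpoints or images.
command = build_spark_submit(
    api_server="https://kubernetes.example.com:6443",
    image="registry.example.com/spark:latest",
    app_path="local:///opt/spark/examples/jars/spark-examples.jar",
    main_class="org.apache.spark.examples.SparkPi",
)
print(" ".join(command))
```

The resulting list could be handed to `subprocess.run` on a machine that has Spark installed and access to the cluster. Building it as a list rather than a single string avoids shell quoting problems.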
A Complete Introduction to Kale
In this presentation, Stefano Fioravanzo, the original creator of Kale, will take you on a tour of the open source Kale project for Kubeflow. Kale enables you to deploy Jupyter Notebooks that are running on your laptop or in the cloud to Kubeflow Pipelines, without requiring any of the Kubeflow SDK boilerplate. You can define pipelines just by annotating the Notebook’s code cells and clicking a deployment button in the Jupyter UI. Kale takes care of converting the Notebook to a valid Kubeflow Pipelines deployment, plus resolving data dependencies and managing the pipeline’s lifecycle. In this talk Stefano will also highlight the Kale SDK and AutoML.
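To make the idea of turning annotated cells into a pipeline more concrete, here is a toy model in plain Python. It is not Kale’s implementation and does not use the Kale SDK. The tag names `step:` and `prev:`, the sample cells and the function `build_pipeline` are all assumptions made up for this sketch. It only shows the general technique: group tagged cells into steps, record which steps depend on which, and derive a valid execution order with a topological sort.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical notebook cells. The tag syntax is illustrative only
# and may not match the tags Kale actually uses.
cells = [
    {"source": "data = load()", "tags": ["step:load"]},
    {"source": "clean = preprocess(data)", "tags": ["step:preprocess", "prev:load"]},
    {"source": "model = fit(clean)", "tags": ["step:train", "prev:preprocess"]},
    {"source": "report = score(model, clean)",
     "tags": ["step:evaluate", "prev:train", "prev:preprocess"]},
]


def build_pipeline(cells):
    """Group tagged cells into steps and return (steps, execution order)."""
    steps = {}  # step name -> list of cell sources belonging to that step
    deps = {}   # step name -> set of upstream step names
    current = None
    for cell in cells:
        tags = [tag.partition(":") for tag in cell["tags"]]
        for kind, _, value in tags:
            if kind == "step":
                current = value
        if current is None:
            continue  # cells before the first tagged step are ignored here
        # Assumption: a cell without a step tag joins the most recent step.
        steps.setdefault(current, []).append(cell["source"])
        deps.setdefault(current, set()).update(
            value for kind, _, value in tags if kind == "prev"
        )
    # TopologicalSorter takes a mapping of node -> predecessors.
    order = list(TopologicalSorter(deps).static_order())
    return steps, order


steps, order = build_pipeline(cells)
print(order)
```

A real converter would additionally have to work out which variables each step consumes and produces so that data can be passed between steps, which this sketch leaves out.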
There were also two short lightning talks at the Meetup worth checking out.
- A 10 Minute Introduction to Kubeflow: Basics, Architecture & Components – Jimmy Guerrero, VP Developer Relations (Arrikto)
- Preview of Arrikto Academy / Kubeflow Education – Alexander Aidun, Dir of Education (Arrikto)
Questions and Answers
Here’s a recap of some of the Q&A during the Meetup, edited for brevity and readability.
Is the new version of Kale going to be announced?
We don’t yet have a release date set for the next Kale release. Best bet is to follow us on LinkedIn or track the project on GitHub to be alerted when the new release is available.
Any idea why kubeflow/metadata has not been updated in a while?
Kubeflow has moved completely to using ML Metadata as its metadata store. As far as we know kubeflow/metadata is not actively maintained anymore.
How does Kale choose a Docker image for each pipeline step?
Kale uses the Notebook’s image for every step. Because Kale snapshots everything before building the pipeline, it automatically mounts PVCs that contain all the libraries and data each step needs to run. As a result, you don’t have to build and push Docker images at all.
Is Kale a plug-in for Jupyter or a separate container?
Kale is an SDK, not a separate container. It also provides a GUI as a JupyterLab extension, so you can use it natively from inside JupyterLab.
Do you have a Kubeflow tutorial series?
Three things that might be worth pursuing:
What are the advantages of running Spark on Kubernetes vs a normal cluster?
Auto-scaling, quick deployment, auto restarts, orchestration, independence from infrastructure and more!
Upcoming November and December Meetups
We are excited to announce that we have our speakers locked in for the next two upcoming Meetups. Here’s a quick preview.
November 4, 2021
- Octant Kubeflow: Debug and Manage Kubernetes from a GUI – Liam Rathke (VMware)
- Kubeflow on OpenShift – Noelle Silver (Red Hat)
December 2, 2021
- Istio Service Mesh 101 – Peter Jausovec (Tetrate.io)
- Orchestrating Apache Spark with Kubeflow on Kubernetes – Sadik Bakiu (Data Max)
If you are new to Kubeflow – install MiniKF
MiniKF is the easiest way to get started with Kubeflow on the platform of your choice (AWS, GCP or locally).
Here are the links:
Get started with Kubeflow – hands-on tutorials
Installed but don’t know where to start? Get started with these hands-on, practical Kubeflow tutorials.
- Tutorial 1: An End-to-End ML Workflow: From Notebook to Kubeflow Pipelines with MiniKF & Kale
- Tutorial 2: Build An End-to-End ML Workflow: From Notebook to HP Tuning to Kubeflow Pipelines with Kale
- Tutorial 3: Build an ML pipeline with hyperparameter tuning and serve the model starting from a notebook
- Tutorial 4: Build an AutoML workflow starting from a notebook
FREE Kubeflow courses and certifications
We are excited to announce the first of several free instructor-led and on-demand Kubeflow courses! The “Introduction to Kubeflow” series of courses will start with the fundamentals, then go on to deeper dives into various Kubeflow components. Each course will be delivered over Zoom with the opportunity to earn a certificate upon successful completion of an exam. To learn more, sign up for the first course.
We hope to see you at a future Meetup!