Last week we hosted our fourth “Data Science, Machine Learning and Kubeflow” Meetup. Special thanks to our awesome speakers Dr. Clair Sullivan and Alexander Aidun. In this blog post we’ll recap some highlights from the Meetup and preview what’s next. Ok, let’s dig in.
Join a Meetup near you
First, did you miss last week’s Meetup? No need to suffer from FOMO. Here’s a list of the Meetups that are part of the “Data Science, Machine Learning and Kubeflow” Meetup network. Please join the one whose schedule best fits your time zone.
Get involved in the Kubeflow community
- Join Kubeflow Community Slack
- Are you interested in speaking at a future Meetup?
- Is your company interested in sponsoring a Meetup?
- Would you like to be a co-organizer of a local Meetup?
If you answered yes to any of the above, send one of the organizers/hosts a message on Meetup.com or jump onto Kubeflow Community Slack and DM @rawkintrevo.
Thanks for voting for your favorite charity!
With the unprecedented circumstances facing our global community, Arrikto is looking for even more ways to contribute. With this in mind, we thought that in lieu of swag we could give Meetup attendees the opportunity to vote for their favorite charity and help guide our monthly donation to charitable causes. The charity that won this month’s voting was Doctors Without Borders. It is an international humanitarian medical non-governmental organization of French origin, best known for its projects in conflict zones and in countries affected by endemic diseases. We are pleased to be making a donation of $250 to them on behalf of the Kubeflow community. Again, thanks to all of you who attended and voted!
Talk #1: Machine Learning Enabled by Network Graphs: The Power of Connecting Your Data
Machine learning has traditionally revolved around creating models around data that is characterized by embeddings attributed to individual observations. However, this ignores a signal that could potentially be very strong: the relationships between data points. In this talk we will explore the power of using network graphs to analyze these relationships and how to take graphs to the next level with machine learning.
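To make the idea of "relationships as signal" concrete, here is a minimal sketch in plain Python. It is not taken from the talk: the toy edge list, the function names, and the choice of degree and local clustering coefficient as features are all assumptions made for illustration. The point is only that each node gets features computed purely from its connections, which could then sit alongside its ordinary per-observation features in a model.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency map from (node, node) pairs."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return dict(adjacency)


def graph_features(edges):
    """Return per-node features derived only from relationships.

    degree:     number of direct neighbors
    clustering: fraction of neighbor pairs that are themselves connected
    """
    adjacency = build_adjacency(edges)
    features = {}
    for node, neighbors in adjacency.items():
        degree = len(neighbors)
        possible_pairs = degree * (degree - 1) / 2
        ordered = sorted(neighbors)
        linked_pairs = sum(
            1
            for i, u in enumerate(ordered)
            for v in ordered[i + 1:]
            if v in adjacency[u]
        )
        clustering = linked_pairs / possible_pairs if possible_pairs else 0.0
        features[node] = {"degree": degree, "clustering": clustering}
    return features


# Hypothetical toy data: who interacts with whom.
edges = [("ann", "bob"), ("ann", "cat"), ("bob", "cat"), ("cat", "dan")]
features = graph_features(edges)
```

In practice a graph database or a graph library would compute richer features (centrality, communities, embeddings) at scale; this sketch only shows the shape of the idea.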
Resource Links from the Talk
- Dr. Sullivan’s Twitter handle: @cjlovesdata
- Bite-Sized Neo4j for Data Scientists
- Graph Data Science Library
- Dr. Sullivan’s blogs on Medium
Dr. Clair Sullivan is currently a graph data science advocate at Neo4j, working to expand the community of data scientists and machine learning engineers using graphs to solve challenging problems. She has authored 4 book chapters, over 20 peer-reviewed papers, and more than 30 conference papers.
Talk #2: Introducing “Kubeflow Academy” Training and Certification
Arrikto is pioneering Kubeflow ecosystem education with a skills-based approach to course development and curriculum design. During this session, Alex Aidun, Director of Education at Arrikto, will provide a high-level grouping of our skill focus areas, highlight how users can develop new Kubeflow skills through our Academy, and discuss some of the techniques we are using to ensure that students retain important information. Alex will close out the session with a look at our roadmap and exciting 2022 developments, as well as our future badge and certification programs.
Resource Links from the Talk
- Kale 101: Transform Jupyter Notebooks into Kubeflow Pipelines
- Katib 101: HyperParameter Tuning via Jupyter Notebook, Kale & Katib
- Rok 101: Snapshotting and Restoring Kubeflow Pipelines
Alex is an energetic and tenacious education strategist, tactician and leader with a problem-solving mindset and an insatiable appetite for both technology and helping people. He’s thrilled to be at Arrikto and working with passionate people on improving the lives of Data Scientists and the associated communities.
There was also one short lightning talk at the Meetup worth checking out.
- A 10 Minute Introduction to Kubeflow: Basics, Architecture & Components – Jimmy Guerrero, VP Developer Relations (Arrikto)
Questions and Answers
Here’s a recap of some of the Q&A during the Meetup, edited for brevity and readability.
Where can I go to learn about how to get started with Kubeflow and Neo4j?
Check out: https://neo4j.com/labs/neo4j-helm/
Can Neo4j be used for a Bitcoin ledger so that the relations between Bitcoin transactions can be tracked?
Absolutely! Check out: https://neo4j.com/blog/import-bitcoin-blockchain-neo4j/
What should I consider when choosing a Kubeflow distribution?
You have two options: use a packaged distribution from a vendor or roll your own via a manifests-based install. More info and details here.
Many ML engineers lack K8s knowledge, and learning K8s just to manage their models with Kubeflow adds cognitive load. At the same time, many K8s engineers lack ML knowledge. So how do we address this mutual lack of perspective on each side using Kubeflow?
One of the tools we are building to bridge this gap is Kale. Kale allows Data Scientists to work in a familiar environment like Jupyter Notebooks, VS Code, or even plain ol’ git repos. With simple annotations via a GUI or decorators on their existing code, they can convert their work to reproducible Kubeflow pipelines without having to containerize anything (and thus without needing to know Docker) or write Kubeflow DSL pipeline code (and thus, technically, without needing to know Kubeflow). Finally, they can very easily spin up HP Tuning jobs and serve models without caring that Kubeflow or Kubernetes is underneath.
Is it possible to lift and shift my local notebook into Kubeflow and scale up the model training?
Yes. If it is a JupyterLab Notebook, you can start a JupyterLab server on Kubeflow and bring your existing Notebook in very easily.
How can I store the model artifacts in MLflow with every iteration?
Kale stores the model artifact metadata directly in ML Metadata (MLMD). If you want to integrate MLflow instead, you would have to import the mlflow library in your code to log the artifacts, and run MLflow alongside Kubeflow.
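As a rough illustration of the MLflow route described in that answer, the sketch below saves a model file on every iteration and logs it, with its metrics, to an MLflow run. It is a minimal sketch under stated assumptions, not a Kale or Kubeflow integration: the helper names (`save_model`, `log_iteration`, `train`), the `artifacts` directory, the run name, and the placeholder "model" are all hypothetical, and it assumes an MLflow tracking server is already configured for the environment. Only `mlflow.start_run`, `mlflow.log_metric`, and `mlflow.log_artifact` are actual MLflow calls.

```python
import pickle
from pathlib import Path


def save_model(model, iteration, out_dir="artifacts"):
    """Serialize the model for one iteration and return the file path."""
    directory = Path(out_dir)
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"model_iter_{iteration}.pkl"
    with path.open("wb") as handle:
        pickle.dump(model, handle)
    return path


def log_iteration(model_path, metrics, iteration):
    """Log one iteration's metrics and model file to the active MLflow run."""
    # Imported lazily so save_model stays usable where MLflow is not installed.
    import mlflow

    for name, value in metrics.items():
        mlflow.log_metric(name, value, step=iteration)
    # Each iteration gets its own artifact folder inside the run.
    mlflow.log_artifact(str(model_path), artifact_path=f"iteration_{iteration}")


def train(iterations=3):
    """Toy training loop; the 'model' is a stand-in dict, not a real estimator."""
    import mlflow

    with mlflow.start_run(run_name="kubeflow-step"):  # hypothetical run name
        for i in range(iterations):
            model = {"weights": [0.1 * i]}  # placeholder for a trained model
            path = save_model(model, i)
            log_iteration(path, {"loss": 1.0 / (i + 1)}, i)
```

Calling `train()` from a notebook cell or pipeline step would then produce one run with a model file per iteration, provided the step can reach the MLflow tracking server.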
Upcoming February 2022 Meetup
We are excited to announce that we have our speakers locked in for the next meetup.
February 3, 2022
- Robust Data Logging in Machine Learning – Danny Leybzon @WhyLabs
- Using Apache Spark in Kubeflow: A non-trivial Use Case – Trevor Grant @Arrikto
If you are new to Kubeflow – install MiniKF
MiniKF is the easiest way to get started with Kubeflow on the platform of your choice (AWS or GCP).
Here are the links:
Get started with Kubeflow – hands-on tutorials
Installed but don’t know where to start? Get started with these hands-on, practical Kubeflow tutorials.
- Tutorial 1: An End-to-End ML Workflow: From Notebook to Kubeflow Pipelines with MiniKF & Kale
- Tutorial 2: Build An End-to-End ML Workflow: From Notebook to HP Tuning to Kubeflow Pipelines with Kale
- Tutorial 3: Build an ML pipeline with hyperparameter tuning and serve the model starting from a notebook
- Tutorial 4: Build an AutoML workflow starting from a notebook
FREE Kubeflow courses and certifications
We are excited to announce the first of several free instructor-led and on-demand Kubeflow courses! The “Introduction to Kubeflow” series of courses will start with the fundamentals, then go on to deeper dives into various Kubeflow components. Each course will be delivered over Zoom with the opportunity to earn a certificate upon successful completion of an exam. Visit us to learn more.
We hope to see you at a future Meetup!