Last week we hosted a free Kaggle Titanic Disaster Machine Learning workshop. In this blog post we’ll recap some highlights from the workshop, plus give a summary of the Q&A. Ok, let’s dig in.
First, thanks for voting for your favorite charity!
With the unprecedented circumstances facing our global community, Arrikto is looking for even more ways to contribute. With this in mind, we thought that in lieu of swag we could give workshop attendees the opportunity to vote for their favorite charity and help guide our monthly donation to charitable causes. The charity that won this workshop’s voting was the National Pediatric Cancer Foundation (NPCF). The NPCF is a nonprofit organization dedicated to funding research to eliminate childhood cancer. We are pleased to be making a donation of $100 to them on behalf of the workshop attendees. Again, thanks to all of you who attended and voted!
What topics were covered in the workshop?
- Overview of Kubeflow
- Installing Kubeflow
- About the Titanic Disaster Example
- Deploying a Notebook Server
- Getting the Titanic Disaster Example Up and Running
- Exploring the Notebook
- Deploying and Running a Pipeline
- Examining the Results
What did I miss?
Here’s a short teaser from the 45-minute workshop where I walked folks through how to get the Titanic Disaster notebook up and running on a Kubeflow cluster.
Deploy Kubeflow with Kubeflow as a Service
In the workshop we showed how Kubeflow as a Service is the easiest way to get started with Kubeflow. Want to try it for yourself? Sign up here: kubeflow.arrikto.com
Try the Titanic Disaster Tutorial for Yourself
If you’d like to work through the Titanic Disaster notebook yourself, check out the guided tutorial on Arrikto Academy.
Missed the June 16th workshop?
If you were unable to join us this week, but would still like to attend a workshop in the future, register for one of these upcoming workshops.
Q&A from the workshop
Below is a summary of some of the questions that popped into the Q&A box during the workshop. [Edited for readability and brevity.]
How can I expose a model from my laptop or desktop?
This is not advised, as it is technically complicated and not representative of an actual staging or production environment. The best approach is to use a cloud-based option like Kubeflow as a Service.
Pipelines are hosted in ephemeral pods, how do they access the data that is hosted on the original storage volume?
Everything runs in Kubernetes: the Notebook Server, the Jupyter notebook, and the Kubeflow Pipeline. In the background, applications like MinIO and Istio make sure that the pods that need to communicate with each other can do so. Additionally, the initial environment used to create the pipeline, and the output of each step, are snapshotted. Each snapshot is then loaded as the foundation of the container for the next step. So not only can the pods communicate as necessary, but each step’s output is snapshotted and loaded for subsequent steps. In this way the data is transferred AND there is a lineage/audit history.
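Conceptually, each step reads its inputs from artifacts produced by the previous step and writes its own outputs as artifacts, which the platform stores and hands to the next step. Here is a minimal, self-contained sketch of that file-based hand-off in plain Python; it mimics the pattern, not the actual Kubeflow internals, and the step functions, file names, and toy data are all illustrative:

```python
import json
import tempfile
from pathlib import Path

# Illustrative step functions: each reads a file written by the previous
# step and writes a file for the next one, mimicking how pipeline steps
# exchange data through stored artifacts rather than direct pod-to-pod calls.

def preprocess(raw_path: Path, clean_path: Path) -> None:
    raw = json.loads(raw_path.read_text())
    clean = [row for row in raw if row.get("age") is not None]
    clean_path.write_text(json.dumps(clean))

def train(clean_path: Path, model_path: Path) -> None:
    rows = json.loads(clean_path.read_text())
    # The "model" here is just the mean age, to keep the sketch runnable.
    mean_age = sum(r["age"] for r in rows) / len(rows)
    model_path.write_text(json.dumps({"mean_age": mean_age}))

workdir = Path(tempfile.mkdtemp())
raw, clean, model = workdir / "raw.json", workdir / "clean.json", workdir / "model.json"
raw.write_text(json.dumps([{"age": 22}, {"age": None}, {"age": 38}]))

preprocess(raw, clean)  # step 1's output artifact becomes step 2's input
train(clean, model)
print(json.loads(model.read_text())["mean_age"])
```

In Kubeflow, the snapshotting described above plays the role of the `workdir` here: the files a step produces are captured and mounted into the container for the next step, which is also what gives you the lineage history.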
Can I make custom docker images for Notebook Servers?
The open source Kubeflow documentation has details on building custom Docker images for use with Notebook Servers. The key rule is to extend an existing notebook image by adding to it; do not overwrite its existing setup.
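As a hedged sketch of what such a Dockerfile might look like, the base image name and tag below are assumptions; check the Kubeflow documentation for the images that match your Kubeflow version:

```dockerfile
# Sketch only: the base image name/tag is an assumption -- consult the
# Kubeflow docs for the notebook images matching your Kubeflow version.
FROM kubeflownotebookswg/jupyter-scipy:v1.7.0

# Switch to root only to install extra packages...
USER root
RUN pip install --no-cache-dir xgboost seaborn

# ...then switch back to the default notebook user, keeping the base
# image's entrypoint and Jupyter configuration intact (add, don't overwrite).
USER jovyan
```

Building on an official base image this way preserves the entrypoint and user setup that Kubeflow expects, which is what "add, don't overwrite" means in practice.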