Almost all companies that have embraced machine learning in recent years have come to a startling realization: no matter how many data scientists or how much software they throw at a project, most models fail to make it to production or deliver real business value. Why is this?
There are many reasons, but one we often hear from customers revolves around the complexity of making machine learning code that runs fine locally also scale in production. With the Kale project developed by Arrikto, data scientists can now easily create production-ready Kubeflow pipelines for MLOps with just a few clicks or decorations.
If you are new to the Kubeflow ecosystem, you may be asking yourself, “What exactly is Kale?”
What is Kale?
Kale stands for “Kubeflow Automated PipeLines Engine”. It enables you to deploy Jupyter Notebooks that are running on your laptop or in the cloud to Kubeflow Pipelines, without requiring any of the Kubeflow SDK boilerplate. You can define pipelines just by annotating the Notebook’s code cells and clicking a deployment button in the Jupyter UI. Kale takes care of converting the Notebook to a valid Kubeflow Pipelines deployment, resolving data dependencies, and managing the pipeline’s lifecycle.
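In practice, the annotations are cell tags you apply through Kale’s Jupyter extension. As a rough, hypothetical sketch (the step names `load_data` and `train` are illustrative), a tagged Notebook might look like this:

```
imports            # cell whose imports are prepended to every step
block:load_data    # this cell becomes the "load_data" pipeline step
block:train        # this cell becomes the "train" step...
prev:load_data     # ...and declares that it depends on "load_data"
skip               # this cell is excluded from the pipeline entirely
```

Kale reads these tags to infer the pipeline graph, so cells tagged with the same `block:` name are merged into one step, and `prev:` tags define the edges between steps.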
What is the Kale SDK?
The aim of the Kale SDK is to let you write plain Python code and then convert it to fully reproducible Kubeflow pipelines without changing the original source code.
All you have to do is decorate the Python functions that will become pipeline steps, and decorate the single function that acts as the pipeline’s main entry point.
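As a rough sketch of what this looks like (assuming the `step` and `pipeline` decorators from `kale.sdk`; the step names and function bodies are illustrative, and a no-op fallback is included so the snippet runs even where Kale is not installed):

```python
try:
    from kale.sdk import pipeline, step
except ImportError:
    # Fallback no-op decorators so this sketch runs as plain Python
    # in environments without Kale installed.
    def step(**kwargs):
        return lambda f: f

    def pipeline(**kwargs):
        return lambda f: f


@step(name="load_data")
def load_data():
    # Stand-in for reading your real dataset.
    return [1, 2, 3]


@step(name="train")
def train(data):
    # Stand-in for fitting a real model.
    return sum(data)


@pipeline(name="demo-pipeline", experiment="demo")
def ml_pipeline():
    # The entry point wires the steps together; Kale infers the
    # step dependencies from how the outputs flow between calls.
    data = load_data()
    train(data)
```

Without Kale, the file is just ordinary Python you can run and debug locally; with Kale, the same decorated file can be compiled into a Kubeflow pipeline.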
See the Kale SDK in Action
Check out this video with Stefano Fioravanzo, the original creator of Kale, who takes you on a quick tour of the Kale SDK and shows how to use it to convert your Python code into a scalable Kubeflow pipeline.
In the video he covers the following topics:
- A review of a simple project that loads some CSV files, does some data processing, and makes assorted function calls
- A walk-through of a models module that implements three trainings: Random Forest, Support Vector Classifier, and Logistic Regression
- Finally, how a few decorations let you take the same simple project and massively scale it, snapshot it so it can be easily reproduced, and take advantage of parallel processing
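The pattern in the video can be sketched roughly as follows (hypothetical step names; the trainer bodies are stubs standing in for the actual scikit-learn calls, and no-op fallback decorators let the snippet run where Kale is not installed):

```python
try:
    from kale.sdk import pipeline, step
except ImportError:
    # Fallback no-op decorators so the sketch runs as plain Python.
    def step(**kwargs):
        return lambda f: f

    def pipeline(**kwargs):
        return lambda f: f


@step(name="process_data")
def process_data():
    # Stand-in for loading the CSV files and preprocessing them.
    features = [[0, 1], [1, 0]]
    labels = [0, 1]
    return features, labels


@step(name="train_rf")
def train_rf(x, y):
    # Stand-in for fitting a Random Forest classifier.
    return "rf-model"


@step(name="train_svc")
def train_svc(x, y):
    # Stand-in for fitting a Support Vector Classifier.
    return "svc-model"


@step(name="train_lr")
def train_lr(x, y):
    # Stand-in for fitting a Logistic Regression model.
    return "lr-model"


@pipeline(name="training-pipeline", experiment="demo")
def training_pipeline():
    x, y = process_data()
    # The three trainers depend only on process_data's outputs, not on
    # each other, so they can run as parallel branches of the pipeline.
    train_rf(x, y)
    train_svc(x, y)
    train_lr(x, y)
```

Because the three training steps share no data between them, the resulting pipeline graph fans out after the data-processing step, which is what enables the parallel execution mentioned above.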
At Arrikto, we are active members of the Kubeflow community, having made significant contributions to the latest 1.3 release. Our projects and products include:
- MiniKF is a production-ready, local Kubeflow deployment that installs in minutes, and understands how to downscale your infrastructure
- Enterprise Kubeflow (EKF) is a complete machine learning operations platform that simplifies, accelerates, and secures the machine learning model development life cycle with Kubeflow.
- Rok is a data management solution for Kubeflow. Rok’s built-in Kubeflow integration simplifies operations and increases performance, while enabling data versioning, packaging, and secure sharing across teams and cloud boundaries.
- Kale is a workflow tool for Kubeflow, which orchestrates all of Kubeflow’s components seamlessly…