How to run AutoML on Kubeflow with Kale

July 13, 2021

As the name suggests, the goal of automated machine learning (AutoML) is to automate as many of the tasks associated with machine learning as possible. Ideally, AutoML lets people who are not data science experts apply machine learning models and techniques to their own problems. Beyond making machine learning more accessible, AutoML can also produce solutions that are easier to understand, quicker to design, and pre-optimized compared with those that are “hand-rolled” from scratch.

The tasks AutoML seeks to dramatically simplify include:

Data pre-processing
Feature engineering
Feature extraction
Feature selection
Algorithm selection
Hyperparameter tuning
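
To make the last two tasks concrete, here is a minimal sketch of algorithm selection combined with hyperparameter tuning, written in plain Python with no ML libraries. Everything in it is an assumption made for illustration: the toy data, the two candidate "algorithms" (fit_mean and fit_ridge), and the search grid are hypothetical and are not part of Kale or Kubeflow. A real AutoML system searches a far larger space, but the loop has the same shape: try each configuration, score it on held-out data, and keep the best one.

```python
from itertools import product

# Toy data, invented for illustration: y is roughly 2 * x.
train = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8)]
valid = [(5.0, 10.1), (6.0, 11.9)]


def fit_mean(data):
    """Baseline 'algorithm': always predict the mean of the training targets."""
    mean_y = sum(y for _, y in data) / len(data)
    return lambda x: mean_y


def fit_ridge(data, lam=0.0):
    """One-feature ridge regression through the origin.

    Closed form: w = sum(x * y) / (sum(x * x) + lam).
    """
    sxy = sum(x * y for x, y in data)
    sxx = sum(x * x for x, _ in data)
    w = sxy / (sxx + lam)
    return lambda x: w * x


def mse(model, data):
    """Mean squared error of a fitted model on a dataset."""
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)


# Search space: (algorithm name, fit function, hyperparameter grid).
search_space = [
    ("mean", fit_mean, {}),
    ("ridge", fit_ridge, {"lam": [0.0, 1.0, 10.0]}),
]


def run_search(space, train_data, valid_data):
    """Fit every configuration and return results sorted by validation error."""
    results = []
    for name, fit, grid in space:
        keys = list(grid)
        # An empty grid yields exactly one configuration with no parameters.
        for values in product(*(grid[k] for k in keys)):
            params = dict(zip(keys, values))
            model = fit(train_data, **params)
            results.append((mse(model, valid_data), name, params))
    return sorted(results, key=lambda r: r[0])


results = run_search(search_space, train, valid)
best_score, best_name, best_params = results[0]
print(best_name, best_params, round(best_score, 4))
```

On this toy data the unregularized ridge configuration has the lowest validation error, so it is the one the search selects. Each configuration is independent of the others, which is why this kind of search lends itself to being run as many parallel trials on a cluster.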

Despite AutoML’s noble goals, there is still a lot of complexity inherent in taking machine learning code that runs fine locally and making it scale in production. With the Kale project developed by Arrikto, data scientists can now run production-ready AutoML on Kubeflow with just a few clicks or decorators.

If you are new to the Kubeflow ecosystem, you may be asking yourself, “What exactly is Kale?”

Introducing Kale

Kale stands for “Kubeflow Automated PipeLines Engine.” It enables you to deploy Jupyter Notebooks that are running on your laptop or in the cloud to Kubeflow Pipelines, without requiring any of the Kubeflow SDK boilerplate. You can define pipelines just by annotating the Notebook’s code cells and clicking a deployment button in the Jupyter UI. Kale takes care of converting the Notebook to a valid Kubeflow Pipelines deployment, resolving data dependencies, and managing the pipeline’s lifecycle.
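
One way to picture the conversion is to treat each annotated cell as a named step that lists the steps it depends on; a pipeline is then any execution order that respects those dependencies. The sketch below illustrates that idea with Python's standard graphlib module (Python 3.9+). It is not Kale's code or API: the step names and the cell_annotations dictionary are hypothetical, and Kale's actual conversion does considerably more, such as passing data between steps.

```python
from graphlib import TopologicalSorter

# Hypothetical annotations: step name -> names of the steps it depends on.
# These names are invented for illustration and do not come from Kale.
cell_annotations = {
    "load_data": [],
    "preprocess": ["load_data"],
    "train": ["preprocess"],
    "evaluate": ["train", "preprocess"],
}


def execution_order(annotations):
    """Return one valid order in which the annotated steps can run.

    TopologicalSorter expects a mapping from each node to its predecessors
    and raises graphlib.CycleError if the dependencies form a cycle.
    """
    return list(TopologicalSorter(annotations).static_order())


order = execution_order(cell_annotations)
print(order)
```

Because these four steps form a single chain, there is only one valid order. With independent branches, several orders would be valid, and a pipeline engine could run those branches in parallel.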

AutoML on Kubeflow with Kale in action!

In the video below, Stefano Fioravanzo, the original creator of Kale, shows you how to create an AutoML experiment with Kale and Kubeflow.

In the video he covers the following topics using an AutoML experiment:

Initial Jupyter Notebook definitions (data and goals)
Getting and running configurations
Monitoring
Snapshotting notebooks, configurations & pipelines
Identifying the best configuration
Hyperparameter tuning
Model selection

For a bite-sized tutorial that you can try yourself, check out “Create an AutoML Workflow from Inside your Notebook.”

Schedule a FREE Kubeflow & MLOps workshop

We are excited to announce that Arrikto is making available a FREE, virtual, 60-minute Kubeflow and MLOps workshop!
The workshop covers basic and advanced topics related to Kubeflow, MiniKF, Rok, Katib and KFServing. In the workshop you’ll gain a solid understanding of how these components can work together to help you bring machine learning models to production faster. Click to schedule a workshop for your team and learn more.

About Arrikto

At Arrikto, we are active members of the Kubeflow community, having made significant contributions to the latest 1.4 release. Our projects/products include:

Kubeflow as a Service is the easiest way to get started with Kubeflow in minutes! It comes with a Free 7-day trial (no credit card required).
Enterprise Kubeflow (EKF) is a complete machine learning operations platform that simplifies, accelerates, and secures the machine learning model development life cycle with Kubeflow.
Rok is a data management solution for Kubeflow. Rok’s built-in Kubeflow integration simplifies operations and increases performance, while enabling data versioning, packaging, and secure sharing across teams and cloud boundaries.
Kale is a workflow tool for Kubeflow that orchestrates all of Kubeflow’s components seamlessly.