Training and AutoML Summit Recap – Part 2

August 26, 2021


Did you miss the AutoML and Training working groups’ summit back in July? If you did, all the talks from the event have been uploaded to YouTube.

Reminder, if you attended the Summit, the organizers kindly ask you to complete this survey. Your answers will help the Kubeflow contributors!

This is part two of the blog series, where we’ll give you an executive summary of the second batch of the day’s talks. Missed part one? Here it is.

If you are new to Kubeflow and AutoML

The Kubeflow project is organized into working groups, each with associated GitHub repositories, that focus on specific pieces of the ML platform. These include:

AutoML
Deployment
Manifests
Notebooks
Pipelines
Serving
Training

As the name suggests, the goal of automated machine learning (AutoML) is to automate as many of the tasks associated with machine learning as possible. In a perfect world, AutoML allows people who are not data science experts to apply machine learning models and techniques to their own problems. Aside from making machine learning more accessible to non-experts, AutoML also has the advantage of creating solutions that are easier to understand, quicker to design, and pre-optimized compared with those that are “hand-rolled” from scratch.

The tasks AutoML seeks to dramatically simplify include:

Data pre-processing
Feature engineering
Feature extraction
Feature selection
Algorithm selection
Hyperparameter tuning
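
To make the last item a little more concrete, here is a minimal sketch of random search, one of the simplest hyperparameter tuning strategies. It is only an illustration, not code from the Summit or from Katib: the `objective` function is a hypothetical stand-in for a real training run, and the search space (a learning rate and a layer count) is assumed for the example.

```python
import random


def objective(learning_rate, num_layers):
    # Hypothetical stand-in for a training run: returns a "validation loss"
    # to minimize. A real objective would train and evaluate a model.
    return (learning_rate - 0.01) ** 2 + 0.001 * abs(num_layers - 3)


def random_search(num_trials, seed=0):
    # Sample hyperparameters at random and keep the best trial seen so far.
    rng = random.Random(seed)
    best_params, best_loss = None, float("inf")
    for _ in range(num_trials):
        params = {
            # Sample the learning rate on a log scale between 1e-4 and 1e-1.
            "learning_rate": 10 ** rng.uniform(-4, -1),
            "num_layers": rng.randint(1, 6),
        }
        loss = objective(**params)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss


best_params, best_loss = random_search(num_trials=50)
print(best_params, best_loss)
```

An AutoML system typically runs each trial as a separate job and may replace the random sampling with a smarter search algorithm, but the loop of "suggest parameters, run a trial, record the result" stays the same.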

As you can imagine, AutoML is “kind of a big deal” in the context of making Kubeflow accessible to experts and non-experts alike. This is where having a dedicated AutoML working group comes in. The working group’s chairs include:

Andrey Velichkevich, Cisco
Ce Gao, Caicloud
Johnu George, Nutanix

The co-organizers of the Summit were the folks from the Training working group. This group covers developing, deploying, and operating training jobs on Kubeflow. The working group’s chairs include:

Ce Gao, Caicloud
Johnu George, Nutanix
Yuan Tang, Ant Group

OK, let’s look at a few previews of talks from the second half of the summit.

AutoML and Training Working Group Updates and Q&A Session

In this talk, Johnu George from Nutanix and Andrey Velichkevich from Cisco gave project updates on what the AutoML and Training working groups are up to.

Talk Highlights

Updates to Training Operators including common changes and features
Look at the new Katib UI going live in Kubeflow 1.4
An overview of new algorithms that will be supported
Component enhancements
AutoML WG community updates
The session concluded with some in-depth Q&A

Kubeflow in Meraki Vision

In this talk, Amit Saha from Cisco Meraki gave an overview of how they are using Kubeflow inside the Meraki Vision cloud-managed smart cameras.

Talk Highlights

Overview of what Meraki Vision is
The challenges with developing “intelligent” cameras
An in-depth look at continuous model training and developer enablement
An overview of Meraki specific requirements
The configuration of their on-prem ML server
Architecture and step-by-step workflow overview
On-going pain points, especially around security

Kubeflow User Panel

In this talk, Kubeflow Community Product Manager Josh Bottum moderated a panel of Kubeflow users. Panelists included David Yuan, Jobin Thomas, Charles Adetiloye, Umang Sharma and Forest Mars.

Talk Highlights

What type of Kubeflow use cases does your team support?
Which training operators do you use or need to use?
Which hyperparameter tuning algorithms do you use or need to use?
What issues do you have on configuration?
What issues do you have on operations?
How can the training operators be improved?
How can Katib be improved?
How do you learn about the benefits of training operators?
How do you learn about the benefits of Katib?
What type of training on Katib or Training operators would help your team?

Advanced Katib Features

In this talk, Andrey Velichkevich from Cisco took a deep dive into some of the more advanced features in Katib.

Talk Highlights

Early stopping
Support for custom resources
A TFJob example
Landscape of AutoML
Reinforcement learning in neural architecture search
Differentiable architecture search
Demo!
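
If early stopping is new to you, the idea is to end unpromising trials before they finish so that compute goes to better candidates. The recap does not say which rule the talk covered, so the sketch below uses a simplified median stopping rule, one well-known heuristic, purely as an illustration. It is not Katib's implementation, and the function name, arguments, and accuracy values are all hypothetical.

```python
from statistics import median


def should_stop(trial_history, completed_histories, step):
    """Return True if the running trial should be stopped at `step` (1-indexed).

    Each history is a list of accuracy values, one per training step
    (higher is better).
    """
    # Running average of each completed trial over its first `step` steps.
    running_averages = [
        sum(history[:step]) / len(history[:step])
        for history in completed_histories
        if history
    ]
    if not running_averages:
        # Nothing to compare against yet, so let the trial continue.
        return False
    best_so_far = max(trial_history[:step])
    # Stop when the trial's best result so far is below the median.
    return best_so_far < median(running_averages)


completed = [[0.5, 0.6, 0.7], [0.4, 0.6, 0.8], [0.6, 0.7, 0.9]]
print(should_stop([0.2, 0.3], completed, step=2))
print(should_stop([0.5, 0.7], completed, step=2))
```

In this example the first trial's best accuracy is well below what the completed trials averaged over the same number of steps, so it would be stopped, while the second trial would continue.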

Kubeflow Universal Training Operator

In this talk, Jiaxin Shan from Tencent and Wang Zhang from ByteDance presented on distributed training mechanisms and the new universal training operator.

Talk Highlights

Common distributed training mechanism
Training operator implementation details
A status report on the current training operator
A look at the new Kubeflow universal training operator

Book a FREE Kubeflow and MLOps workshop

This FREE virtual workshop is designed with data scientists, machine learning developers, DevOps engineers and infrastructure operators in mind. The workshop covers basic and advanced topics related to Kubeflow, MiniKF, Rok, Katib and KFServing. In the workshop you’ll gain a solid understanding of how these components can work together to help you bring machine learning models to production faster. Click to schedule a workshop for your team.

About Arrikto

At Arrikto, we are active members of the Kubeflow community, having made significant contributions to the latest 1.4 release. Our projects/products include:

Kubeflow as a Service is the easiest way to get started with Kubeflow in minutes! It comes with a Free 7-day trial (no credit card required).
Enterprise Kubeflow (EKF) is a complete machine learning operations platform that simplifies, accelerates, and secures the machine learning model development life cycle with Kubeflow.
Rok is a data management solution for Kubeflow. Rok’s built-in Kubeflow integration simplifies operations and increases performance, while enabling data versioning, packaging, and secure sharing across teams and cloud boundaries.
Kale is a workflow tool for Kubeflow that orchestrates all of Kubeflow’s components seamlessly.