Did you miss the AutoML and Training working groups’ summit back in July? If you did, all the talks from the event have been uploaded to YouTube.
Reminder: if you attended the Summit, the organizers kindly ask you to complete this survey. Your answers will help the Kubeflow contributors!
This is part two in the blog series, where we’ll give you an executive summary of the second batch of the day’s talks. Missed part one? Here it is.
If you are new to Kubeflow and AutoML
The Kubeflow project is organized into working groups, each with associated GitHub repositories, that focus on specific pieces of the ML platform. These include the AutoML and Training working groups, which hosted the Summit.
As the name suggests, the goal of automated machine learning (AutoML) is to automate as many of the tasks associated with machine learning as possible. In a perfect world, AutoML allows people who are not data science experts to make use of machine learning models and techniques and apply them to problems. Aside from making machine learning more accessible to non-experts, AutoML also has the advantage of creating solutions that are easier to understand, can be designed quickly, and are pre-optimized compared with those that are “hand-rolled” from scratch.
The tasks AutoML seeks to dramatically simplify include:
- Data pre-processing
- Feature engineering
- Feature extraction
- Feature selection
- Algorithm selection
- Hyperparameter tuning
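To make the last item concrete, here is a minimal sketch of hyperparameter tuning by random search, written in plain Python. It is not Katib's API or implementation: the `objective` function, the parameter names, and the search ranges are all hypothetical stand-ins for a real training job.

```python
import random


def objective(learning_rate, num_layers):
    """Hypothetical stand-in for a validation loss; lower is better."""
    return (learning_rate - 0.01) ** 2 + 0.001 * abs(num_layers - 3)


def random_search(n_trials, seed=0):
    """Sample hyperparameters at random and keep the best trial."""
    rng = random.Random(seed)
    best_params, best_loss = None, float("inf")
    for _ in range(n_trials):
        params = {
            # Sample the learning rate log-uniformly between 1e-4 and 1e-1.
            "learning_rate": 10 ** rng.uniform(-4, -1),
            # Assumed search range for the layer count.
            "num_layers": rng.randint(1, 6),
        }
        loss = objective(**params)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss


best_params, best_loss = random_search(n_trials=50)
print(best_params, best_loss)
```

In a real setup each trial would launch a training run rather than call a cheap function, which is why running trials in parallel on a cluster is attractive.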
As you can imagine, AutoML is “kind of a big deal” in the context of making Kubeflow accessible to experts and non-experts alike. This is where having a dedicated AutoML working group comes in. The working group’s chairs include:
- Andrey Velichkevich, Cisco
- Ce Gao, Caicloud
- Johnu George, Nutanix
The co-organizers of the Summit were the folks from the Training working group. This group covers developing, deploying, and operating training jobs on Kubeflow. The working group’s chairs include:
- Ce Gao, Caicloud
- Johnu George, Nutanix
- Yuan Tang, Ant Group
Ok, let’s look at a few previews of talks from the last half of the summit.
AutoML and Training Working Group Updates and Q&A Session
In this talk, Johnu George from Nutanix and Andrey Velichkevich from Cisco gave project updates on what the AutoML and Training working groups are up to.
- Updates to Training Operators including common changes and features
- Look at the new Katib UI going live in Kubeflow 1.4
- An overview of new algorithms that will be supported
- Components enhancements
- AutoML WG community updates
- The session concluded with some in-depth Q&A
Kubeflow in Meraki Vision
In this talk, Amit Saha from Cisco Meraki gave an overview of how they are using Kubeflow inside the Meraki Vision cloud-managed smart cameras.
- Overview of what Meraki Vision is
- The challenges with developing “intelligent” cameras
- An in-depth look at continuous model training and developer enablement
- An overview of Meraki-specific requirements
- The configuration of their on-prem ML server
- Architecture and step-by-step workflow overview
- On-going pain points, especially around security
Kubeflow User Panel
In this talk, Kubeflow Community Product Manager Josh Bottum moderated a panel of Kubeflow users. Panelists included David Yuan, Jobin Thomas, Charles Adetiloye, Umang Sharma and Forest Mars.
- What type of Kubeflow use cases does your team support?
- Which training operators do you use or need to use?
- Which hyperparameter tuning algorithms do you use or need to use?
- What issues do you have on configuration?
- What issues do you have on operations?
- How can the training operators be improved?
- How can Katib be improved?
- How do you learn about the benefits of training operators?
- How do you learn about the benefits of Katib?
- What type of training on Katib or Training operators would help your team?
Advanced Katib Features
In this talk, Andrey Velichkevich from Cisco took a deep dive into some of the more advanced features in Katib.
- Early stopping
- Support for custom resources
- A TFJob example
- Landscape of AutoML
- Reinforcement learning in neural architecture search
- Differentiable architecture search
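For readers new to early stopping, the idea is to end unpromising trials before they finish so that compute goes to better candidates. The sketch below shows a simplified version of one commonly described approach, the median stopping rule. It is an illustration only, not Katib's implementation, and whether the talk covered this particular rule is an assumption. The function name, the `min_trials` parameter, and the sample data are hypothetical.

```python
from statistics import median


def should_stop(trial_losses, completed_trials, min_trials=3):
    """Return True if the running trial looks worse than the median of earlier trials.

    trial_losses: losses reported so far by the running trial, one per step.
    completed_trials: list of loss histories from finished trials.
    """
    step = len(trial_losses)
    # Only compare against trials that reported at least `step` values.
    comparable = [h[:step] for h in completed_trials if len(h) >= step]
    if step == 0 or len(comparable) < min_trials:
        return False
    # Average loss of each earlier trial over the same number of steps.
    running_averages = [sum(h) / step for h in comparable]
    # Stop when even the trial's best loss so far is above the median.
    return min(trial_losses) > median(running_averages)


# Hypothetical loss histories from three finished trials.
completed = [[1.0, 0.8, 0.6], [0.9, 0.7, 0.5], [1.1, 0.9, 0.7]]
print(should_stop([1.5, 1.4], completed))
print(should_stop([0.9, 0.6], completed))
```

The `min_trials` guard keeps the rule from firing before there is enough history to make the median meaningful.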
Kubeflow Universal Training Operator
In this talk, Jiaxin Shan from Tencent and Wang Zhang from Bytedance presented on distributed training mechanisms and the new universal training operator.
- Common distributed training mechanisms
- Training operator implementation details
- A status report on the current training operator
- A look at the new Kubeflow universal training operator
Book a FREE Kubeflow and MLOps workshop
This FREE virtual workshop is designed with data scientists, machine learning developers, DevOps engineers and infrastructure operators in mind. The workshop covers basic and advanced topics related to Kubeflow, MiniKF, Rok, Katib and KFServing. In the workshop you’ll gain a solid understanding of how these components can work together to help you bring machine learning models to production faster. Click to schedule a workshop for your team.
At Arrikto, we are active members of the Kubeflow community, having made significant contributions to the latest 1.4 release. Our projects/products include:
- MiniKF, a production-ready, local Kubeflow deployment that installs in minutes, and understands how to downscale your infrastructure
- Enterprise Kubeflow (EKF), a complete machine learning operations platform that simplifies, accelerates, and secures the machine learning model development life cycle with Kubeflow.
- Rok, a data management solution for Kubeflow. Rok’s built-in Kubeflow integration simplifies operations and increases performance, while enabling data versioning, packaging, and secure sharing across teams and cloud boundaries.
- Kale, a workflow tool for Kubeflow, which orchestrates all of Kubeflow’s components seamlessly.