Kubernetes AI Day North America 2021 Recap – Part 2

Did you miss CNCF’s Kubernetes AI Day at KubeCon North America 2021 back in October? If so, the good news is that all the talks from the event have been uploaded to YouTube. In part two of this two-part blog series, we’ll give you an executive summary of the last five talks of the day.

Case Study: Developing and Scaling Kubeflow’s Web Apps

In this talk, Andrey Velichkevich of Cisco and Kimonas Sotirchos of Arrikto explained how the Kubeflow community created truly cloud native web apps for managing ML workflows on top of Kubernetes.

Talk Highlights

  • Overview of the personas who create and manage machine learning workflows and their UI needs
  • How Kubeflow can provide team isolation, scalability and a good user experience
  • A discussion about namespace isolation, authenticated users and SubjectAccessReviews for authz (see the sketch after this list)
  • A review of the current fetching strategy that uses exponential polling plus future improvements
  • An overview of the AutoML landscape
  • What is Katib?
  • Katib architecture overview and experiment example
  • Kubeflow UI and AutoML demo
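
One of the highlights above, the use of SubjectAccessReviews for authorization, is easy to illustrate. The snippet below is a minimal sketch of how a web app backend could ask the Kubernetes API whether an authenticated user may list notebooks in a namespace, using the official Python client. It is our own illustration rather than code from the talk, and the user, namespace, group and resource values are placeholders.

    # A minimal sketch (our illustration, not the talk's code): ask the Kubernetes API
    # whether an authenticated user may list notebooks in a namespace via a
    # SubjectAccessReview, the mechanism Kubeflow's web apps use for authorization.
    # The user, namespace, group and resource values below are placeholders.
    from kubernetes import client, config

    config.load_incluster_config()  # use config.load_kube_config() outside the cluster
    authz_api = client.AuthorizationV1Api()

    review = client.V1SubjectAccessReview(
        spec=client.V1SubjectAccessReviewSpec(
            user="user@example.com",  # identity taken from the authentication headers
            resource_attributes=client.V1ResourceAttributes(
                namespace="kubeflow-user",
                verb="list",
                group="kubeflow.org",
                resource="notebooks",
            ),
        )
    )

    response = authz_api.create_subject_access_review(review)
    print("allowed:", response.status.allowed)

If the review comes back with allowed set to false, the web app can simply return a 403 instead of forwarding the request.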

Defending Against Adversarial Model Attacks Using Kubeflow

Next up, Animesh Singh and Andrew Butler, both of IBM, examined how to build a pipeline that’s robust against adversarial attacks by leveraging Kubeflow Pipelines and its integration with the LFAI Adversarial Robustness Toolbox (ART). Additionally, they showed how to test a machine learning model’s adversarial robustness in production on Kubeflow Serving, by way of payload logging (Knative eventing) and ART. (A short code sketch follows the highlights below.)

Talk Highlights

  • An overview of the Linux Foundation AI & Data Foundation
  • Adversarial threats to AI
  • An overview of Adversarial Robustness Toolbox (ART) plus demo
  • Discussion and demo concerning ART + Kubeflow Pipelines
  • Discussion and demo concerning ART + Kubeflow Serving
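
To give a flavor of what working with ART looks like, here is a minimal sketch that wraps a scikit-learn model with ART, crafts adversarial test inputs with the Fast Gradient Method, and compares clean versus adversarial accuracy. This is our own toy example under those assumptions, not the pipeline or models shown in the demo.

    # A minimal sketch (our toy example, not the demo code): wrap a scikit-learn
    # classifier with ART, generate adversarial examples with the Fast Gradient
    # Method, and measure how much accuracy drops on the perturbed inputs.
    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC

    from art.estimators.classification import SklearnClassifier
    from art.attacks.evasion import FastGradientMethod

    x, y = load_iris(return_X_y=True)
    x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=0)

    # Train a plain scikit-learn model, then wrap it so ART can attack it.
    model = SVC(C=1.0, kernel="linear").fit(x_train, y_train)
    classifier = SklearnClassifier(model=model, clip_values=(x.min(), x.max()))

    # Craft adversarial versions of the test set and compare accuracy.
    attack = FastGradientMethod(estimator=classifier, eps=0.5)
    x_test_adv = attack.generate(x=x_test)

    clean_acc = np.mean(np.argmax(classifier.predict(x_test), axis=1) == y_test)
    adv_acc = np.mean(np.argmax(classifier.predict(x_test_adv), axis=1) == y_test)
    print(f"clean accuracy: {clean_acc:.2f}, adversarial accuracy: {adv_acc:.2f}")

The same attack-and-evaluate step is the kind of thing the speakers wire into a Kubeflow Pipelines stage, so robustness is checked automatically rather than by hand.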

AIOps for CI with Kubeflow Pipelines

In this talk, Oindrilla Chatterjee and Aakanksha Duggal, both of Red Hat, demonstrated how they developed a set of Jupyter Notebooks, automated with Kubeflow Pipelines into a repeatable process, that collected data from various CI/CD tools, calculated key performance indicator (KPI) metrics, and performed analyses such as failure type prediction and build log clustering. These metrics were then displayed and explored as interactive dashboards. (A toy pipeline sketch follows the highlights below.)

Talk Highlights

  • What is AIOps for CI/CD?
  • What is meant by “Operate First”?
  • A review of the open data sources used, including GitHub, Prow and TestGrid
  • Demo!
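
As a rough idea of what automating such notebooks with Kubeflow Pipelines can look like, here is a minimal sketch using the KFP v1 SDK, with two lightweight Python components standing in for data collection and KPI calculation. The component logic, names and output file are placeholders of our own, not the speakers’ actual pipeline.

    # A minimal sketch of the pattern (placeholder logic, not Red Hat's pipeline):
    # two lightweight Python components chained into a Kubeflow pipeline with the
    # KFP v1 SDK, one standing in for CI data collection and one for a KPI metric.
    from kfp import dsl, compiler
    from kfp.components import create_component_from_func


    def collect_ci_data() -> str:
        """Stand-in for a step that pulls build results from CI tools such as Prow."""
        return "passed,failed,passed,passed"


    def compute_pass_rate(results: str) -> float:
        """Stand-in for a step that turns raw results into a KPI (here, pass rate)."""
        values = results.split(",")
        return values.count("passed") / len(values)


    collect_op = create_component_from_func(collect_ci_data)
    pass_rate_op = create_component_from_func(compute_pass_rate)


    @dsl.pipeline(name="aiops-ci-kpis", description="Toy CI metrics pipeline")
    def ci_kpi_pipeline():
        raw = collect_op()
        pass_rate_op(results=raw.output)


    if __name__ == "__main__":
        # Compile locally; submit the resulting file via the Kubeflow Pipelines UI
        # or programmatically with kfp.Client(host=...).
        compiler.Compiler().compile(ci_kpi_pipeline, "ci_kpi_pipeline.yaml")

In the real setup each component would be a much heavier notebook-derived step, but the chaining and scheduling mechanics are the same.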

A Better and More Efficient ML Experience for CERN Users

In this talk, Ricardo Rocha and Dejan Golubovic, both of CERN, presented how machine learning has been gaining momentum in the high energy physics (HEP) community, and particularly at CERN. The talk emphasized how centralizing resources has improved their overall resource usage, how they extended existing functionality to manage end user tokens and credentials allowing access to on-premises storage, and how they explored tools like Harbor, Trivy, OPA and Falco to ensure a reproducible and secure flow from interactive analysis, to model training, and finally serving.

Talk Highlights

  • An overview of CERN and its research projects
  • How machine learning is leveraged in LHC data acquisition
  • How machine learning helps discover new physics
  • Why CERN chose Kubeflow
  • How Kubeflow is used at CERN
  • How resource usage was improved
  • An overview of improvements made to integrations, credential and namespace management, plus scans and runtime checks

Serving Machine Learning Models at Scale Using KServe

In this talk, Yuzhui Liu of Bloomberg presented the design of Multi-Model Serving, described how to use it to serve models from different frameworks, and shared benchmark stats that demonstrated its scalability.

Talk Highlights

  • A brief history of KFServing/KServe
  • The problem of production grade inference
  • Challenges that need to be solved in developing production grade inference
  • Introducing KServe
  • Overview of KServe Inference Protocol
  • What a single-model KServe deployment looks like and its limitations (see the sketch after this list)
  • An introduction to ModelMesh, serving runtimes, performance and roadmap
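
For a sense of what the single-model deployment mentioned above looks like in practice, here is a minimal sketch that creates a scikit-learn InferenceService with the KServe Python SDK. It assumes the kserve package is installed and a kubeconfig is available; the namespace and storage URI are placeholders, not the configuration used in the talk.

    # A minimal sketch (placeholder names and storage URI, not the talk's setup):
    # deploy a single scikit-learn model as a KServe InferenceService using the
    # KServe Python SDK, assuming the kserve package and a valid kubeconfig.
    from kubernetes.client import V1ObjectMeta
    from kserve import (
        KServeClient,
        V1beta1InferenceService,
        V1beta1InferenceServiceSpec,
        V1beta1PredictorSpec,
        V1beta1SKLearnSpec,
    )

    isvc = V1beta1InferenceService(
        api_version="serving.kserve.io/v1beta1",
        kind="InferenceService",
        metadata=V1ObjectMeta(name="sklearn-demo", namespace="kserve-test"),
        spec=V1beta1InferenceServiceSpec(
            predictor=V1beta1PredictorSpec(
                sklearn=V1beta1SKLearnSpec(
                    # Placeholder: point this at a bucket holding your trained model.
                    storage_uri="gs://your-bucket/models/sklearn/iris",
                )
            )
        ),
    )

    client = KServeClient()
    client.create(isvc)  # submit the InferenceService to the cluster
    print(client.get("sklearn-demo", namespace="kserve-test"))

Multi-Model Serving and ModelMesh, discussed in the talk, move beyond this one-deployment-per-model pattern by packing many models into shared serving runtime pods.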

If you missed part 1 of this blog series, you can find it here.

Book a FREE Kubeflow and MLOps workshop

This FREE virtual workshop is designed with data scientists, machine learning developers, DevOps engineers and infrastructure operators in mind. The workshop covers basic and advanced topics related to Kubeflow, MiniKF, Rok, Katib and KFServing. In the workshop you’ll gain a solid understanding of how these components can work together to help you bring machine learning models to production faster. Click to schedule a workshop for your team.

About Arrikto

At Arrikto, we are active members of the Kubeflow community, having made significant contributions to the latest 1.4 release. Our projects/products include:

  • Kubeflow as a Service is the easiest way to get started with Kubeflow in minutes! It comes with a Free 7-day trial (no credit card required).
  • Enterprise Kubeflow (EKF) is a complete machine learning operations platform that simplifies, accelerates, and secures the machine learning model development life cycle with Kubeflow.
  • Rok is a data management solution for Kubeflow. Rok’s built-in Kubeflow integration simplifies operations and increases performance, while enabling data versioning, packaging, and secure sharing across teams and cloud boundaries.
  • Kale is a workflow tool for Kubeflow that orchestrates all of Kubeflow’s components seamlessly.

Free Technical Workshop

Turbocharge your team’s Kubeflow and MLOps skills with a free workshop.