Kubeflow Fundamentals Part 3: Distributions and Installations

Welcome to the third blog post in our “Kubeflow Fundamentals” series, designed specifically for folks brand new to the Kubeflow project. The aim of the series is to walk you through a detailed introduction to Kubeflow, with a deep dive into the various components and how they all come together to deliver a complete MLOps platform.

If you missed the previous installments in the “Kubeflow Fundamentals” series, you can find them here:

Part 1: An Introduction
Part 2: Machine Learning Workflows

In this post we’ll take a look at the different Kubeflow distributions that are available and walk you through some installations using MiniKF. OK, let’s dive right in!

Installing Kubeflow

If you are ready to install Kubeflow, then the first decision to make is how you want to get up and running. Aside from building everything from source, you really have two options:

  • Install via a packaged distribution
  • Install via manifests

The benefit of packaged distributions is that they come bundled with all the correct versions of software you need and the integrations are tested and maintained by a vendor. Combined with documentation, this typically translates into a high probability that you’ll be up and running on the platform the distribution was designed to work on, without much hassle. For developers new to Kubeflow or with limited Kubernetes experience, this is the recommended option.

The second option you have to get up and running is to manually install everything via manifests.

Wait, what are “manifests”?

When deploying to Kubernetes or creating resources like a Pod, ReplicaSet, ConfigMap, etc., you’ll need to create a file called a “manifest” that describes the object and its attributes in either YAML or JSON. Put another way, a manifest is the specification of a Kubernetes API object in JSON or YAML format.
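
To make the idea concrete, here is a minimal sketch in Python that builds a manifest for a single Pod and prints it as JSON. It uses only the standard library. The helper name build_pod_manifest, the Pod name “hello-pod” and the image “nginx:1.25” are illustrative assumptions and have nothing to do with Kubeflow itself; a real Kubeflow installation involves many manifests that are far larger than this one.

```python
import json


def build_pod_manifest(name: str, image: str) -> dict:
    """Return a minimal Pod manifest as a plain Python dictionary.

    The four top-level keys (apiVersion, kind, metadata, spec) are the
    standard skeleton of a Kubernetes object specification.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [
                # One container, named after the Pod for simplicity.
                {"name": name, "image": image},
            ]
        },
    }


if __name__ == "__main__":
    # "hello-pod" and "nginx:1.25" are placeholder values for illustration.
    manifest = build_pod_manifest("hello-pod", "nginx:1.25")
    print(json.dumps(manifest, indent=2))
```

If you saved the printed JSON to a file, you could hand that file to kubectl apply -f to create the Pod on a cluster. Most manifests you will meet in practice are written in YAML, but they describe the same structure.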

Please note that installing via manifests is for advanced users who, for the most part, will need to support themselves. To learn more about installing Kubeflow via manifests, check out the Docs.

So, because this blog series is aimed at data scientists and developers new to Kubeflow and likely just as new to Kubernetes, we will focus on getting up and running via the packaged distributions.

Packaged Kubeflow Distributions

As of this writing, packaged Kubeflow distributions are available for the following platforms.

AWS

  • Kubeflow on AWS, maintained by Amazon Web Services
  • Arrikto Enterprise Kubeflow on EKS, maintained by Arrikto
  • MiniKF, maintained by Arrikto

Google Cloud

  • Kubeflow on Google Cloud, maintained by Google Cloud
  • Arrikto Enterprise Kubeflow on GKE, maintained by Arrikto
  • MiniKF, maintained by Arrikto

Azure

  • Kubeflow on Azure, maintained by Microsoft Azure
  • Arrikto Enterprise Kubeflow on AKS, maintained by Arrikto

Other Platforms with Kubeflow Packaged Distributions

  • IBM Cloud Kubernetes Service (IKS)
  • Nutanix Karbon
  • OpenShift
  • Conformant Kubernetes
  • Vagrant
  • MicroK8s

For the purposes of this blog we are going to focus on getting up and running with MiniKF. Why?

  • MiniKF is the easiest distribution to get started with, even for folks with limited Kubernetes experience
  • MiniKF is cross-platform. It runs on AWS, GCP and even locally via Vagrant
  • MiniKF comes with prebundled add-ons like Kale and Rok that make it much easier to build pipelines and manage data than the basic Kubeflow distribution does

MiniKF on AWS

Getting up and running with MiniKF on AWS is very straightforward. The basic steps are:

  • Launch a MiniKF AMI from the AWS Marketplace
  • Configure the instance for MiniKF
  • Deploy all the necessary components including Kale and Rok
  • Bring up the Kubeflow UI to start your first project

Check out the short video below to see just how easy an installation of MiniKF on AWS is.

Note that you should budget about 30 minutes to complete the installation. Because Kubeflow requires 40+ pods and can be resource intensive, you’ll need access to sufficient resources on AWS, which means you won’t be able to get away with just the free tier that AWS offers. You’ll need access to at least an m5.2xlarge instance type. For complete MiniKF on AWS installation instructions, check out the Docs.

MiniKF on Google Cloud

Just as on AWS, getting up and running with MiniKF on GCP is very straightforward. The basic steps are:

  • Launch a MiniKF VM on GCP
  • Deploy all the necessary components including Kubernetes (via Minikube), Kubeflow, Kale and Rok
  • Bring up the Kubeflow UI to start your first project

Check out the short video below to see just how easy an installation of MiniKF on GCP is.

Like the AWS installation, budget about 30 minutes to complete the installation. You’ll also need access to sufficient resources on GCP, which means you won’t be able to get away with just the free tier. You’ll need access to at least an n1-standard-8 machine, which should give you 8 vCPUs and 30 GB of RAM. For complete MiniKF on GCP installation instructions, check out the Docs.

MiniKF on Vagrant

Although we highly recommend the AWS and GCP options, if you really must install Kubeflow locally on your Linux, macOS or Windows laptop, you do have the option of using the MiniKF on Vagrant distribution. Check out the short video below to see just how easy an installation of MiniKF on Vagrant is.

As with AWS and GCP, budget about 30 minutes to complete the installation. You’ll also likely need to do a little cleanup on your laptop to free up enough resources for Kubeflow. You are going to need:

  • 12 GB RAM
  • 2 CPUs
  • 50 GB of disk space
  • Vagrant and VirtualBox installed

For complete MiniKF on Vagrant installation instructions, check out the Docs.
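
Before you start, it can help to check your laptop against the requirements listed above. The script below is a rough sketch of such a check in Python, not an official MiniKF tool. It makes a few assumptions that are worth calling out: the function names find_problems and detect_ram_gb are made up for this example, RAM detection relies on os.sysconf and so is skipped on Windows, free disk space is measured on the drive that holds your home directory, and VirtualBox is detected by looking for its VBoxManage command on your PATH.

```python
import os
import shutil

# Minimums taken from the requirements list in this post.
MIN_RAM_GB = 12
MIN_CPUS = 2
MIN_DISK_GB = 50
# Assumption: both tools expose these commands on the PATH when installed.
REQUIRED_TOOLS = ("vagrant", "VBoxManage")


def find_problems(ram_gb, cpus, free_disk_gb, missing_tools):
    """Compare measured values with the minimums; return a list of messages.

    A value of None means "could not be detected" and is not reported.
    """
    problems = []
    if ram_gb is not None and ram_gb < MIN_RAM_GB:
        problems.append(f"RAM: {ram_gb:.1f} GB found, {MIN_RAM_GB} GB needed")
    if cpus is not None and cpus < MIN_CPUS:
        problems.append(f"CPUs: {cpus} found, {MIN_CPUS} needed")
    if free_disk_gb is not None and free_disk_gb < MIN_DISK_GB:
        problems.append(
            f"Disk: {free_disk_gb:.1f} GB free, {MIN_DISK_GB} GB needed"
        )
    for tool in missing_tools:
        problems.append(f"Tool not found on PATH: {tool}")
    return problems


def detect_ram_gb():
    """Best-effort total RAM in GiB; None where os.sysconf is unavailable."""
    try:
        page_size = os.sysconf("SC_PAGE_SIZE")
        page_count = os.sysconf("SC_PHYS_PAGES")
    except (AttributeError, ValueError, OSError):
        return None  # e.g. Windows, which has no os.sysconf
    return page_size * page_count / 1024**3


if __name__ == "__main__":
    missing = [t for t in REQUIRED_TOOLS if shutil.which(t) is None]
    free_gb = shutil.disk_usage(os.path.expanduser("~")).free / 1024**3
    problems = find_problems(detect_ram_gb(), os.cpu_count(), free_gb, missing)
    for problem in problems:
        print("WARNING:", problem)
    if not problems:
        print("This machine appears to meet the stated minimums.")
```

Treat the output as a hint rather than a verdict. The reported RAM is the machine total, not what is free right now, and operating systems often report slightly less than the nominal figure, so a 12 GB laptop may trigger a warning.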

What’s next? Part 4 – External Tools and Add-ons

Stay tuned for the next blog in this series where we’ll focus on getting a little more familiar with Kubeflow external tools and add-ons including Istio, Kale, Rok and tools for serving.

Book a FREE Kubeflow and MLOps workshop

This FREE virtual workshop is designed with data scientists, machine learning developers, DevOps engineers and infrastructure operators in mind. The workshop covers basic and advanced topics related to Kubeflow, MiniKF, Rok, Katib and KFServing. In the workshop you’ll gain a solid understanding of how these components can work together to help you bring machine learning models to production faster. Click to schedule a workshop for your team.

About Arrikto

At Arrikto, we are active members of the Kubeflow community, having made significant contributions to the latest 1.3 release. Our projects/products include:

  • MiniKF, a production-ready, local Kubeflow deployment that installs in minutes and understands how to downscale your infrastructure
  • Enterprise Kubeflow (EKF), a complete machine learning operations platform that simplifies, accelerates, and secures the machine learning model development life cycle with Kubeflow
  • Rok, a data management solution for Kubeflow. Rok’s built-in Kubeflow integration simplifies operations and increases performance, while enabling data versioning, packaging, and secure sharing across teams and cloud boundaries
  • Kale, a workflow tool for Kubeflow that orchestrates all of Kubeflow’s components seamlessly
