Enterprises and organizations are discovering the power of cloud-based, distributed training using cloud-native Machine Learning solutions like Kubeflow. However, the vast majority of data science exploration and experimentation occurs on the laptops of data scientists and ML engineers.
A local laptop gives a data scientist the fastest, most transparent, and most comfortable experience for model development and experimentation. They can use all of their favorite tools on their own desktop, with zero latency, to explore, filter, manipulate, and visualize their data, build an initial version of a model, and validate it by training locally on a small(er) dataset.
If they bump into problems, they can use their favorite debugging tools to gain full insight into their code and its execution, right on their local machine.
However, after validating an initial version of their model, they have no easy way to continue their work on the cloud, take advantage of distributed training and inference at scale, and eventually deploy their models to production; today, they have to repeat a large part of their work.
So the question is: can we combine the speed and transparency of local development with the power of distributed training and inference that the cloud provides?
We introduce MiniKF, a production-ready, local Kubeflow deployment that installs in minutes and understands how to downscale your infrastructure so you don’t burn your laptop. Then, with just a few clicks, you can start experimenting and even run complete ML Pipelines. To train at scale, you can then move to a Kubeflow cloud deployment with the click of a button, without having to rewrite anything.
If you are a Kubernetes user, you can think of MiniKF as being to Kubeflow what Minikube is to Kubernetes.
We believe MiniKF will help democratize access to ML by lowering the barrier to entry and allowing more people to spin up a pre-configured dev environment, so they can start experimenting locally, fast, and then scale up to a full, distributed operation on the cloud. Scaling up becomes the next step of a single workflow, rather than a completely separate hassle.
We have been using MiniKF internally to experiment with Kubeflow, and it has proved immensely helpful! We hope it will be helpful to others getting started with Kubeflow as well.
We welcome your feedback, suggestions, bug reports, and feature requests. Please give it a try and tell us what you think!
We’d like to thank David Aronchick for providing insight, comments, and for reviewing this post.
For a smooth experience, we recommend that your system meet the following requirements:
- 12GB RAM
- 2 CPUs
- 50GB disk space
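Before installing, you may want to check a machine against these numbers. The following is a minimal POSIX shell sketch, not part of MiniKF itself: the function names `check_requirements` and `detect_resources` are hypothetical, and the detection commands (`/proc/meminfo` and `nproc` on Linux, `sysctl` on macOS, `df` for free space in the current directory) are assumptions that do not cover Windows.

```shell
#!/bin/sh
# Recommended minimums for MiniKF, as listed above.
MIN_RAM_GB=12
MIN_CPUS=2
MIN_DISK_GB=50

# check_requirements RAM_GB CPUS DISK_GB
# Prints one line per unmet requirement; returns 0 only if all are met.
check_requirements() {
  local ram_gb=$1 cpus=$2 disk_gb=$3 status=0
  if [ "$ram_gb" -lt "$MIN_RAM_GB" ]; then
    echo "RAM: ${ram_gb}GB < ${MIN_RAM_GB}GB"; status=1
  fi
  if [ "$cpus" -lt "$MIN_CPUS" ]; then
    echo "CPUs: ${cpus} < ${MIN_CPUS}"; status=1
  fi
  if [ "$disk_gb" -lt "$MIN_DISK_GB" ]; then
    echo "Disk: ${disk_gb}GB free < ${MIN_DISK_GB}GB"; status=1
  fi
  return "$status"
}

# detect_resources: prints "RAM_GB CPUS DISK_GB" for this machine.
# Hypothetical helper; only Linux and macOS are handled.
detect_resources() {
  local ram_kb ram_bytes cpus disk_kb gib=1073741824
  case "$(uname -s)" in
    Linux)
      ram_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
      ram_bytes=$((ram_kb * 1024))
      cpus=$(nproc) ;;
    Darwin)
      ram_bytes=$(sysctl -n hw.memsize)
      cpus=$(sysctl -n hw.ncpu) ;;
    *)
      echo "detect_resources: unsupported OS" >&2; return 1 ;;
  esac
  # Free space (in KB) on the filesystem holding the current directory.
  disk_kb=$(df -Pk . | awk 'NR == 2 {print $4}')
  # RAM is rounded to the nearest GiB, since the OS reports slightly less
  # than the installed amount; disk space is rounded down.
  echo "$(( (ram_bytes + gib / 2) / gib )) $cpus $(( disk_kb / 1048576 ))"
}
```

For example, running `check_requirements $(detect_resources)` from the directory you plan to use for MiniKF prints any shortfall and returns a non-zero status if the machine falls below the recommendation.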
Supported operating systems
MiniKF runs on all major operating systems.
Before installing MiniKF, you need to have Vagrant and VirtualBox installed on your laptop.
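A quick way to confirm that both prerequisites are on your PATH is sketched below. `require_tools` is a hypothetical helper, and the sketch assumes the VirtualBox command-line tool is installed as `VBoxManage`, which on some systems (notably Windows) may not be on the PATH by default.

```shell
#!/bin/sh
# require_tools TOOL...: report every TOOL that is not found on PATH.
# Returns 0 only if all tools are present.
require_tools() {
  local tool missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `require_tools vagrant VBoxManage && vagrant --version && VBoxManage --version` reports anything missing and, if both tools are found, prints their versions.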
Open a terminal on your laptop, create a new directory, switch into it, and run the following commands to install MiniKF:
vagrant init arrikto/minikf
vagrant up
MiniKF will take a few minutes to boot. When this is done, navigate to http://10.10.10.10 and follow the on-screen instructions to start Kubeflow and Rok.
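Because MiniKF takes a few minutes to boot, a script that provisions it may want to wait until the dashboard responds before continuing. The sketch below uses a hypothetical `retry` helper; the attempt count, the delay, and the use of `curl -f` to treat the dashboard as ready once it returns a successful HTTP response are assumptions, not MiniKF behavior.

```shell
#!/bin/sh
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or ATTEMPTS runs are used up,
# sleeping DELAY_SECONDS between failed attempts.
retry() {
  local attempts=$1 delay=$2 i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Do not sleep after the final failed attempt.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}
```

For example, `retry 60 10 curl -fsS -o /dev/null http://10.10.10.10 && echo "MiniKF is up"` polls the dashboard every 10 seconds for up to about 10 minutes.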
Here are step-by-step instructions for upgrading from a previous version:
- Upgrade the MiniKF box to the latest version:
vagrant box update
- Ensure you have updated to the latest version:
vagrant box list
- Upgrade the vagrant-persistent-storage plugin to v0.0.47 or later:
vagrant plugin update vagrant-persistent-storage
- Destroy the VM:
vagrant destroy
- Remove all local state. This will remove all of your customization in MiniKF (notebooks, pipelines, Rok snapshots):
- Re-create your VM:
vagrant up
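The upgrade steps above require vagrant-persistent-storage v0.0.47 or later. A POSIX shell sketch for checking the installed version before updating follows; `plugin_version` and `version_ge` are hypothetical helpers, and the parser assumes `vagrant plugin list` prints lines of the form `vagrant-persistent-storage (0.0.47, global)`, which may differ across Vagrant releases.

```shell
#!/bin/sh
# plugin_version NAME: read `vagrant plugin list` output on stdin and
# print NAME's version (assumed format: "NAME (VERSION, scope)").
plugin_version() {
  sed -n "s/^$1 (\([0-9][0-9.]*\).*/\1/p"
}

# version_ge A B: succeed if dotted numeric version A >= B.
# Missing fields count as 0, so 0.1 compares equal to 0.1.0.
version_ge() {
  local a=$1 b=$2 x y
  while [ -n "$a" ] || [ -n "$b" ]; do
    # Take the leading field of each version, then strip it off.
    x=${a%%.*}; y=${b%%.*}
    case $a in *.*) a=${a#*.} ;; *) a= ;; esac
    case $b in *.*) b=${b#*.} ;; *) b= ;; esac
    if [ "${x:-0}" -gt "${y:-0}" ]; then return 0; fi
    if [ "${x:-0}" -lt "${y:-0}" ]; then return 1; fi
  done
  return 0
}

# Usage (requires Vagrant):
#   v=$(vagrant plugin list | plugin_version vagrant-persistent-storage)
#   if [ -n "$v" ] && version_ge "$v" 0.0.47; then echo "plugin is up to date"
#   else vagrant plugin update vagrant-persistent-storage; fi
```

Comparing field by field as integers avoids the trap of plain string comparison, where `0.0.9` would sort after `0.0.47`.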
End-to-end example on MiniKF
Notebooks and Kubeflow Pipelines on the new MiniKF: run an end-to-end ML pipeline by following this tutorial.