Enterprises and organizations are discovering the power of cloud-based, distributed training using cloud-native Machine Learning solutions like Kubeflow. However, the vast majority of data science exploration and experimentation occurs on the laptops of data scientists and ML engineers.
A local laptop provides the fastest, most transparent, and most comfortable experience for model development and experimentation. Data scientists can use all of their favorite tools on their own desktop, with zero latency, to explore, filter, manipulate, and visualize their data, build an initial version of a model, and validate it by training locally with a small(er) dataset.
If they bump into problems, they can use their favorite debugging tools to gain full insight into their code and its execution, right on their local machine.
However, after validating an initial version of their model, they do not have an easy way to continue their work on the cloud, taking advantage of distributed training and inference at scale, and eventually deploying their models in production; they currently have to repeat a big part of their work.
So the question is: can we combine the speed and transparency of local development with the power of distributed training and inference that the cloud provides?
We introduce MiniKF, a production-ready, local Kubeflow deployment that installs in minutes, and understands how to downscale your infrastructure so you don’t burn your laptop. Then, with just a few clicks, you can start to experiment, and even run complete ML Pipelines. To train at scale, you can then move to a Kubeflow cloud deployment with the click of a button, without having to rewrite anything.
If you are a Kubernetes user, you can think of MiniKF as being to Kubeflow what Minikube is to Kubernetes.
We believe MiniKF will help democratize access to ML by lowering the barrier to entry and allowing more people to spin up a pre-configured dev environment, so they can start experimenting locally, fast, and then scale up to a full, distributed operation on the cloud. Scaling up becomes the next step of a single workflow, rather than a completely distinct hassle.
We have been using MiniKF internally to experiment with Kubeflow, and it has proved immensely helpful! We hope it will be helpful to others getting started with Kubeflow as well.
MiniKF is available starting today. Visit the download page to get it and start using it.
We welcome your feedback, suggestions, bug reports, and feature requests. Please give it a try and tell us what you think!
We’d like to thank David Aronchick for providing insight and comments, and for reviewing this post.
Kubeflow is the machine learning toolkit for Kubernetes. It is dedicated to making deployments of machine learning (ML) workflows on Kubernetes simple, portable and scalable. The project’s goal is not to recreate other services, but to provide a straightforward way to deploy best-of-breed open-source systems for ML to diverse infrastructures. Anywhere you are running Kubernetes, you should be able to run Kubeflow. For more information, visit www.kubeflow.org, follow @Kubeflow on Twitter, or join the discussion on Slack.
Arrikto creates software to transform how distributed applications discover and consume data on-prem or on the cloud. It empowers end users to iterate faster and more easily, creating new collaboration workflows among teams. Arrikto is a core contributor to the Kubeflow project, mainly in the areas of data management and UX. For more information, visit www.arrikto.com, follow @Arrikto on Twitter, or join the discussion on Slack.