Data Management for Kubeflow

Rok enables versioned and reproducible data pipelines, empowering faster and easier collaboration among data scientists on-prem or in the cloud.


Store immutable versions of your whole environment along with its datasets. Roll back to any point in time, and instantly clone the preferred version. Start treating your data the way you are now treating your code.


Package everything together and add user-provided or automatically generated metadata to your packages, so you or a colleague can deploy your whole machine learning environment to any other platform, running anywhere in the world. Instantly.


Keep track of your versions, their history and associations, and recreate your complete environment exactly the way it was at any point in time, without searching for missing outputs and lost temporary datasets. Enable end-to-end auditable processes for your work.


Make your whole environment reproducible and available to others working on different infrastructure anywhere in the world. A whole new way for teams to collaborate and iterate, faster and easier than ever.


Sync faster! Sync your environments with your peers without pushing data to, and pulling it from, a central repository. Feel the power and efficiency of the distributed, peer-to-peer Rok Network. The future is decentralized.


Your data and data science environments are shared over encrypted, point-to-point connections, which you can opt to run over your private network. Rok can also encrypt all data at rest. Your data never passes through, or gets stored at, a central point.


Store only what has changed. Rok detects the parts that have remained unchanged between versions and stores them only once. This way, you make the most efficient use of your underlying storage capacity.
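
How Rok implements this internally is not described here. As a rough illustration of the general idea only, the sketch below assumes fixed-size chunks keyed by their SHA-256 digest, so a chunk shared by several versions is kept once. The ChunkStore class, its method names, and the tiny chunk size are all hypothetical and are not part of any Rok API.

```python
import hashlib

CHUNK_SIZE = 4  # unrealistically small, chosen only to keep the example readable


class ChunkStore:
    """Hypothetical content-addressed store: identical chunks are kept once."""

    def __init__(self):
        self.chunks = {}    # SHA-256 hex digest -> chunk bytes
        self.versions = {}  # version name -> ordered list of chunk digests

    def commit(self, name, data):
        """Record an immutable version; only unseen chunks consume space."""
        if name in self.versions:
            raise ValueError(f"version {name!r} already exists")
        digests = []
        for start in range(0, len(data), CHUNK_SIZE):
            chunk = data[start:start + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            # setdefault leaves an existing entry alone, so shared chunks
            # are stored a single time across all versions.
            self.chunks.setdefault(digest, chunk)
            digests.append(digest)
        self.versions[name] = digests
        return digests

    def restore(self, name):
        """Rebuild the exact bytes of a previously committed version."""
        return b"".join(self.chunks[d] for d in self.versions[name])


store = ChunkStore()
store.commit("v1", b"aaaabbbbcccc")
store.commit("v2", b"aaaaXXXXcccc")  # only the middle chunk differs from v1
print(len(store.chunks))             # number of unique chunks actually stored
print(store.restore("v1"))
```

In this toy run the two versions hold six chunks between them, but only four distinct chunks are stored, and either version can still be restored byte for byte. A real system would more likely work at the block or snapshot level rather than on in-memory byte strings.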


Rok runs everywhere, so you can continue using your laptop, any public cloud, or your existing on-prem virtualization or container platform. You can now share across VMware, Kubernetes on-prem, EKS on AWS, or GKE on GCP.


Run fast! Run your data science environment on any kind of primary storage you like. Rok now makes it viable to run over super-fast, cost-efficient, ephemeral, local NVMe or traditional SSD storage. Rok sits on the side of your primary storage, not in the critical I/O path.

Use Rok and Rok Registry to share whole data science environments (code + libs + data) with hundreds of collaborators, across any location, on-prem or in the cloud.