Data Management
Data management for hybrid/multi-cloud Kubernetes

by Chris Pavlou | May 2019

6 min read

Kubernetes simplifies the way people build and deploy scalable, distributed applications on-prem and on the cloud. Moreover, when your apps run inside containers, it doesn’t really matter whether they run on a public cloud or on on-prem bare metal machines. The experience is exactly the same everywhere. In other words, Kubernetes runs on any infrastructure, and the user can take advantage of the same orchestration tools across all their different environments. This cross-platform K8s compatibility avoids infrastructure and cloud provider lock-in. For the first time, it makes a hybrid and multi-cloud strategy viable and easy.

While application portability sounds exciting, we argue that one important part is still missing. To consider the hybrid- and multi-cloud journey complete, one must also solve the data gravity problem: data is heavy and hard to move, so applications tend to stay wherever their data lives. Only then can we talk about a true multi-cloud strategy and application portability across locations and environments.

Let’s go through the main objectives of data management and their current state on Kubernetes.

Data protection

Data protection is crucial when running applications in production. In the enterprise world, it is of vital importance to be able to back up and restore entire applications along with their data, as well as to recover quickly from disasters.

The main features of data protection are:

  • Local snapshots
  • Backup / Restore
  • Offsite backups
  • Disaster recovery
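On Kubernetes, the building block for local snapshots is the CSI VolumeSnapshot API. A minimal sketch of requesting a snapshot of a PVC (the class and volume names here are hypothetical, and the available classes depend on your CSI driver):

```yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: db-snapshot               # hypothetical snapshot name
spec:
  volumeSnapshotClassName: csi-snapclass   # provided by your CSI driver
  source:
    persistentVolumeClaimName: db-data     # the PVC to snapshot
```

Note that this only covers the local-snapshot feature in the list above; backup/restore, offsite backups, and disaster recovery are not addressed by the API itself.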

Most of the above functionality is missing for stateful applications running on Kubernetes. There is no clear separation between the roles of primary and secondary storage. IT teams try to solve the data protection problem by having primary storage take snapshots and archive them on an object storage service.

However, this approach is not efficient: primary storage is not designed to handle a large number of snapshots, and it slows down as they accumulate. Moreover, pushing snapshots to object storage for archival, and restoring from them, impacts the performance of other applications served by the same primary storage.

The most significant drawback of having primary storage handle data protection is that the same primary storage product needs to be running on every location/cloud/region/zone to serve restores. We argue that this is a major limitation in the cloud native era. It leads to vendor lock-in and doesn’t align with the portable mentality and design of Kubernetes.

Data portability

Data portability is the promise that frees an application to run anywhere, independently of infrastructure, enabling new economics for businesses. In the enterprise world, the hybrid- and multi-cloud strategy is already a reality, so application portability, which depends on data portability, arises as the next logical need.

The main use cases of data portability are:

  • Application mobility across local K8s clusters
  • Application migration to the cloud

Currently, the solutions that try to solve these use cases treat application portability as a single export/import operation. They push/export all data to a shared location (usually an object storage service), and then the receiving end pulls/imports the data from this shared location.

This approach is slow and bandwidth-hungry. Moreover, a single administrator needs access to all Kubernetes clusters plus the single object storage service, which makes the deployment a single trust domain. In addition, the same primary storage needs to be present in all locations to be able to import the data.
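The export/import flow described above can be sketched abstractly. In this toy model, a single in-memory "bucket" stands in for the shared object storage service, and every move ships the entire dataset, regardless of how little changed:

```python
# Sketch of the one-off export/import portability pattern described above.
# A single shared "bucket" stands in for an object storage service;
# both clusters and the bucket sit in one trust domain.

shared_bucket = {}  # stand-in for S3/GCS/...; keyed by application name

def export_app(name, volumes):
    """Push ALL volume data to the shared location (a full copy every time)."""
    shared_bucket[name] = {vol: bytes(data) for vol, data in volumes.items()}

def import_app(name):
    """Pull ALL volume data back out on the receiving cluster."""
    return {vol: bytes(data) for vol, data in shared_bucket[name].items()}

# Hypothetical 1 KiB database volume moved between two clusters.
source_volumes = {"pg-data": b"x" * 1024}
export_app("my-db", source_volumes)
restored = import_app("my-db")
assert restored["pg-data"] == source_volumes["pg-data"]
```

Even this tiny sketch makes the cost visible: a second migration of the same, unchanged application re-uploads every byte.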

Treating application portability as a single, one-off import/export operation is an old paradigm, misaligned with the cloud native world. We argue that an application should move painlessly between locations and clouds, independently of the underlying infrastructure and primary storage. At the same time, application portability should not depend on a single operator/administrator.

Copy data management

Organizations within a business need to share data efficiently and securely. They need to be able to collaborate on different copies of the same stateful application instance. Although sharing data (and applications) increases productivity, it also raises a lot of governance and compliance issues. Effective and secure copy data management requires tools that can create, transform, anonymize, distribute and track the copies provided to different teams.

The main use cases of copy data management are:

  • Analytics/BI teams producing reports on production’s database data
  • Developers running tests with real data for debugging
  • Legal teams performing compliance checks or audits on sensitive data

Currently, there is no easy way to provide copy data management in Kubernetes, since traditionally this functionality is provided by secondary storage vendors. Teams work around this by exploiting the snapshotting functionality of primary storage.

However, this approach bumps into the same problems described earlier. Primary storage becomes slow, the performance of applications is affected, and one becomes locked in to and dependent on the primary storage vendor.

Security and access management are of significant importance in copy data management use cases, since most of the time mutually untrusted teams need to work on the different copies. These teams should be able to share immutable copies of their data across administrative domains securely, and as easily as syncing files.
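As a toy illustration of the "transform and anonymize" step before handing a copy to an untrusted team, sensitive fields can be replaced with a stable one-way hash, so records still join across rows but the raw values are gone (the field names here are hypothetical):

```python
import hashlib

def anonymize_copy(rows, sensitive_fields):
    """Return an immutable, anonymized copy of the rows: sensitive values
    are replaced by a truncated SHA-256 digest, which is stable (the same
    input always maps to the same token) but not reversible."""
    masked = []
    for row in rows:
        out = dict(row)
        for field in sensitive_fields:
            if field in out:
                out[field] = hashlib.sha256(str(out[field]).encode()).hexdigest()[:12]
        masked.append(out)
    return tuple(masked)  # a tuple, so the copy itself cannot be extended

# Hypothetical production rows shared with an analytics team.
production = [{"user": "alice@example.com", "orders": 3}]
safe_copy = anonymize_copy(production, {"user"})
assert safe_copy[0]["user"] != "alice@example.com"
assert safe_copy[0]["orders"] == 3
```

A real copy data management tool would of course also distribute and track such copies; this sketch covers only the anonymization step.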

Future state

For all the above reasons, we believe that a next-generation secondary storage solution should be responsible for the data management part. This is why we designed Rok, our data management software, to sit on the side of primary storage and provide the enterprise-grade data services that are currently missing from cloud native applications.

Rok integrates alongside primary storage, providing incremental snapshots in a group-consistent manner. These snapshots can be distributed across multiple, completely isolated locations that may be backed by different primary storage and/or object storage services. There, a user can recover the whole application with near-zero RPO and near-zero RTO.
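Incremental snapshots are typically built on content addressing: a volume is split into chunks, and only chunks whose hash is not already in the store get shipped. The following is a minimal sketch of that general idea, not Rok's actual implementation:

```python
import hashlib

CHUNK = 4096
store = {}  # hash -> chunk bytes; shared across snapshots and locations

def snapshot(volume: bytes):
    """Record a snapshot as a manifest of chunk hashes; upload only chunks
    the store has not seen before. Returns (manifest, chunks_uploaded)."""
    manifest, uploaded = [], 0
    for i in range(0, len(volume), CHUNK):
        chunk = volume[i:i + CHUNK]
        h = hashlib.sha256(chunk).hexdigest()
        if h not in store:
            store[h] = chunk
            uploaded += 1
        manifest.append(h)
    return manifest, uploaded

def restore(manifest):
    """Rebuild the volume from its manifest, anywhere the store is reachable."""
    return b"".join(store[h] for h in manifest)

v1 = b"a" * CHUNK + b"b" * CHUNK   # initial volume: two chunks
m1, up1 = snapshot(v1)             # first snapshot uploads both chunks
v2 = b"a" * CHUNK + b"c" * CHUNK   # one chunk changed
m2, up2 = snapshot(v2)             # incremental: only the changed chunk moves
assert up1 == 2 and up2 == 1
assert restore(m2) == v2
```

Because the manifest references chunks by content hash, a receiving location only ever fetches the chunks it is missing, which is what makes distributing snapshots across isolated locations cheap.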

Rok is the first solution providing enterprise-grade secondary storage functionality, designed for Kubernetes. Thus, the option of replacing traditional primary storage, with cheap, ephemeral, local SSD/NVMe storage, becomes viable for the first time, with unparalleled benefits.

Rok deployed next to local NVMe brings the best of both worlds for running stateful, cloud native applications. Rok compensates for the ephemeral nature of locally attached storage (SSD/NVMe) by being able to instantly restore a snapshot of the local SSD/NVMe onto any other node of a Kubernetes cluster.

Since modern, cloud native apps take care of consistency at the application level, restoring from a snapshot taken a few minutes back in time is now a viable trade-off, which brings significant advantages:

  • Unparalleled performance (millions of IOPS, microsecond latency)
  • Infinite scale-out (storage resources are completely disaggregated, no pooling)
  • Free scheduling and movement of stateful containers by Kubernetes to any node of the cluster (without the need for an underlying shared block or file storage solution)

We believe this approach proposes a compelling new architecture for backing stateful workloads, which was not possible until now. An architecture that is consistent and aligned with the cloud native nature of modern applications and the new hybrid/multi-cloud Kubernetes world.

Chris Pavlou

Chris is a Technical Marketing Engineer with a strong customer focus and background in Machine Learning and cloud native applications
