Securing your Kubeflow deployment with Kyverno policies

September 22, 2022

We all know Kubernetes is awesome, and Kubeflow makes Kubernetes cool for machine learning (ML) teams. With Kubeflow, data scientists and ML engineers can share infrastructure and accelerate the delivery of ML models, while minimizing costs.

When multiple actors interact with shared infrastructure, security becomes a priority for IT. There is always “that one guy”, who will try to circumvent security to show how clever they are, right?

Especially in a platform like Kubeflow, where users are allowed to create and delete resources as they wish, a system that gives administrators fine-grained control over what users can and cannot do is critical.

In this post, we will explore how we achieved that using Kyverno.

What is the Current State of Security in Kubeflow?

Currently, Kubeflow restricts users’ access to sensitive resources with RBAC authorization rules. With RBAC rules, each user is confined to their dedicated namespace and, therefore, cannot access or modify sensitive system resources and data.
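As a rough illustration of this model, a RoleBinding along the following lines grants a user rights inside a single namespace only. This is a sketch, not the manifest Kubeflow itself generates: the binding name and the user are hypothetical, the namespace `kubeflow-user` is the one that appears in the error output later in this post, and `kubeflow-admin` is the role named in the next section.

```yaml
# Hypothetical sketch: binds one user to a role inside one namespace.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: namespace-owner          # hypothetical name
  namespace: kubeflow-user       # permissions apply only in this namespace
subjects:
- kind: User
  name: user@example.com         # hypothetical user
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: kubeflow-admin
  apiGroup: rbac.authorization.k8s.io
```

Because the binding is namespaced, the user gets no rights in other namespaces, but RBAC alone says nothing about what kind of Pod they may create inside their own.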

But even permission to manage resources in a single namespace can prove disastrous for the security of the whole cluster.

Examples of malicious usage (part 1)

Let’s wear our black hats and get our hands dirty, shall we?

In the following example, we will demonstrate how a user can completely PWN a Kubernetes node in 3 simple steps.

What we will need:

Access to a Kubeflow deployment with the `kubeflow-admin` role
That’s all

Step 1: Create a Notebook

Step 2: Create a malicious Pod

Using the Notebook, we create a new YAML file:
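The original manifest is not reproduced here. A minimal sketch of what such a `pwn.yaml` could look like is shown below, assuming the Pod requests the host namespaces and a privileged container (the two things the denial message later in this post complains about). The image and command are assumptions chosen only for illustration.

```yaml
# pwn.yaml: hypothetical reconstruction, not the author's original file
apiVersion: v1
kind: Pod
metadata:
  name: pwn
spec:
  hostPID: true        # share the node's process ID namespace
  hostNetwork: true    # share the node's network namespace
  hostIPC: true        # share the node's IPC namespace
  containers:
  - name: pwn
    image: ubuntu                     # assumed image
    command: ["sleep", "infinity"]    # keep the container alive
    securityContext:
      privileged: true                # full access to the node's devices
```

How the original Pod drops the shell directly onto the node is not shown in this post and is not reproduced in this sketch.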

We open a new terminal and apply this resource:

jovyan@pwner-0:~$ kubectl apply -f pwn.yaml
pod/pwn configured

Step 3: Gain access

Now we simply open a terminal in the Pod we just created:

jovyan@pwner-0:~$ kubectl exec -it pwn -- bash
[root@node /]#

And voila! Root access in the Node.
We can “rm -rf /*” to punish that pesky IT admin we don’t like and ruin their weekend (please don’t …).

Summary

The previous example is not optimal for the cluster’s security, as we can acquire full root access in the node in a matter of minutes without obstacles.

We can do better!

Available Options

So, we need to beef up the cluster’s security. Let’s explore the available options.

There are four major security solutions for Kubernetes:

Pod Security Policy (PSP)
Pod Security Admission (PSA)
Kyverno
Gatekeeper/OPA

Pod Security Policy (PSP)

Pod Security Policies were the first built-in Kubernetes method for fine-grained authorization of Pod creation. With PSPs, the admin can prevent the creation of Pods based on a set of conditions, e.g., disallowing privileged containers (see the official documentation for all the possible rules: https://kubernetes.io/docs/concepts/security/pod-security-policy). We could use that to improve the security of our cluster. Still, PSPs come with some significant drawbacks:

The policies are applied only on Pods and not on other types of resources, e.g. ingresses.
The policies are applied to service accounts using RBAC rules and not directly to Pods. This restriction overcomplicates large deployments where many system service accounts come into play.
Pod Security Policies are deprecated as of Kubernetes v1.21 and removed in v1.25. We don’t want to invest in a technology that has an expiration date.

Pod Security Admission (PSA)

Pod Security Admission (PSA) is the successor to PSPs. Similar to PSPs, the admin can restrict the creation of Pods with the difference that the rules are applied per namespace. Still, there are some drawbacks:

Again, only for Pods validation.
Not very mature.
Not very configurable.
Enabled by feature gate (or manually installing the admission webhook).
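To make the per-namespace model concrete: PSA is driven by labels on the Namespace object. The sketch below assumes a user namespace called `kubeflow-user` and picks the `baseline` and `restricted` levels purely as an example, not as the configuration used in this post.

```yaml
# Hypothetical sketch: Pod Security Admission is configured per namespace via labels.
apiVersion: v1
kind: Namespace
metadata:
  name: kubeflow-user
  labels:
    # Reject Pods that violate the "baseline" Pod Security Standard.
    pod-security.kubernetes.io/enforce: baseline
    # Only warn about Pods that would violate the stricter "restricted" level.
    pod-security.kubernetes.io/warn: restricted
```

The levels are predefined, which is where the limited configurability mentioned above comes from.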

Kyverno

Kyverno is a policy engine designed specifically for Kubernetes from the ground up. It is an open-source project developed by Nirmata and donated to CNCF, currently in incubating state. Kyverno works by using a dynamic admission controller to:

Validate resources
Mutate resources
Generate resources

by defining policies in YAMLs, as Kubernetes resources. This looks promising (you read the title, you know where this is going :-)) …
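To give a feel for what such a policy looks like, here is a minimal sketch of a mutate rule that adds a label to every Pod that does not already carry it. The policy name, label key, and label value are hypothetical and are not part of the setup described in this post.

```yaml
# Hypothetical sketch of a Kyverno mutate policy.
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: add-team-label           # hypothetical name
spec:
  rules:
  - name: add-team-label
    match:
      any:
      - resources:
          kinds:
          - Pod
    mutate:
      patchStrategicMerge:
        metadata:
          labels:
            # The +(...) anchor adds the key only if it is not already set.
            +(team): ml-platform
```

The policy is itself a Kubernetes resource, so it can be applied, listed, and version-controlled like any other manifest.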

Gatekeeper/OPA

OPA (Open Policy Agent) is a graduated CNCF project. It is a general-purpose policy engine (not Kubernetes specific), and you can use its high-level language to define policies. Gatekeeper is a dynamic admission controller that uses OPA as a policy engine for Kubernetes resources. Using native Kubernetes CRDs, you can define policies that:

Validate resources.
Mutate resources.

The high-level DSL of OPA makes it a very versatile tool but a little hard to configure.

Summary

The following table shows a summary of the pros and cons of each solution:

Solution	Pros	Cons
PSP/PSA	• Official Kubernetes security solution	• Only for Pods validation • Not very configurable
Kyverno	• Versatile policy engine • Easy to configure • Designed for Kubernetes	• Relatively new • Not able to write very complex policies
Gatekeeper/OPA	• Capable of very complex policies • Mature	• Hard to configure

Why we chose Kyverno

We chose Kyverno for the policy engine in our cluster to strike a balance between versatility and ease of use. Due to the fluid nature of a Kubeflow deployment, we want every administrator to be able to understand and manage the security policies and customize them based on their needs.

Setting up Kyverno

Step 1: Install

Kyverno provides 2 methods of installation:

Helm
YAMLs

We choose to go with the simple YAMLs installation. To install Kyverno, run:

user@localhost:~$ kubectl create -f https://raw.githubusercontent.com/kyverno/kyverno/release-1.7/config/release/install.yaml

To validate that everything is up and running, run:

user@localhost:~$ kubectl get pods -n kyverno
NAME                       READY   STATUS    RESTARTS   AGE
kyverno-5cc856f997-hxxbq   1/1     Running   0          28h

Step 2: Configure policies

Now to the fun part!

We will leverage the capabilities of Kyverno policies to restrict malicious users’ ability to take over our cluster.

Kyverno provides a library of sample policies that we can use to prevent common attack vectors (https://kyverno.io/policies).

In this example, we are going to configure 2 policies:

1) Prevent users from deploying Pods that are sharing the host namespaces.

The first policy that we will apply prevents users from creating a Pod that shares namespaces with the host.

More specifically, a user cannot create a Pod that has hostPID, hostNetwork, or hostIPC set to true.
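A Pod of the kind this policy is meant to reject might look like the following hypothetical sketch (the Pod name and image are assumptions); any one of the three fields set to `true` is enough to trigger a denial.

```yaml
# Hypothetical Pod that the host-namespaces policy would reject.
apiVersion: v1
kind: Pod
metadata:
  name: host-namespaces-example   # hypothetical name
spec:
  hostPID: true        # shares the node's process ID namespace
  hostNetwork: true    # shares the node's network namespace
  hostIPC: true        # shares the node's IPC namespace
  containers:
  - name: main
    image: ubuntu      # assumed image
```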

Create a new `disallow-host-namespaces.yaml` file with the following content:
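The policy body is not reproduced in this post. The sketch below follows the upstream sample from the Kyverno policy library (the source link appears further down), with the annotations omitted and `validationFailureAction` assumed to be `enforce`, which matches the `kubectl get clusterpolicy` output shown later. Compare it against the upstream file before applying it.

```yaml
# disallow-host-namespaces.yaml: sketch based on the upstream Kyverno sample
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: disallow-host-namespaces
spec:
  validationFailureAction: enforce   # assumed; block violating requests instead of only auditing
  background: true
  rules:
  - name: host-namespaces
    match:
      any:
      - resources:
          kinds:
          - Pod
    validate:
      message: >-
        Sharing the host namespaces is disallowed. The fields spec.hostNetwork,
        spec.hostIPC, and spec.hostPID must be unset or set to `false`.
      pattern:
        spec:
          # =(...) means: if the field is present, it must match the value.
          =(hostPID): "false"
          =(hostIPC): "false"
          =(hostNetwork): "false"
```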

And apply it:

user@localhost:~$ kubectl apply -f disallow-host-namespaces.yaml
clusterpolicy.kyverno.io/disallow-host-namespaces created

(source: https://kyverno.io/policies/pod-security/baseline/disallow-host-namespaces/disallow-host-namespaces/)

2) Prevent users from creating Pods with privileged containers.

Now we will create a policy that prevents users from creating Pods that contain privileged containers.
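A Pod of the kind this policy is meant to reject might look like the following hypothetical sketch (the Pod name and image are assumptions).

```yaml
# Hypothetical Pod that the privileged-containers policy would reject.
apiVersion: v1
kind: Pod
metadata:
  name: privileged-example   # hypothetical name
spec:
  containers:
  - name: main
    image: ubuntu            # assumed image
    securityContext:
      privileged: true       # the field the policy requires to be unset or false
```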

Create a new `disallow-privileged-containers.yaml` file with the following content:
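As with the first policy, the body is not reproduced in this post. The sketch below follows the upstream sample from the Kyverno policy library (the source link appears further down), with the annotations omitted and `validationFailureAction` assumed to be `enforce` to match the output shown later. Compare it against the upstream file before applying it.

```yaml
# disallow-privileged-containers.yaml: sketch based on the upstream Kyverno sample
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: disallow-privileged-containers
spec:
  validationFailureAction: enforce   # assumed; block violating requests instead of only auditing
  background: true
  rules:
  - name: privileged-containers
    match:
      any:
      - resources:
          kinds:
          - Pod
    validate:
      message: >-
        Privileged mode is disallowed. The fields spec.containers[*].securityContext.privileged
        and spec.initContainers[*].securityContext.privileged must be unset or set to `false`.
      pattern:
        spec:
          # Each container type is checked only if it is present in the Pod.
          =(ephemeralContainers):
          - =(securityContext):
              =(privileged): "false"
          =(initContainers):
          - =(securityContext):
              =(privileged): "false"
          containers:
          - =(securityContext):
              =(privileged): "false"
```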

And apply it:

user@localhost:~$ kubectl apply -f disallow-privileged-containers.yaml
clusterpolicy.kyverno.io/disallow-privileged-containers created

(source: https://kyverno.io/policies/pod-security/baseline/disallow-privileged-containers/disallow-privileged-containers/)

Finally, verify that the policies are ready:

user@localhost:~$ kubectl get clusterpolicy
NAME                             BACKGROUND   ACTION    READY
disallow-host-namespaces         true         enforce   true
disallow-privileged-containers   true         enforce   true

Now we are ready!

Examples of malicious usage (part 2)

Let’s revisit the previous example where the user gained access to our node by creating a privileged Pod.

Open your Notebook and try to create the same Pod as we created before:

jovyan@pwner-0:~$ kubectl apply -f pwn.yaml 
Error from server: error when creating "pwn.yaml": admission webhook "validate.kyverno.svc-fail" denied the request: 

resource Pod/kubeflow-user/pwn was blocked due to the following policies

disallow-host-namespaces:
  host-namespaces: 'validation error: Sharing the host namespaces is disallowed. The
    fields spec.hostNetwork, spec.hostIPC, and spec.hostPID must be unset or set to
    `false`.          . Rule host-namespaces failed at path /spec/hostNetwork/'
disallow-privileged-containers:
  privileged-containers: 'validation error: Privileged mode is disallowed. The fields
    spec.containers[*].securityContext.privileged and spec.initContainers[*].securityContext.privileged
    must be unset or set to `false`.          . Rule privileged-containers failed
    at path /spec/containers/0/securityContext/privileged/'

Something different happened this time! The privileged Pod was not allowed and `kubectl` returned a validation error explaining what was wrong with the manifest.

Conclusions

Kubeflow is a very versatile platform, but from a security perspective, improvements are needed. In this post, we saw an example of how we utilize Kyverno at Arrikto Enterprise Kubeflow to improve the security of our cluster in a multi-user environment.

Stay tuned for part 2, where we explore some caveats of Kyverno and the configuration we needed to make the integration as seamless as possible.

About the author

Ioannis is a Software Engineer at Arrikto, primarily working on Cloud Solutions and Kubernetes. He loves exploring new technologies and tinkering with everything that can be tinkered with. He previously worked as a backend developer for enterprise solutions and contributed to various open source projects.

About Kubeflow

Kubeflow is an open source, cloud-native MLOps platform originally developed by Google that aims to provide all the tooling that both data scientists and machine learning engineers need to run workflows in production. Features include model development, training, serving, AutoML, monitoring and artifact management.

Kubeflow is the open source machine learning toolkit for Kubernetes.

About Arrikto

We are a Machine Learning platform powered by Kubeflow and built for Data Scientists. We make Kubeflow easy to adopt, deploy and use, having made significant contributions since the 0.4 release and continuing to contribute across multiple areas of the project and community. Our projects/products include:

Enterprise Kubeflow (EKF) is a complete MLOps platform that reduces costs, while accelerating the delivery of scalable models from laptop to production.
Kubeflow as a Service is the easiest way to get started with Kubeflow in minutes! It comes with a Free 7-day trial (no credit card required).
Rok is a data management solution for Kubeflow. Rok’s built-in Kubeflow integration simplifies operations and increases performance, while enabling data versioning, packaging, and secure sharing across teams and cloud boundaries.
Kale is an open source workflow tool for Kubeflow that orchestrates all of Kubeflow’s components seamlessly.
