We all know Kubernetes is awesome, and Kubeflow makes Kubernetes cool for machine learning (ML) teams. With Kubeflow, data scientists and ML engineers can share infrastructure and accelerate the delivery of ML models, while minimizing costs.
When multiple actors interact with shared infrastructure, security becomes a priority for IT. There is always “that one guy” who will try to circumvent security to show how clever they are, right?
Especially in a platform like Kubeflow, where users are allowed to create and delete resources as they wish, a system that enforces fine-grained permissions on what they can and cannot do is critical.
In this post, we will explore how we achieved that using Kyverno.
What is the Current State of Security in Kubeflow?
Currently, Kubeflow restricts users’ access to sensitive resources with RBAC authorization rules. With RBAC rules, each user is confined to their dedicated namespace and, therefore, cannot access or modify sensitive system resources and data.
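To make the confinement concrete, here is a minimal sketch of the kind of RoleBinding that grants a user rights inside a single namespace. The binding name and the user e-mail are hypothetical; the `kubeflow-admin` role and the `kubeflow-user` namespace are the ones that appear elsewhere in this post, and the exact objects in your deployment may differ.

```yaml
# Sketch only: binds one user to the kubeflow-admin ClusterRole,
# but scoped to a single namespace through a RoleBinding.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: example-user-admin        # hypothetical name
  namespace: kubeflow-user        # the user's dedicated namespace
subjects:
  - apiGroup: rbac.authorization.k8s.io
    kind: User
    name: user@example.com        # hypothetical user identity
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kubeflow-admin
```

Because this is a RoleBinding rather than a ClusterRoleBinding, the permissions of the referenced ClusterRole apply only inside `kubeflow-user`.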
But even permission to manage resources in a single namespace can prove disastrous for the security of the whole cluster.
Examples of malicious usage (part 1)
Let’s wear our black hats and get our hands dirty, shall we?
In the following example, we will demonstrate how a user can completely PWN a Kubernetes node in 3 simple steps.
What we will need:
- Access to a Kubeflow deployment with the `kubeflow-admin` role
- That’s all
Step 1: Create a Notebook
Log in to the Kubeflow dashboard and create a simple Notebook:
Step 2: Create a malicious Pod
Using the Notebook, we create a new YAML file:
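A minimal sketch of what `pwn.yaml` could look like, assuming a privileged Pod that shares the host namespaces. The admission errors shown later in this post indicate that the Pod sets `spec.hostNetwork` and runs a privileged first container; the image, the container name, and the command below are illustrative assumptions.

```yaml
# Sketch only: a Pod that shares the host namespaces and runs privileged.
apiVersion: v1
kind: Pod
metadata:
  name: pwn
spec:
  hostPID: true        # see every process running on the node
  hostNetwork: true    # use the node's network stack
  hostIPC: true        # share the node's IPC namespace
  containers:
    - name: pwn                        # assumed container name
      image: ubuntu                    # assumed image, any image with a shell works
      command: ["sleep", "infinity"]   # keep the container alive for `kubectl exec`
      securityContext:
        privileged: true               # full access to the node's devices and capabilities
```

With `hostPID` and privileged mode combined, a root shell inside this container is effectively a root shell on the node.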
We open a new terminal and apply this resource:
jovyan@pwner-0:~$ kubectl apply -f pwn.yaml
pod/pwn configured
Step 3: Gain access
Now we simply open a terminal in the Pod we just created:
jovyan@pwner-0:~$ kubectl exec -it pwn -- bash
[root@node /]#
And voilà! Root access on the node.
We can “rm -rf /*” to punish that pesky IT admin we don’t like and ruin their weekend (please don’t …).
Summary
The previous example exposes a serious gap in the cluster’s security: we can acquire full root access on the node in a matter of minutes, without obstacles.
We can do better!
Available Options
So, we need to beef up the cluster’s security. Let’s explore the available options.
There are four major security solutions for Kubernetes:
- Pod Security Policy (PSP)
- Pod Security Admission (PSA)
- Kyverno
- Gatekeeper/OPA
Pod Security Policy (PSP)
Pod Security Policies are the first built-in Kubernetes method for fine-grained authorization of Pod creation. With PSPs, the admin can prevent the creation of Pods based on a set of conditions (e.g., don’t allow privileged containers); see the official documentation for all the possible rules: https://kubernetes.io/docs/concepts/security/pod-security-policy. We could use that to improve the security of our cluster. Still, PSPs come with some significant drawbacks:
- The policies are applied only on Pods and not on other types of resources, e.g. ingresses.
- The policies are applied to service accounts using RBAC rules and not directly to Pods. This restriction overcomplicates large deployments where many system service accounts come into play.
- Pod Security Policies are deprecated as of Kubernetes v1.21 and will be removed in v1.25. We don’t want to invest in a technology that has an expiration date.
Pod Security Admission (PSA)
Pod Security Admission (PSA) is the successor to PSPs. As with PSPs, the admin can restrict the creation of Pods, with the difference that the rules are applied per namespace. Still, there are some drawbacks:
- Again, only for Pods validation.
- Not very mature.
- Not very configurable.
- Enabled by feature gate (or manually installing the admission webhook).
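To illustrate the per-namespace model, here is a minimal sketch of a namespace opted into PSA through labels. The `kubeflow-user` namespace is the one used in the examples of this post; the choice of the `baseline` and `restricted` levels is an assumption made for illustration.

```yaml
# Sketch only: PSA is configured with labels on the namespace itself.
apiVersion: v1
kind: Namespace
metadata:
  name: kubeflow-user
  labels:
    # Reject Pods that violate the "baseline" Pod Security Standard
    # (which forbids privileged containers and host namespaces).
    pod-security.kubernetes.io/enforce: baseline
    # Only warn about Pods that would violate the stricter "restricted" level.
    pod-security.kubernetes.io/warn: restricted
```

The levels are fixed profiles, which is what makes PSA simple to adopt but hard to customize.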
Kyverno
Kyverno is a policy engine designed specifically for Kubernetes from the ground up. It is an open-source project developed by Nirmata and donated to CNCF, currently in incubating state. Kyverno works by using a dynamic admission controller to:
- Validate resources
- Mutate resources
- Generate resources
by defining policies in YAML, as Kubernetes resources. This looks promising (you read the title, you know where this is going :-)) …
Gatekeeper/OPA
OPA (Open Policy Agent) is a graduated CNCF project. It is a general-purpose policy engine (not Kubernetes specific), and you can use its high-level language to define policies. Gatekeeper is a dynamic admission controller that uses OPA as a policy engine for Kubernetes resources. Using native Kubernetes CRDs, you can define policies that:
- Validate resources.
- Mutate resources.
- Generate resources.
The high-level DSL of OPA makes it a very versatile tool but a little hard to configure.
Summary
The following table shows a summary of the pros and cons of each solution:
Solution | Pros | Cons |
PSP/PSA | • Official Kubernetes security solution | • Only for Pods validation • Not very configurable |
Kyverno | • Versatile policy engine • Easy to configure • Designed for Kubernetes | • Relatively new • Not able to write very complex policies |
Gatekeeper/OPA | • Capable of very complex policies • Mature | • Hard to configure |
Why we chose Kyverno
We chose Kyverno for the policy engine in our cluster to strike a balance between versatility and ease of use. Due to the fluid nature of a Kubeflow deployment, we want every administrator to be able to understand and manage the security policies and customize them based on their needs.
Setting up Kyverno
Step 1: Install
Kyverno provides two methods of installation:
- Helm
- YAMLs
We chose to go with the simple YAML installation. To install Kyverno, run:
user@localhost:~$ kubectl create -f https://raw.githubusercontent.com/kyverno/kyverno/release-1.7/config/release/install.yaml
To validate that everything is up and running, run:
user@localhost:~$ kubectl get pods -n kyverno
NAME READY STATUS RESTARTS AGE
kyverno-5cc856f997-hxxbq 1/1 Running 0 28h
Step 2: Configure policies
Now to the fun part!
We will leverage the capabilities of Kyverno policies to restrict malicious users’ ability to take over our cluster.
Kyverno provides a library of sample policies that we can use to prevent common attack vectors (https://kyverno.io/policies).
In this example, we are going to configure two policies:
1) Prevent users from deploying Pods that are sharing the host namespaces.
The first policy prevents users from creating a Pod that shares namespaces with the host.
More specifically, a user cannot create a Pod that has hostPID, hostNetwork, or hostIPC set to true.
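A minimal sketch of a Pod spec that this policy is meant to reject. The Pod name, container name, and image are placeholders chosen for illustration.

```yaml
# Sketch only: each of the three host* fields set to true violates the policy.
apiVersion: v1
kind: Pod
metadata:
  name: host-namespaces-example   # hypothetical name
spec:
  hostPID: true
  hostNetwork: true
  hostIPC: true
  containers:
    - name: app                   # placeholder container
      image: busybox              # placeholder image
      command: ["sleep", "3600"]
```

Setting any one of the three fields to `true` is enough to trigger the rejection.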
Create a new `disallow-host-namespaces.yaml` file with the following content:
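The policy below is a sketch modeled on the Kyverno policy library entry cited as the source of this step; verify the exact fields against that page for your Kyverno version. The rule name and the message match the admission error shown later in this post, and `validationFailureAction` is set to `enforce` so that violating Pods are rejected rather than only reported.

```yaml
# Sketch only: modeled on the Kyverno library policy, not copied verbatim.
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: disallow-host-namespaces
spec:
  validationFailureAction: enforce   # block violating resources at admission
  background: true                   # also report on existing resources
  rules:
    - name: host-namespaces
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: >-
          Sharing the host namespaces is disallowed. The fields spec.hostNetwork,
          spec.hostIPC, and spec.hostPID must be unset or set to `false`.
        pattern:
          spec:
            # "=(field)" means: if the field is present, it must match the value.
            =(hostPID): "false"
            =(hostIPC): "false"
            =(hostNetwork): "false"
```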
And apply it:
user@localhost:~$ kubectl apply -f disallow-host-namespaces.yaml
clusterpolicy.kyverno.io/disallow-host-namespaces created
(source: https://kyverno.io/policies/pod-security/baseline/disallow-host-namespaces/disallow-host-namespaces/)
2) Prevent users from creating Pods with privileged containers.
Now we will create a policy that prevents users from creating Pods that contain privileged containers.
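A minimal sketch of a Pod with a privileged container, which is what this policy is meant to reject. The names and the image are placeholders chosen for illustration.

```yaml
# Sketch only: the privileged flag is what violates the policy.
apiVersion: v1
kind: Pod
metadata:
  name: privileged-example        # hypothetical name
spec:
  containers:
    - name: app                   # placeholder container
      image: busybox              # placeholder image
      command: ["sleep", "3600"]
      securityContext:
        privileged: true
```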
Create a new `disallow-privileged-containers.yaml` file with the following content:
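As with the first policy, the content below is a sketch modeled on the corresponding Kyverno policy library entry, so check the exact fields against the library for your Kyverno version. The rule name and the message match the admission error shown later in this post; covering `ephemeralContainers` in addition to the two fields named in that message is an assumption.

```yaml
# Sketch only: modeled on the Kyverno library policy, not copied verbatim.
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: disallow-privileged-containers
spec:
  validationFailureAction: enforce   # block violating resources at admission
  background: true
  rules:
    - name: privileged-containers
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: >-
          Privileged mode is disallowed. The fields spec.containers[*].securityContext.privileged
          and spec.initContainers[*].securityContext.privileged must be unset or set to `false`.
        pattern:
          spec:
            # Optional container lists are checked only when present.
            =(ephemeralContainers):
              - =(securityContext):
                  =(privileged): "false"
            =(initContainers):
              - =(securityContext):
                  =(privileged): "false"
            containers:
              - =(securityContext):
                  =(privileged): "false"
```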
And apply it:
user@localhost:~$ kubectl apply -f disallow-privileged-containers.yaml
clusterpolicy.kyverno.io/disallow-privileged-containers created
Finally, verify that the policies are ready:
user@localhost:~$ kubectl get clusterpolicy
NAME BACKGROUND ACTION READY
disallow-host-namespaces true enforce true
disallow-privileged-containers true enforce true
Now we are ready!
Examples of malicious usage (part 2)
Let’s revisit the previous example where the user gained access to our node by creating a privileged Pod.
Open your Notebook and try to create the same Pod as we created before:
jovyan@pwner-0:~$ kubectl apply -f pwn.yaml
Error from server: error when creating "pwn.yaml": admission webhook "validate.kyverno.svc-fail" denied the request:
resource Pod/kubeflow-user/pwn was blocked due to the following policies
disallow-host-namespaces:
host-namespaces: 'validation error: Sharing the host namespaces is disallowed. The
fields spec.hostNetwork, spec.hostIPC, and spec.hostPID must be unset or set to
`false`. . Rule host-namespaces failed at path /spec/hostNetwork/'
disallow-privileged-containers:
privileged-containers: 'validation error: Privileged mode is disallowed. The fields
spec.containers[*].securityContext.privileged and spec.initContainers[*].securityContext.privileged
must be unset or set to `false`. . Rule privileged-containers failed
at path /spec/containers/0/securityContext/privileged/'
Something different happened this time! The privileged Pod was not allowed and `kubectl` returned a validation error explaining what was wrong with the manifest.
Conclusions
Kubeflow is a very versatile platform, but from a security perspective, improvements are needed. In this post, we saw an example of how we use Kyverno in Arrikto Enterprise Kubeflow to improve the security of our cluster in a multi-user environment.
Stay tuned for part 2, where we explore some caveats of Kyverno and the configuration we needed to make the integration as seamless as possible.