Introduction

Setting up a Kubernetes cluster in your organization is a significant achievement, but it also brings challenges in monitoring, auditing, and securing the cluster.

When it comes to securing the cluster, platform engineers or security engineers need to track what is being deployed, how it is deployed, and the level of vulnerability that the application may introduce to the platform. In the case of a large enterprise Kubernetes cluster, enforcing label policies for all applications being deployed becomes essential. This is where the Open Policy Agent Gatekeeper proves to be valuable.

OPA Terminologies

  • Admission Controller Webhooks: HTTP callbacks that receive admission requests and check them before the objects are persisted in Kubernetes.
  • Open Policy Agent (OPA): A general-purpose policy engine for cloud-native environments. It provides a framework for making and applying policy decisions.
  • Gatekeeper: An admission controller that checks requests to create or update resources on the Kubernetes cluster by enforcing policies executed by OPA.
  • Constraint: A declaration that its author wants a system to meet a given set of requirements. Its logic is written in Rego. Constraints are evaluated as a logical AND; hence, if one Constraint is not satisfied, the whole request is rejected.
  • ConstraintTemplate: Defines a way to validate a set of Kubernetes objects in Gatekeeper's Kubernetes admission controller. It includes:
  1. Rego code, which defines a policy violation
  2. The schema of the Constraint object, which represents an instantiation of the ConstraintTemplate
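
As a rough illustration of the logical AND described for Constraints, the following Python sketch models each Constraint as a function that returns a list of violation messages and admits a request only when every Constraint returns none. The function names, the `owner` label rule, and the sample object are hypothetical and exist only for this illustration; Gatekeeper itself evaluates Rego, not Python.

```python
from typing import Callable, Dict, List, Tuple

# A constraint is modelled as a function: object -> list of violation messages.
Constraint = Callable[[Dict], List[str]]


def require_owner_label(obj: Dict) -> List[str]:
    """Hypothetical constraint: the object must carry an 'owner' label."""
    labels = obj.get("metadata", {}).get("labels", {})
    return [] if "owner" in labels else ["missing required label: owner"]


def require_name(obj: Dict) -> List[str]:
    """Hypothetical constraint: the object must have a metadata.name."""
    name = obj.get("metadata", {}).get("name")
    return [] if name else ["missing metadata.name"]


def admit(obj: Dict, constraints: List[Constraint]) -> Tuple[bool, List[str]]:
    """Logical AND: allow the request only if no constraint reports a violation."""
    violations = [msg for check in constraints for msg in check(obj)]
    return (not violations, violations)


sample = {"metadata": {"name": "demo", "labels": {"app": "demo"}}}
allowed, reasons = admit(sample, [require_owner_label, require_name])
print(allowed, reasons)  # False ['missing required label: owner']
```

One failing constraint is enough to reject the request, even though the second constraint is satisfied.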

How OPA Works

Open Policy Agent (OPA) is a policy engine that enforces policies for cloud infrastructure using a declarative language.

Key Features of OPA:

  • Policy Writing: OPA uses its own language, Rego, to write policies. Rego is designed to inspect and transform structured data, such as JSON and YAML.
  • Policy Decisions: When software needs to make policy decisions, it queries OPA and provides structured data as input. OPA evaluates the query input against policies and data to generate policy decisions.
  • Offloading Policy Decisions: OPA offloads policy decisions from services, allowing them to focus on other tasks. For example, OPA can answer questions such as whether an API call should be allowed, how much quota a user has, or which hosts a container can be deployed on.
  • Dynamic Policy Updates: OPA can download policy and data bundles from remote HTTP servers. These bundles are loaded on the fly without restarting OPA.
  • Rule Chaining: OPA allows rule chaining, which means that an output variable can be used to form other output variables.
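
The query flow described under Policy Decisions can be sketched in Python. The example below builds a request for OPA's REST Data API, which accepts a JSON body with an `input` key at `/v1/data/<policy path>` and returns the decision under `result`. It assumes an OPA server listening locally on port 8181; the policy path `example/authz/allow` and the input fields are hypothetical, and the network call is defined but not executed here.

```python
import json
import urllib.request
from typing import Any, Dict, Tuple

# Assumption: OPA runs as a local server on port 8181.
OPA_BASE_URL = "http://localhost:8181"


def build_query(policy_path: str, input_doc: Dict[str, Any]) -> Tuple[str, bytes]:
    """Build the URL and JSON body for a Data API query."""
    url = f"{OPA_BASE_URL}/v1/data/{policy_path.strip('/')}"
    body = json.dumps({"input": input_doc}).encode("utf-8")
    return url, body


def query_opa(policy_path: str, input_doc: Dict[str, Any]) -> Any:
    """Send the query and return the decision (None if the result is undefined)."""
    url, body = build_query(policy_path, input_doc)
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return json.load(response).get("result")


# Hypothetical policy path and input, shown without contacting a server.
url, body = build_query("example/authz/allow", {"user": "alice", "method": "GET"})
print(url)
print(body.decode("utf-8"))
```

Calling `query_opa(...)` with the same arguments would return whatever the policy at that path evaluates to for the given input.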

Prerequisites

Set Up OPA in a Kubernetes Cluster

Setting up the DOKS cluster

To deploy a Kubernetes cluster on the cloud provider, refer to "How to Setup a k8s cluster on the cloud provider", or run the following doctl command directly from your command line:

doctl kubernetes cluster create k8s-opa-cluster --region <region> --version 1.31.1-do.5 --node-pool "name=<name>;size=s-2vcpu-4gb;count=3"

Integrate OPA in Kubernetes Cluster

If you are not using a service mesh, the easiest way to integrate OPA into the Kubernetes cluster is via Gatekeeper. To do that, you need to install Gatekeeper and its CRDs, either from a YAML manifest or with a Helm chart. For simplicity, this tutorial uses the YAML manifest:

kubectl apply -f https://raw.githubusercontent.com/open-policy-agent/gatekeeper/v3.17.1/deploy/gatekeeper.yaml

If you prefer Helm, add the chart repository and run one of the two install commands below:

helm repo add gatekeeper https://open-policy-agent.github.io/gatekeeper/charts
helm repo update

# Option 1: Helm install with the gatekeeper-system namespace already created
helm install -n gatekeeper-system [RELEASE_NAME] gatekeeper/gatekeeper

# Option 2: Helm install and create the namespace
helm install -n gatekeeper-system [RELEASE_NAME] gatekeeper/gatekeeper --create-namespace

For more options, refer to the GitHub documentation on installing the Gatekeeper Helm chart.

Either method deploys the Gatekeeper components into the gatekeeper-system namespace.

Use Case Example

In this example, we will deal with a very common problem that platform engineers face in the Kubernetes clusters they manage: a worker node's compute is overused when tenants do not define the range of CPU and memory their workloads will use.

You will address this by creating a ConstraintTemplate that defines a Constraint kind named K8sRequiredResources.

Without Constraints

Below is a sample Deployment that is deployed on the Kubernetes cluster without any resource requests or limits defined:

apiVersion: apps/v1
kind: Deployment
metadata:
 name: unrestricted-deployment
spec:
 replicas: 1
 selector:
   matchLabels:
     app: unrestricted-deployment
 template:
   metadata:
     labels:
       app: unrestricted-deployment
   spec:
     containers:
     - name: deployment-a
       image: nginx:1.14.2
       ports:
       - containerPort: 80

Create ConstraintTemplate

				
apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
 name: k8srequiredresources
 annotations:
   description: >-
     Requires workloads to specify resource limits and requests.
spec:
 crd:
   spec:
     names:
       kind: K8sRequiredResources
     validation:
       openAPIV3Schema:
         type: object
         properties:
           resourceRequestRequired:
             type: boolean
             description: If true, resource requests should be specified.
           resourceLimitRequired:
             type: boolean
             description: If true, resource limits should be specified.

 targets:
   - target: admission.k8s.gatekeeper.sh
     rego: |
       package k8srequiredresources

       # Report every container in the pod template that has no resource requests.
       violation[{"msg": msg}] {
         kind := input.review.kind.kind
         workload_name := input.review.object.metadata.name
         workload_namespace := input.review.object.metadata.namespace
         ctr := input.review.object.spec.template.spec.containers[_]
         input.parameters.resourceRequestRequired
         not ctr.resources.requests
         msg := sprintf("%v <%v> in <%v> namespace contains container <%v> with no resource requests specified", [kind, workload_name, workload_namespace, ctr.name])
       }

       # Report every container in the pod template that has no resource limits.
       violation[{"msg": msg}] {
         kind := input.review.kind.kind
         workload_name := input.review.object.metadata.name
         workload_namespace := input.review.object.metadata.namespace
         ctr := input.review.object.spec.template.spec.containers[_]
         input.parameters.resourceLimitRequired
         not ctr.resources.limits
         msg := sprintf("%v <%v> in <%v> namespace contains container <%v> with no resource limits specified", [kind, workload_name, workload_namespace, ctr.name])
       }


Once the ConstraintTemplate is applied, Gatekeeper registers the K8sRequiredResources kind, and you can query the Kubernetes objects named k8srequiredresources.
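
To make the template's logic easier to follow, here is a loose Python paraphrase of its two violation rules, run against a hand-written stand-in for the admission review of the unrestricted Deployment. This is a sketch, not how Gatekeeper executes policies: the helper names (`lookup`, `holds`, `violations`), the shape of the `review` dictionary, and the `default` namespace are assumptions made for the example.

```python
from typing import Any, Dict, List

_UNDEFINED = object()  # stands in for Rego's "undefined"


def lookup(doc: Any, *path: str) -> Any:
    """Walk nested dicts; return _UNDEFINED if any key is missing."""
    for key in path:
        if not isinstance(doc, dict) or key not in doc:
            return _UNDEFINED
        doc = doc[key]
    return doc


def holds(value: Any) -> bool:
    """A Rego expression succeeds unless its value is undefined or false."""
    return value is not _UNDEFINED and value is not False


def violations(review: Dict[str, Any], parameters: Dict[str, Any]) -> List[str]:
    kind = lookup(review, "kind", "kind")
    name = lookup(review, "object", "metadata", "name")
    namespace = lookup(review, "object", "metadata", "namespace")
    containers = lookup(review, "object", "spec", "template", "spec", "containers")
    # If any referenced path is undefined, the rule body fails and yields nothing.
    if any(v is _UNDEFINED for v in (kind, name, namespace, containers)):
        return []

    messages = []
    checks = [("resourceRequestRequired", "requests"), ("resourceLimitRequired", "limits")]
    for parameter, field in checks:
        if not holds(lookup(parameters, parameter)):
            continue  # this check is switched off in the Constraint parameters
        for ctr in containers:
            ctr_name = lookup(ctr, "name")
            if ctr_name is _UNDEFINED:
                continue
            if not holds(lookup(ctr, "resources", field)):
                messages.append(
                    f"{kind} <{name}> in <{namespace}> namespace contains container "
                    f"<{ctr_name}> with no resource {field} specified"
                )
    return messages


# Hand-written approximation of the review for the unrestricted Deployment.
review = {
    "kind": {"group": "apps", "version": "v1", "kind": "Deployment"},
    "object": {
        "metadata": {"name": "unrestricted-deployment", "namespace": "default"},
        "spec": {"template": {"spec": {"containers": [
            {"name": "deployment-a", "image": "nginx:1.14.2"},
        ]}}},
    },
}
parameters = {"resourceRequestRequired": True, "resourceLimitRequired": True}
messages = violations(review, parameters)
for message in messages:
    print(message)
```

With both parameters set to true, the sketch reports two violations for the single container: one for the missing requests and one for the missing limits.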

Create Constraint

In a Constraint, you define the scope of the restrictions to be imposed: which Kubernetes object kinds and namespaces the Constraint applies to, along with any exceptions. Below is an example of a Constraint you can define for this use case:

apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredResources
metadata:
 name: workload-must-have-resources-set
spec:
 match:
   kinds:
     # Pod is omitted: it belongs to the core ("") API group, not "apps",
     # and the template's Rego only inspects spec.template.spec.containers.
     - apiGroups: ["apps"]
       kinds: ["Deployment", "StatefulSet", "DaemonSet"]
 parameters:
   resourceRequestRequired: true
   resourceLimitRequired: true
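
To make the scoping concrete, here is a simplified Python model of how a `match.kinds` list selects objects: an object is in scope when at least one entry lists both its API group and its kind, with `*` acting as a wildcard. This is only a sketch, not Gatekeeper's implementation; it ignores namespaces, label selectors, and exclusions, and the function name `kind_in_scope` is made up for this example.

```python
from typing import Dict, List


def kind_in_scope(match_kinds: List[Dict[str, List[str]]], group: str, kind: str) -> bool:
    """Return True if any entry covers both the API group and the kind."""
    for entry in match_kinds:
        groups = entry.get("apiGroups", [])
        kinds = entry.get("kinds", [])
        if ("*" in groups or group in groups) and ("*" in kinds or kind in kinds):
            return True
    return False


apps_only = [{"apiGroups": ["apps"], "kinds": ["Deployment", "StatefulSet", "DaemonSet"]}]
# Pod lives in the core API group, which is written as an empty string.
with_pods = apps_only + [{"apiGroups": [""], "kinds": ["Pod"]}]

print(kind_in_scope(apps_only, "apps", "Deployment"))  # True
print(kind_in_scope(apps_only, "", "Pod"))             # False
print(kind_in_scope(with_pods, "", "Pod"))             # True
```

Adding a core-group entry would only bring Pods into scope. The template's Rego reads spec.template.spec.containers, which a bare Pod does not have, so checking Pods directly would also need a rule that reads spec.containers.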

				
			

Now, if you delete the previous deployment and try to apply the same manifest again, the request is rejected with an error, and the violation count increases.

After setting up the Constraint, the Deployment YAML must specify resource requests and limits in order to be deployed on the Kubernetes cluster:

apiVersion: apps/v1
kind: Deployment
metadata:
 name: unrestricted-deployment
spec:
 replicas: 1
 selector:
   matchLabels:
     app: unrestricted-deployment
 template:
   metadata:
     labels:
       app: unrestricted-deployment
   spec:
     containers:
     - name: deployment-a
       image: nginx:1.14.2
       ports:
       - containerPort: 80
       resources:
         requests:
           memory: "64Mi"
           cpu: "250m"
         limits:
           memory: "128Mi"
           cpu: "500m"
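
The quantities in this manifest use Kubernetes resource units: `250m` is 250 millicores (a quarter of a CPU) and `64Mi` is 64 mebibytes. The sketch below is only an illustration and handles a small subset of the quantity syntax (whole numbers with an optional suffix). It converts these strings to numbers and checks that each request does not exceed its limit, a condition Kubernetes also validates. The helper names are hypothetical.

```python
from typing import Dict

# Binary and decimal memory suffixes (subset of the Kubernetes quantity format).
_MEMORY_SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
}


def parse_cpu(quantity: str) -> float:
    """Convert a CPU quantity such as '250m' or '2' to a number of cores."""
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000
    return float(quantity)


def parse_memory(quantity: str) -> int:
    """Convert a memory quantity such as '64Mi' or '1G' to bytes."""
    for suffix, factor in _MEMORY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # plain number of bytes


def requests_within_limits(resources: Dict[str, Dict[str, str]]) -> bool:
    """Check that CPU and memory requests do not exceed their limits."""
    requests, limits = resources["requests"], resources["limits"]
    return (
        parse_cpu(requests["cpu"]) <= parse_cpu(limits["cpu"])
        and parse_memory(requests["memory"]) <= parse_memory(limits["memory"])
    )


resources = {
    "requests": {"memory": "64Mi", "cpu": "250m"},
    "limits": {"memory": "128Mi", "cpu": "500m"},
}
print(parse_cpu("250m"), parse_memory("64Mi"))  # 0.25 67108864
print(requests_within_limits(resources))        # True
```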

				
			

Cleanup

You can delete the Kubernetes cluster using this command:

doctl kubernetes cluster delete k8s-opa-cluster --dangerous

Conclusion

In this tutorial, you learned how to implement essential safeguards on your Kubernetes cluster to prevent misuse or vulnerabilities that can arise from workloads deployed by tenants. By following the steps outlined, you can ensure that your cluster is better protected against potential security threats and misuse of resources. This is particularly important in multi-tenant environments where multiple users or teams share the same cluster, as it helps maintain the integrity and reliability of the platform.