Enforcing policies and governance for Kubernetes workloads using Gatekeeper

Policies in Kubernetes allow you to prevent specific workloads from being deployed in the cluster.

While compliance is usually the reason for enforcing strict policies in the cluster, there are several recommended best practices that cluster admins should implement.

Examples of such guidelines are:

  1. Not running privileged pods.

  2. Not running pods as the root user.

  3. Not running pods without resource limits.

  4. Not using the latest tag for the container images.

In addition, you may want to enforce bespoke policies that all workloads must abide by, such as:

  • All workloads must have a "project" and "app" label.

  • All workloads must use container images from a specific container registry (e.g. example.com).
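
As a concrete illustration, a Deployment that satisfies both of these bespoke policies might look like the following snippet (the project and app label values and the registry path are hypothetical):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: billing-api          # hypothetical workload name
  labels:
    project: billing         # required "project" label
    app: billing-api         # required "app" label
spec:
  selector:
    matchLabels:
      app: billing-api
  template:
    metadata:
      labels:
        project: billing
        app: billing-api
    spec:
      containers:
      - name: billing-api
        # image pulled from the approved registry (example.com)
        image: example.com/billing/billing-api:1.4.2
```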

There are two ways of enforcing policies for your Kubernetes workloads:

  • The out-of-cluster approaches run static checks on the YAML manifests before they are submitted to the cluster (e.g., conftest).

  • The in-cluster approaches make use of admission controllers which are invoked as part of the API request and before the manifest is stored in the database.

In this article, however, we will learn about enforcing policies for our Kubernetes workloads using in-cluster solutions like Gatekeeper.

Setup

  • kubectl

I am using a simple kind cluster for this, but you can use any other Kubernetes distribution you are comfortable with. You can install kind by following its official installation guide.

To quickly create a cluster, run the following command:

kind create cluster
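
By default this creates a single-node cluster. If you want more nodes, kind accepts a configuration file; a minimal sketch:

```yaml
# kind-config.yaml: one control-plane node and one worker
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
  - role: worker
```

Pass it with kind create cluster --config kind-config.yaml.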

The Kubernetes API

Let's understand what happens when you create a Pod like this in the cluster:

apiVersion: v1
kind: Pod
metadata:
  name: my-pod
spec:
  containers:
  - name: my-container
    image: nginx:1.9
    ports:
    - containerPort: 80

At a high level:

  1. The YAML definition is sent to the API server.

  2. The YAML definition is stored in etcd.

  3. The scheduler assigns the Pod to a node.

  4. The kubelet retrieves the Pod spec and creates it.

But is it really that simple? What's going on under the hood?

When the kube-apiserver receives the request, it doesn't store it in etcd immediately. The request passes through several stages first:

  1. The HTTP handler receives and processes the HTTP request.

  2. The API server authenticates the caller. Are you a user of the cluster?

  3. Your user account is checked against the RBAC rules to verify that you are authorized to access the resource.

  4. Before the object reaches the database, it is intercepted by Admission Controllers.

Admission Controllers

An admission controller is a piece of code that intercepts requests to the Kubernetes API server prior to persistence of the resource, but after the request is authenticated and authorized.

There are mainly two types of admission controllers:

  1. Validating Admission Controller: These controllers do not modify the requests. Instead, they evaluate whether the incoming requests comply with predefined policies and rules. If a request does not meet the criteria, it is rejected, preventing the resource from being created or updated.

    For example, let's take NamespaceLifecycle, a validating admission controller whose tasks are to:

    1. Prevent object creation in terminating namespaces.

    2. Reject requests for non-existent namespaces.

    3. Stop requests that could delete the default, kube-system, and kube-public namespaces.

  2. Mutating Admission Controller: These controllers can modify or "mutate" the incoming API requests before they are persisted in the cluster. They can add, change, or remove fields in the resource being created or updated.

    For example, consider DefaultStorageClass, a mutating admission controller whose tasks are:

    1. Assigning a default storage class: the DefaultStorageClass controller checks for the presence of a storage class marked with the annotation storageclass.kubernetes.io/is-default-class: "true".

    2. Enforcing a single default: it ensures that only one storage class is treated as the default at any given time.

    3. Enabling dynamic provisioning: when a PVC is created without a specific storage class, the controller automatically assigns the default storage class to it, so a volume can be dynamically provisioned using that class.
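
To see the mutation in action, consider a PVC that omits storageClassName (the claim name and size below are arbitrary examples):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-claim           # arbitrary example name
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  # no storageClassName here: the DefaultStorageClass admission
  # controller injects the cluster's default storage class
```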

Kubernetes has several mutating and validating admission controllers.

You can find the full list on the official documentation.

  1. If you pay close attention, you will notice that the Mutating Admission Controllers are the first to be invoked.

  2. Once the resource has been mutated, it is passed to the schema validation phase. Here the API server checks that the resource is still valid (is any required field missing?).

  3. In the last step you find the Validating Admission Controllers. They are the last component before the resource is stored in etcd.

But what if you want to have a custom check or mutate the resources according to your rules?

You can register a component to the Mutation or Validation webhook, and those controllers will call it when the request passes through the Admission phase.

And that's precisely what Gatekeeper does — it registers as a component in the cluster and validates requests.

Enforcing policies using Gatekeeper

  • Default Admission Controllers: These are pre-configured controllers that enforce rules and policies without requiring additional setup. They are part of the core Kubernetes functionality.

  • Custom Admission Controllers via Webhooks: Webhooks provide a way to implement custom logic for admission control, allowing organizations to tailor Kubernetes behavior to their specific needs.

NOTE : Webhooks are HTTP callbacks that can be configured to act as dynamic admission controllers.
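
For intuition, a validating webhook is registered with a ValidatingWebhookConfiguration resource along these lines (the names and service reference here are illustrative; Gatekeeper creates its own configuration when installed):

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: example-policy-webhook      # illustrative name
webhooks:
  - name: check.example.com         # illustrative webhook name
    admissionReviewVersions: ["v1"]
    sideEffects: None
    clientConfig:
      service:                      # the in-cluster service the API server calls
        name: policy-webhook-service
        namespace: policy-system
        path: /validate
    rules:                          # which requests get sent to the webhook
      - apiGroups: ["apps"]
        apiVersions: ["v1"]
        operations: ["CREATE", "UPDATE"]
        resources: ["deployments"]
```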

Gatekeeper allows a Kubernetes administrator to implement policies for ensuring compliance and best practices in their cluster. Gatekeeper embraces Kubernetes native concepts such as Custom Resource Definitions (CRDs) and hence the policies are managed as Kubernetes resources.

Why use Gatekeeper rather than OPA?

There are a couple of reasons I have chosen Gatekeeper over plain OPA.

  • OPA is a general-purpose policy engine that can be used across various domains, including APIs, microservices, and cloud infrastructure, whereas Gatekeeper is specifically designed for Kubernetes.

  • OPA can be integrated into Kubernetes, but it requires additional setup and configuration, such as running sidecars or other methods to connect with the API server. Gatekeeper, on the other hand, provides native integration with Kubernetes through CRDs for constraints and constraint templates, which makes policies easier to implement and manage.

Once you have kubectl configured to communicate to the cluster, run the following to set up gatekeeper:

kubectl apply -f https://raw.githubusercontent.com/open-policy-agent/gatekeeper/master/deploy/gatekeeper.yaml

This installs service/gatekeeper-webhook-service, which is invoked by the Kubernetes API server as part of request processing in the "Validating Admission" stage.

All your Pods, Deployments, Services, etc. are now intercepted and scrutinised by Gatekeeper.

Defining reusable policies using a ConstraintTemplate

In Gatekeeper, you first need to define a policy using a ConstraintTemplate custom resource.

Let's have a look at an example. The following ConstraintTemplate definition rejects any Deployment whose container image is missing a tag or uses the latest tag:

apiVersion: templates.gatekeeper.sh/v1beta1
kind: ConstraintTemplate
metadata:
  name: k8simagetagvalid
spec:
  crd:
    spec:
      names:
        kind: K8sImageTagValid
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8simagetagvalid

        violation[{"msg": msg, "details":{}}] {
          image := input.review.object.spec.template.spec.containers[_].image
          not count(split(image, ":")) == 2
          msg := sprintf("image '%v' doesn't specify a valid tag", [image])
        }

        violation[{"msg": msg, "details":{}}] {
          image := input.review.object.spec.template.spec.containers[_].image
          endswith(image, "latest")
          msg := sprintf("image '%v' uses latest tag", [image])
        }

Understanding ConstraintTemplate

This template enforces two rules for container images in Kubernetes workloads:

  1. All images must include a valid tag (images without a tag or with malformed tags are not allowed).

  2. Using the "latest" tag is prohibited because it is considered a bad practice (as it can lead to unpredictable deployments).

  • The target is admission.k8s.gatekeeper.sh, which means this policy is evaluated during the admission control phase.

  • package k8simagetagvalid defines a Rego package that encapsulates the policy logic.

  • Violation Rule 1: Valid Image Tags

    • The image string is split at the colon ":".

    • If the resulting array does not have exactly 2 parts (e.g., repo/image:tag), it triggers a violation.

    • An error message is generated: "image '<image>' doesn't specify a valid tag".

  • Violation Rule 2: Prohibit Latest Tag

    • If the image string ends with latest, it triggers a violation.

    • An error message is generated: "image '<image>' uses latest tag"

NOTE : The above policy is written in Rego, the policy language used by OPA.

$ kubectl apply -f templates/check_image_tag.yaml

A ConstraintTemplate isn't something you can use to validate deployments, though.

It's just a definition of a policy that can only be enforced by creating a Constraint.

Creating a constraint

A Constraint is a way to say "I want to apply this policy to the cluster".

You can think about ConstraintTemplates as a book of recipes. You have hundreds of recipes for baking cakes and cookies, but you can't eat them. You need to choose the recipe and mix the ingredients to bake your cake.

Constraints are a particular instance of a recipe — the ConstraintTemplate. The following Constraint uses the previously defined ConstraintTemplate (recipe) K8sImageTagValid:

apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sImageTagValid
metadata:
  name: valid-image-tag
spec:
  match:
    kinds:
      - apiGroups: ["apps"]
        kinds: ["Deployment"]
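
The match section supports more than kinds. For example, you can scope the same policy to specific namespaces (the namespace names below are illustrative):

```yaml
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sImageTagValid
metadata:
  name: valid-image-tag-prod
spec:
  match:
    kinds:
      - apiGroups: ["apps"]
        kinds: ["Deployment"]
    namespaces: ["prod", "staging"]   # only enforce in these namespaces
```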

Understanding Constraint

  • The file creates a Constraint named valid-image-tag of type K8sImageTagValid

  • The match section specifies the Kubernetes resources to which this Constraint applies.

Notice how the Constraint references the ConstraintTemplate: its kind, K8sImageTagValid, matches the kind defined in the template's CRD spec.

$ kubectl apply -f check_image_tag_constraint.yaml

Testing the policy

Now, let's test the Deployment with the nginx container image:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  namespace: default
  labels:
    app: nginx
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80

The Deployment is rejected by the Gatekeeper policy because the nginx image doesn't specify a tag. Once you add a valid tag, the Deployment is created as expected.
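
For instance, pinning the image to an explicit version (1.25.3 is just an example tag) satisfies both rules of the policy:

```yaml
      containers:
      - name: nginx
        image: nginx:1.25.3   # explicit, non-latest tag accepted by the policy
        ports:
        - containerPort: 80
```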

There is one more policy, "labels_check", which you can try to implement. You can check out the code in my GitHub repo.

References

https://kubernetes.io/docs/reference/access-authn-authz/admission-controllers/

https://learnk8s.io/kubernetes-policies

https://github.com/open-policy-agent/gatekeeper-library