
Implementing Gatekeeper and OPA Policies in Kubernetes


In a distributed Kubernetes ecosystem, the ability to deploy rapidly is often at odds with the need to maintain strict security and operational standards. As clusters scale, the "wild west" approach, where any authenticated user can deploy any manifest, inevitably leads to configuration drift, security vulnerabilities, and resource exhaustion.

Standard Kubernetes Role-Based Access Control (RBAC) is excellent at answering "Who can do what," but it is fundamentally incapable of answering "Under what conditions." RBAC can allow a user to create a `Deployment`, but it cannot prevent that deployment from using a forbidden container registry or running with privilege escalation. To bridge this gap, we require a policy engine capable of inspecting the content of the API requests. This is where Open Policy Agent (OPA) and its Kubernetes-native implementation, Gatekeeper, become indispensable.

The Architecture of Policy Enforcement

To understand how to implement these tools, we must first distinguish between the engine and the controller.

Open Policy Agent (OPA)

OPA is a general-purpose, open-source policy engine. It uses a declarative language called Rego, which is inspired by Datalog. OPA is decoupled from the application logic; it receives a JSON input (the state of the world), evaluates it against a set of Rego policies, and returns a decision (allow/deny).
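As a minimal illustration of this model (the package name and input shape here are hypothetical, not tied to Kubernetes), a Rego policy that denies by default and allows only a listed user might look like:

```rego
package example.authz

# Deny by default; grant access only when a rule below matches.
default allow := false

# Allow when the JSON input identifies an approved user.
allow {
    input.user == "alice"
}
```

Given the input `{"user": "alice"}`, OPA returns `allow: true`; any other input yields the default `false`.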

Gatekeeper

While OPA can be used for Kubernetes, it is not "Kubernetes-aware" out of the box. Gatekeeper acts as the Kubernetes-specific implementation of OPA. It functions as a Validating Admission Webhook. When a request hits the Kubernetes API server (e.g., `kubectl apply`), the API server sends an `AdmissionReview` object to the Gatekeeper webhook. Gatekeeper then evaluates this object against its loaded policies and instructs the API server to either admit or reject the request.
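The `AdmissionReview` request is what Rego policies later see as `input.review`. A trimmed sketch of such a payload (fields abbreviated, values illustrative) looks roughly like:

```yaml
# Abbreviated AdmissionReview as delivered to the Gatekeeper webhook.
apiVersion: admission.k8s.io/v1
kind: AdmissionReview
request:
  operation: CREATE
  kind:
    group: ""
    version: v1
    kind: Pod
  object:          # the full manifest under review; Rego reads it as input.review.object
    spec:
      containers:
        - name: app
          image: nginx:latest
```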

The Dual-Resource Model: ConstraintTemplates and Constraints

The most critical technical concept in Gatekeeper is the separation of policy logic from policy configuration. Gatekeeper utilizes two distinct Custom Resource Definitions (CRDs):

  1. `ConstraintTemplate`: This defines the logic. It contains the Rego code and defines the parameters that the policy will accept. It is essentially the "class" in object-oriented terms.
  2. `Constraint`: This defines the application of that logic. It specifies which resources to target and provides the specific values (parameters) for the template. It is the "instance" of the template.

This separation allows platform engineers to write complex Rego logic once and reuse it across different namespaces or clusters with varying parameters.

Practical Implementation: Enforcing Container Registry Compliance

One of the most common use cases is ensuring that all images deployed to the cluster originate from a trusted, internal registry. This prevents developers from accidentally (or maliciously) pulling images from public, unvetted sources.

Step 1: Defining the `ConstraintTemplate`

First, we define the Rego logic. We need to iterate through the containers in a Pod spec and check if the image string starts with our trusted prefix.

```yaml
apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: k8sregistryallowed
spec:
  crd:
    spec:
      names:
        kind: K8sRegistryAllowed
      validation:
        # Schema for the `parameters` field of the Constraint
        openAPIV3Schema:
          type: object
          properties:
            allowedRegistries:
              type: array
              items:
                type: string
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8sregistryallowed

        # Pods: containers live directly under spec.containers
        violation[{"msg": msg}] {
          container := input.review.object.spec.containers[_]
          image := container.image
          not is_allowed(image)
          msg := sprintf("image '%v' is not from a trusted registry", [image])
        }

        # Workloads such as Deployments: containers live under
        # spec.template.spec.containers
        violation[{"msg": msg}] {
          container := input.review.object.spec.template.spec.containers[_]
          image := container.image
          not is_allowed(image)
          msg := sprintf("image '%v' is not from a trusted registry", [image])
        }

        # Logic to check the prefix. For simplicity, an image is allowed
        # if it starts with one of the allowed registry prefixes.
        is_allowed(image) {
          some i
          allowed_registry := input.parameters.allowedRegistries[i]
          startswith(image, allowed_registry)
        }
```

Step 2: Applying the `Constraint`

Now, we instantiate the policy. We don't need to touch the Rego code again; we simply define which registries are permitted.

```yaml
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRegistryAllowed
metadata:
  name: enforce-trusted-registry
spec:
  match:
    kinds:
      - apiGroups: ["apps"]
        kinds: ["Deployment"]
      - apiGroups: [""]
        kinds: ["Pod"]
  parameters:
    allowedRegistries:
      - "my-company-registry.io/"
      - "internal-docker.local/"
```

In this setup, if a developer attempts to deploy a pod using `image: nginx:latest`, the API server will receive a `deny` response from Gatekeeper, and the deployment will fail with a descriptive error message.
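For example, a Pod like the following (a hypothetical manifest) would be rejected under the constraint above, while changing the image to `my-company-registry.io/nginx:latest` would let it through:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: demo
spec:
  containers:
    - name: web
      image: nginx:latest   # rejected: does not match any allowed registry prefix
```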

Operational Considerations: The "Audit" vs. "Deny" Strategy

Implementing Gatekeeper is a high-stakes operation. A poorly written Rego policy can inadvertently block critical system components (like `kube-proxy` or CNI plugins) from restarting, effectively breaking the cluster.

The Audit Workflow

Never deploy a new `Constraint` in `deny` mode immediately. The recommended lifecycle is:

  1. Audit Mode: Deploy the `ConstraintTemplate` and the `Constraint` with `spec.enforcementAction: dryrun`, so violations are flagged but nothing is blocked. Gatekeeper's audit controller also scans existing resources in the cluster and reports non-compliant objects in the `status` field of the `Constraint`.
  2. Observability: Monitor the Gatekeeper logs and Prometheus metrics (e.g., `gatekeeper_violations`) to confirm the policy flags exactly the workloads you expect, and nothing more.
  3. Enforcement: Once the audit results are clean, or every remaining violation is understood, switch `enforcementAction` to `deny` to start blocking new requests.
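In practice, audit-only behaviour is configured through the constraint's `enforcementAction` field; a constraint deployed in dry-run mode records violations in its `status` without rejecting admissions:

```yaml
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRegistryAllowed
metadata:
  name: enforce-trusted-registry
spec:
  enforcementAction: dryrun   # record violations only; switch to "deny" to enforce
  match:
    kinds:
      - apiGroups: ["apps"]
        kinds: ["Deployment"]
```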

Conclusion

As the architecture, the dual-resource model, and the registry-compliance walkthrough above all illustrate, a secure Gatekeeper and OPA implementation depends on execution discipline as much as on design.

The practical hardening path combines deterministic, deny-by-default policy evaluation at admission time with workload isolation and network policy controls, plus behavioural detection across process, memory, identity, and network telemetry. Layering these independent controls reduces both exploitability and attacker dwell time, because a successful attack must defeat several control layers at once.

Operational confidence should be measured, not assumed: track the false-allow rate, the time to revoke privileged access, and the mean time to detect and remediate configuration drift, then use those results to tune preventive policy, detection fidelity, and response runbooks on a fixed review cadence.
