Valtik Studios
Kubernetes · Critical · Updated 2026-04-17 · 35 min read

Kubernetes Security: The Complete Hardening Guide for 2026


Tre Trebucchi · Founder, Valtik Studios. Penetration tester based in Connecticut, serving the US mid-market.

Every Kubernetes audit starts the same way

We land read-only kubeconfig access on a client's production cluster. We run `kubectl get clusterrolebindings -o wide`. Within 45 seconds we've identified at least one ServiceAccount that can trivially escalate to cluster-admin if the pod it's attached to gets compromised.

This is not a story about bad Kubernetes administrators. Kubernetes admins are good at Kubernetes. The problem is that Kubernetes was designed by Google to run Google-scale workloads at Google's operational sophistication. Every default, every example in the docs, every tutorial on YouTube assumes you have an in-house infrastructure team that knows what to tune, what to disable, and what to tighten after the defaults.

Most companies running Kubernetes do not have that team. They copy the deployment manifest from the vendor. They follow the Helm install. They put the cluster on a public IP because the alternative is "learn VPC peering." They end up with production workloads running on a cluster that's one compromised container away from full takeover.

This is the complete 2026 Kubernetes hardening guide we walk through on every client engagement. It's long. Kubernetes security is layered and no single layer is complete. You need them all.

Who this is for

  • Platform engineers running production Kubernetes on EKS, GKE, AKS, or self-managed.
  • Security engineers getting asked "is our Kubernetes secure" and needing a real answer.
  • CTOs deciding whether their engineering team is operating Kubernetes safely.
  • Consultants preparing for a Kubernetes-heavy client engagement.

Not for: pure "hello world" clusters. If you're running minikube on your laptop, most of this doesn't apply.

The threat model

Before any hardening, be clear on what you're defending against.

The attacker profile

  • External attacker who finds an internet-facing service. Typically SSRF via an application bug or credential theft via phishing.
  • Supply chain attacker who compromises a container image you pull. Possibly via a malicious npm package, possibly via a compromised base image.
  • Insider with legitimate kubectl access who goes rogue or whose credentials get phished.
  • Lateral attacker who lands in one pod (via any of the above) and tries to move laterally to the cluster control plane or other pods.

The attacker goal

Almost always the same: extract secrets, pivot to cloud credentials, exfiltrate data, or establish persistence. The specific technique varies (cluster-admin escalation, node takeover, etcd read, secret extraction) but the destination is usually "cloud-provider IAM credentials" or "customer data."

The control surface

Your job as a defender is to make each step in the attack chain either impossible or noisy. Perfect prevention isn't achievable. You want detection + constrained attacker capability at every layer.

Layer 1: The Kubernetes API server

The API server is the control plane. Everything in Kubernetes goes through it. If an attacker can talk to the API server as cluster-admin, the cluster is gone.

1.1 Disable unauthenticated access

Sounds obvious. Still the #1 cluster takeover we see. Check:

```
kubectl auth can-i --list --as=system:anonymous
```

If that returns anything other than a short default list, you have exposed unauthenticated access. Fix via RBAC — remove any `ClusterRoleBinding` granting roles to `system:unauthenticated` or `system:anonymous`.

1.2 Lock down the API server network

On managed Kubernetes (EKS, GKE, AKS), the API server endpoint can be public or private. Default is often public. Fix:

  • EKS. Enable private endpoint, disable public endpoint, connect via VPN or bastion.
  • GKE. Use private cluster. Control plane VPC peering.
  • AKS. Private cluster, authorized IP ranges as fallback.

If you must have public API access (CI/CD from outside your VPC, for example), restrict to authorized IP ranges. Never `0.0.0.0/0`.
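On EKS, for example, both fixes are cluster-config updates via the AWS CLI (a sketch; the cluster name and CIDR below are placeholders, and EKS processes one config update at a time, so run these serially):

```shell
# Disable the public endpoint, enable the private one ("prod-cluster" is a placeholder)
aws eks update-cluster-config \
  --name prod-cluster \
  --resources-vpc-config endpointPublicAccess=false,endpointPrivateAccess=true

# If public access must stay on for external CI/CD, at least pin it to known CIDRs
aws eks update-cluster-config \
  --name prod-cluster \
  --resources-vpc-config endpointPublicAccess=true,publicAccessCidrs="203.0.113.0/24"
```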

1.3 Audit logging

Kubernetes audit logs are off by default on some distributions. Turn them on.

Audit policy should log:

  • All requests at metadata level minimum
  • `RequestResponse` level for access to Secrets and ConfigMaps
  • `RequestResponse` level for Create/Update/Delete on critical resources

Ship logs to a SIEM. Retention 1 year minimum. Alert on:

  • Failed authentication from unexpected IPs
  • `exec` or `attach` into any pod
  • Any access to the `kube-system` namespace by non-admin accounts
  • Changes to `ClusterRoleBinding` or `RoleBinding`
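A minimal audit policy implementing those levels might look like this (a sketch; tune the resource lists to your cluster):

```yaml
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
  # Full request/response bodies for Secret and ConfigMap access
  - level: RequestResponse
    resources:
      - group: ""
        resources: ["secrets", "configmaps"]
  # Full bodies for changes to RBAC bindings
  - level: RequestResponse
    verbs: ["create", "update", "patch", "delete"]
    resources:
      - group: "rbac.authorization.k8s.io"
        resources: ["clusterrolebindings", "rolebindings"]
  # Everything else at metadata level minimum
  - level: Metadata
```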

Layer 2: Authentication

How users and services prove who they are.

2.1 Kill static tokens

Legacy Kubernetes used long-lived ServiceAccount tokens stored in Secrets. These don't expire and can be extracted from any compromised pod. As of Kubernetes 1.24+, the default is bound projected tokens that are short-lived and tied to specific pods.

Check:

```
kubectl get secrets -A -o json | jq '.items[] | select(.type == "kubernetes.io/service-account-token")' | head
```

If you find long-lived tokens (no expiration), migrate off them. They're a credential theft risk.
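On 1.24+ clusters, when something outside a pod genuinely needs a token, mint a short-lived bound token instead of creating a Secret-backed one (a sketch; the ServiceAccount and namespace names are placeholders):

```shell
# Issue a 10-minute bound token for a ServiceAccount (Kubernetes 1.24+)
kubectl create token deploy-bot --namespace ci --duration=10m
```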

2.2 External identity provider for human access

Never let humans use static kubeconfigs. Set up OIDC against your IdP (Okta, Entra ID, Google Workspace). Every kubectl command authenticates as a human with SSO + MFA.

On managed clusters:

  • EKS. AWS IAM Authenticator maps IAM principals to Kubernetes users. MFA enforced via IAM.
  • GKE. Google accounts natively.
  • AKS. Entra ID integration.

2.3 Workload identity

Pods that need to talk to cloud APIs should use workload identity, not static cloud credentials mounted as Secrets.

  • EKS. IAM Roles for Service Accounts (IRSA). Pod's ServiceAccount mapped to an IAM role.
  • GKE. Workload Identity. Same concept.
  • AKS. Azure AD Workload Identity.

Eliminates the "cloud credentials sitting in Secrets" anti-pattern we still find in 60% of audits.
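On EKS, the IRSA wiring is a single annotation on the ServiceAccount (a sketch; the account ID and role name are placeholders):

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: app
  namespace: production-apps
  annotations:
    # Placeholder ARN -- the pod receives short-lived credentials for this
    # role via a projected web identity token; no static keys involved
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/app-s3-reader
```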

Layer 3: Authorization (RBAC)

Who can do what. The layer where most mistakes live.

3.1 No wildcard permissions

The single biggest RBAC red flag. Any Role or ClusterRole that contains:

```yaml
- apiGroups: ["*"]
  resources: ["*"]
  verbs: ["*"]
```

This is cluster-admin. Anyone bound to this role can do anything. Audit your RoleBindings for these. They're almost never necessary.

3.2 Dangerous verbs

Verbs that grant privilege escalation even without wildcard:

  • create + pods in a namespace where privileged ServiceAccounts run: the attacker can create a pod mounting `hostPath: /` and read secrets off the node.
  • create + clusterrolebindings: attacker binds themselves to cluster-admin.
  • escalate + clusterroles: attacker modifies an existing ClusterRole to add cluster-admin verbs.
  • impersonate + users/groups/serviceaccounts: attacker impersonates a higher-privileged identity.
  • bind + clusterroles/roles: attacker binds roles they couldn't grant themselves.
  • approve + certificatesigningrequests: attacker issues new client certs.
  • get + secrets in `kube-system`: secrets include service account tokens. The attacker pivots via stolen tokens.

Every audit, we run through each of these. Grep RoleBindings and ClusterRoleBindings for the combinations. Flag each one. Validate whether it's necessary.
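One way to surface the worst combinations quickly — a sketch using `jq`; extend the verb list to match the table above:

```shell
# ClusterRoles granting escalation-capable verbs anywhere in their rules
kubectl get clusterroles -o json | jq -r '
  .items[]
  | select([.rules[]?.verbs[]?]
      | any(IN("escalate", "bind", "impersonate", "*")))
  | .metadata.name'
```

Each name this prints is a role to trace through your ClusterRoleBindings and justify or remove.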

3.3 Namespace boundaries

RBAC rarely prevents cluster-admin escalation when the attacker has admin in one namespace. Assume that namespace admin ≈ cluster admin for the purposes of threat modeling. If you need hard boundaries, use separate clusters.

3.4 ServiceAccount review

Every ServiceAccount that has permissions beyond read-only gets audited quarterly. Typical check:

```
kubectl auth can-i --list --as=system:serviceaccount:<namespace>:<name>
```

Anything surprising in the output is worth investigating.

Layer 4: Pod security

The pod is the attack unit. What you let pods do constrains the blast radius of any compromised workload.

4.1 Pod Security Standards

Kubernetes ships three Pod Security Standards profiles built into the API server:

  • Privileged. No restrictions. Default in many distributions. Bad.
  • Baseline. Minimal restrictions. Prevents known privilege escalations. Decent floor.
  • Restricted. Heavily restricted. Prevents most classes of container escape.

Enforce at the namespace level via labels:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: production-apps
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted
```

Every workload namespace should be at least Baseline, ideally Restricted. The `kube-system` namespace and your infrastructure namespaces (istio, cert-manager) may need Privileged for system workloads. Everything else should not.

4.2 Admission controllers for policy

Pod Security Standards cover the basics. For fine-grained policy, deploy either:

  • Gatekeeper (OPA). Heavyweight, feature-rich. Good if you need complex policy.
  • Kyverno. Lightweight, Kubernetes-native. Easier to operate.

Policies every cluster should enforce:

  • No containers with `privileged: true`
  • No `hostPath` mounts (except for an explicit allowlist of system pods)
  • No `hostNetwork`, `hostPID`, or `hostIPC`
  • No images tagged `:latest` (forces immutable tags)
  • All containers must set `runAsNonRoot: true`
  • All containers must have `readOnlyRootFilesystem: true`
  • All containers must drop `ALL` capabilities and only add the ones they need
  • All containers must set resource limits (memory, CPU)
  • No `automountServiceAccountToken: true` unless explicitly needed
  • Images must be pulled from an approved registry
  • Images must have a verified signature (via Cosign / Sigstore)
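As a taste of what these policies look like, a Kyverno rule blocking privileged containers might be written as follows (a sketch; policy and rule names are our own):

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: disallow-privileged-containers
spec:
  validationFailureAction: Enforce
  rules:
    - name: deny-privileged
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "Privileged containers are not allowed."
        pattern:
          spec:
            containers:
              # The =() anchor means: if securityContext is present,
              # privileged must be false (absent also passes)
              - =(securityContext):
                  =(privileged): false
```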

4.3 Distroless images

Minimize the attack surface inside containers. Use distroless base images (`gcr.io/distroless/*`) or Chainguard Images. Fewer binaries means fewer things an attacker can use after compromising a container.

A distroless image has no shell, no package manager, no curl, no wget. When an attacker lands in one, their toolkit is empty.

4.4 Read-only root filesystem

Every container should mount its root filesystem as read-only. Write paths should be explicit `emptyDir` volumes. This breaks most persistence and payload-download attack steps.
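Putting 4.1 through 4.4 together, a container spec looks roughly like this (a sketch; the image, names, and limits are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: api
spec:
  automountServiceAccountToken: false
  containers:
    - name: api
      image: registry.example.com/api:1.4.2   # placeholder image, immutable tag
      securityContext:
        runAsNonRoot: true
        readOnlyRootFilesystem: true
        allowPrivilegeEscalation: false
        capabilities:
          drop: ["ALL"]
      resources:
        limits:
          memory: "256Mi"
          cpu: "500m"
      volumeMounts:
        - name: tmp
          mountPath: /tmp     # explicit writable scratch space
  volumes:
    - name: tmp
      emptyDir: {}
```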

Layer 5: Network policies

Pods should not be able to talk to each other unless explicitly allowed.

5.1 Default deny

Every namespace starts with a default-deny NetworkPolicy:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
```

Then explicit allows for legitimate traffic.

5.2 CNI that actually enforces

Not every CNI implements NetworkPolicy. Flannel does not. Calico does. Cilium does. If you're running Flannel, you have no network policy enforcement. Switch to Calico or Cilium.

5.3 Egress control

This is the missed layer. Pods shouldn't be able to `curl` arbitrary internet destinations. If they can, any compromised pod can exfiltrate data or pull down second-stage payloads.

Set egress policies to allow only the specific services each pod legitimately needs. DNS to CoreDNS. HTTPS to specific external APIs. That's it.

Cilium's identity-aware egress is particularly good here. You can write policies like "these pods can reach AWS S3 but nothing else."
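A standard-Kubernetes version of that egress shape looks like this (a sketch; the labels and CIDR are placeholders — Cilium can express the same intent by identity rather than IP):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: payments-egress
spec:
  podSelector:
    matchLabels:
      app: payments          # placeholder label
  policyTypes:
    - Egress
  egress:
    # DNS to CoreDNS in kube-system
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
      ports:
        - protocol: UDP
          port: 53
    # HTTPS to one specific external API (placeholder CIDR)
    - to:
        - ipBlock:
            cidr: 203.0.113.10/32
      ports:
        - protocol: TCP
          port: 443
```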

Layer 6: Secrets management

6.1 External secrets, not native

Kubernetes Secrets are base64-encoded, not encrypted. Anyone with `get secrets` permission can read them. If you're storing production credentials in native Kubernetes Secrets, you're one compromised pod away from credential theft.

Use an external secrets manager:

  • AWS Secrets Manager + External Secrets Operator
  • HashiCorp Vault + Vault Agent Injector
  • GCP Secret Manager + External Secrets Operator
  • Azure Key Vault + External Secrets Operator

The pattern: secret lives in the external system. External Secrets Operator syncs it into a Kubernetes Secret. Pod mounts the Secret. The external system is the source of truth, rotations happen there, audit logs capture every access.
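With External Secrets Operator, that sync is declared like this (a sketch; the store, key, and Secret names are placeholders):

```yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: db-credentials
  namespace: production-apps
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secrets-manager   # placeholder ClusterSecretStore name
    kind: ClusterSecretStore
  target:
    name: db-credentials        # the Kubernetes Secret the operator maintains
  data:
    - secretKey: password
      remoteRef:
        key: prod/db            # placeholder path in AWS Secrets Manager
        property: password
```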

6.2 Envelope encryption at rest

Kubernetes supports envelope encryption for Secrets stored in etcd. Enable it. Use your cloud KMS for the key encryption key.

```yaml
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
  - resources:
      - secrets
    providers:
      - kms:
          name: aws-kms
          endpoint: unix:///var/run/kmsplugin/socket.sock
          cachesize: 1000
          timeout: 3s
      - identity: {}
```

6.3 Rotate ServiceAccount tokens

Default ServiceAccount tokens were long-lived prior to Kubernetes 1.24. On older clusters, rotate them regularly. On newer clusters, ensure `BoundServiceAccountTokenVolume` behavior (short-lived, pod-bound projected tokens) is in effect.

Layer 7: Image security

7.1 Image scanning

Every image scanned before it hits production. Tools:

  • Trivy. Open source, excellent, fast. Baseline.
  • Snyk Container. Commercial, integrates well with developer workflows.
  • Grype. Open source, good.
  • Docker Scout. Built into Docker Hub.
  • Wiz / Lacework / Prisma. Commercial, agent-based, continuous.

Scan in CI. Block builds that have critical unpatched CVEs. Rescan every image in production continuously (post-scan-at-build vulnerabilities still happen).

7.2 Image signing

Sign every image you build. Verify signatures on deploy. Cosign + Sigstore is the standard.

  • Build pipeline signs image with Cosign.
  • Cluster admission controller (via Kyverno policy or OPA policy) verifies signature before allowing pod creation.
  • Unsigned images fail admission.

Breaks an entire class of supply chain attacks.
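The sign/verify pair with Cosign in keyed mode might look like this (a sketch; the registry, tag, and key file names are placeholders — keyless signing via Sigstore OIDC is the other common setup):

```shell
# In the build pipeline: sign the image just pushed
cosign sign --key cosign.key registry.example.com/app:1.4.2

# At verification time (the admission controller does this automatically)
cosign verify --key cosign.pub registry.example.com/app:1.4.2
```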

7.3 Private registry

Don't pull images from Docker Hub directly. Mirror images into your own registry (ECR, GCR, Artifact Registry, Harbor). Pull from there. Reasons:

  • Docker Hub rate limits can break production pulls.
  • Docker Hub hosts compromised images.
  • You have no visibility into what's being pulled if developers pick arbitrary images.
  • If a public image gets yanked or poisoned, your internal mirror is still safe.

Layer 8: Supply chain

8.1 SBOM

Generate a Software Bill of Materials for every image. Use Syft or equivalent. Store SBOMs alongside the image.

When the next Log4j hits, you need to answer "which of our production images contain the vulnerable library" in minutes, not days. SBOM makes this trivial.
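With Syft and Grype, the workflow is two commands (a sketch; the image name is a placeholder):

```shell
# At build time: generate an SPDX SBOM and store it alongside the artifact
syft registry.example.com/app:1.4.2 -o spdx-json > app-1.4.2.sbom.json

# During the next Log4j-style scramble: scan stored SBOMs, no image pulls needed
grype sbom:./app-1.4.2.sbom.json
```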

8.2 Admission-time policy

Signature verification + image source verification at admission time. Policy example:

  • Image must be from our private registry.
  • Image must be signed by our build pipeline's key.
  • Image must have a recent SBOM.
  • Image must not be `:latest`.

8.3 CI/CD pipeline hardening

Your pipeline builds production images. It's part of the blast radius.

  • Pipeline runs on ephemeral runners, not static VMs.
  • Pipeline credentials are scoped to the least permissions needed.
  • Pipeline logs are immutable.
  • Pipeline modifications require PR review + approvals.
  • Secrets used by the pipeline are short-lived tokens, not long-lived credentials.

Layer 9: Runtime detection

Prevention isn't enough. Detection covers what prevention misses.

9.1 eBPF-based runtime security

  • Falco. Open source. Rules-based detection of suspicious behavior in containers. Alerts on things like "shell spawned in a container that shouldn't have one" or "process read /etc/shadow."
  • Cilium Tetragon. eBPF-based. Deep kernel visibility.
  • Wiz Runtime / Lacework / Sysdig Secure. Commercial.

At minimum, deploy Falco with default rules. Alert on:

  • Shell in container
  • Unexpected process execution
  • File writes in system directories
  • Outbound connections to suspicious IPs (ThreatIntel integration)
  • Sensitive file access (`/etc/shadow`, SSH keys)
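Falco's stock rules cover most of that list; a custom rule in the same style looks like this (a sketch):

```yaml
- rule: Shell Spawned in Container
  desc: A shell process started inside a container that should not have one
  condition: >
    spawned_process and container
    and proc.name in (bash, sh, zsh, ash)
  output: >
    Shell spawned in container
    (user=%user.name container=%container.name command=%proc.cmdline)
  priority: WARNING
```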

9.2 Cloud-provider runtime

  • AWS GuardDuty EKS Protection. Detects pod compromise + runtime threats via eBPF.
  • GKE Security Posture + Container Threat Detection (Security Command Center).
  • Azure Defender for Containers.

Enable these. They're cheap relative to the coverage.

Layer 10: Upgrade cadence

Kubernetes ships a new minor version every 4 months. Each version is supported for 14 months.

Staying current is a security control. The CVEs fixed in newer versions are real. The older the version, the more unpatched CVEs in the API server, kubelet, and container runtime.

Cadence that works:

  • Track one minor version behind current. Test in staging, roll to production 1-2 months after release.
  • Security patches (within a minor version): apply within 2 weeks.
  • End of support: upgrade before support ends, not after.

Managed Kubernetes makes this way easier. On self-managed clusters, budget serious time for upgrade cycles.

The audit procedure we run

For a standard Kubernetes engagement, our procedure:

  1. Cluster inventory. Version, managed/self-managed, node count, namespace list, workload count.
  2. API server audit. Network exposure, authentication modes, authorization modes, audit logging state.
  3. RBAC dump. Every ClusterRoleBinding and RoleBinding analyzed. Dangerous verbs flagged. Service accounts with broad permissions flagged.
  4. Pod security. Every namespace's PSS profile checked. Privileged containers enumerated. HostPath mounts enumerated.
  5. Admission controllers. What's installed, what policies exist, what the gaps are.
  6. Network policies. Per-namespace analysis. Default deny in place?
  7. Secrets. External secrets integration check. Encryption at rest check.
  8. Image security. Registry policy, signing, scanning, SBOM.
  9. Runtime detection. Falco / equivalent deployed. Alerts flowing to SIEM.
  10. Upgrade posture. Current vs latest. CVE exposure.

The full audit takes 1-3 weeks depending on cluster complexity. Output is a prioritized remediation plan.

The 10 fastest wins

If you're looking at this whole guide and feeling overwhelmed, the 10 highest-leverage controls to implement first:

  1. Pod Security Standards set to `restricted` on every workload namespace.
  2. Default-deny NetworkPolicy in every namespace.
  3. Workload identity replacing static cloud credentials.
  4. External secrets manager replacing native Secrets.
  5. Falco or equivalent runtime detection deployed.
  6. Image signing + verification at admission.
  7. Private endpoint on API server, no public access.
  8. Audit logs to SIEM with alerting on dangerous verbs.
  9. Regular RBAC review + removal of unused permissions.
  10. Upgrade to a supported Kubernetes version.

Any one of these is worth a week of work. All ten together is a transformation.

Closing

Kubernetes security is layered. Every layer catches the attacks the other layers miss. Miss one layer and an attacker who gets past the prior ones finds nothing stopping them.

The hardening guide above is long because Kubernetes itself is big. Getting this right is the difference between a cluster that contains an incident and a cluster that becomes the incident.

We run Kubernetes security engagements that walk through each layer, identify the gaps, and produce a prioritized remediation plan. If your cluster is in production and you're not sure how it scores on the 10 fastest wins list, an engagement pays for itself.

Valtik Studios, valtikstudios.com.

Tags: kubernetes, kubernetes security, k8s hardening, rbac, pod security, network policies, admission controllers, runtime security, cloud native security, complete guide

Want us to check your Kubernetes setup?

Our scanner detects these exact misconfigurations, plus dozens more across 38 platforms. Free website check available, no commitment required.
