Kubernetes: The Complete Knowledge Map

Core Concepts

The fundamental building blocks of Kubernetes — containers, clusters, pods, and how they connect.

Container

Simple Definition: A container is a lightweight, standalone package that includes your code and everything it needs to run — runtime, libraries, settings. It runs the same way everywhere.

Deep Dive:

Containers use two Linux kernel features to work:

Namespaces — Give each container its own isolated view of the system (process tree, network, filesystem, users). A container can't see other containers or the host processes.
Cgroups (Control Groups) — Limit how much CPU, memory, disk I/O, and network a container can use. Prevents one container from starving others.

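Unlike virtual machines, containers share the host OS kernel. As a result, containers:

Start in well under a second in most cases (VMs typically take tens of seconds to minutes)
Use megabytes of memory overhead (a VM carries a full guest OS, often gigabytes)
Run at near-native performance (no hypervisor or guest kernel in the path)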

A container image is built in layers. Dockerfile instructions that change the filesystem (RUN, COPY, ADD) each create a layer; other instructions such as ENV or CMD only add metadata. Layers are cached and shared: if 10 services use the same base image, that layer is stored only once.

Interview Answer: "A container is an isolated process that packages an application with all its dependencies using Linux namespaces for isolation and cgroups for resource limits. Unlike VMs, containers share the host kernel, making them lightweight, portable, and fast to start. They're the fundamental deployment unit in cloud-native architectures."

Real-World Usage: Netflix runs thousands of containers per service. Each microservice is packaged as a container image, stored in a registry, and deployed to Kubernetes. When a new version is ready, a new image is built, and containers are replaced — never patched in place. This immutability eliminates configuration drift.

Docker

Simple Definition: Docker is the most popular tool for building and running containers. You write a Dockerfile, build an image from it, and run containers from that image.

Deep Dive:

Docker consists of several components:

Docker CLI — Command-line interface (docker build, docker run, docker push)
Docker Daemon (dockerd) — Background service that manages images and containers
containerd — The actual container runtime that Docker uses internally
Dockerfile — A text file with step-by-step instructions to build an image

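# Example Dockerfile
# (Comments must sit on their own line; a trailing "# ..." is parsed as part of the instruction)
# Base image
FROM node:20-alpine
# Set working directory
WORKDIR /app
# Copy dependency manifests first so the install layer stays cached
COPY package*.json ./
# Install production dependencies only (older npm versions used --production)
RUN npm ci --omit=dev
# Copy application code
COPY . .
# Document the port (this does not publish it)
EXPOSE 3000
# Start command
CMD ["node", "server.js"]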

Important distinction: Kubernetes removed dockershim, its built-in Docker Engine integration, in v1.24. This does NOT mean Docker is dead. Docker is still a standard tool for building images; the kubelet simply talks to a CRI runtime such as containerd directly, cutting out the Docker daemon as a middleman. Docker-built images are OCI images and run unchanged on K8s.

Interview Answer: "Docker popularized containers by making them easy to build and use. The core workflow is: write a Dockerfile, build an image, push to a registry, run as a container. While Kubernetes no longer uses Docker as its runtime (it uses containerd directly), Docker remains the primary tool for building container images. The images are OCI-compliant and work everywhere."

Real-World Usage: Every CI pipeline at companies like Shopify, GitHub, and Uber uses Docker to build container images. Developers write Dockerfiles, CI builds and scans the image, pushes it to ECR/GCR, and Kubernetes runs it using containerd.

Kubernetes (K8s)

Simple Definition: Kubernetes is an open-source platform that automates deploying, scaling, and managing containerized applications across a cluster of machines.

Deep Dive:

Kubernetes was designed at Google, drawing on roughly 15 years of running production workloads on its internal system, Borg. It was open-sourced in 2014 and is now maintained under the CNCF.

What Kubernetes actually does:

Scheduling — Decides which machine runs which container based on resource needs, constraints, and policies
Self-healing — Restarts crashed containers, replaces unresponsive Pods, kills containers failing health checks
Scaling — Horizontally (more replicas) or vertically (more CPU/memory), automatically or manually
Service discovery & load balancing — Gives Pods DNS names, distributes traffic
Rolling updates & rollbacks — Deploy new versions with zero downtime, roll back if something breaks
Secret & config management — Inject configuration and credentials without baking them into images
Storage orchestration — Automatically attach cloud disks, NFS, or other storage to Pods

The Declarative Model — the most important concept:

You tell K8s what you want (desired state in YAML), not how to do it. K8s continuously reconciles actual state with desired state. If you say "I want 5 replicas" and one crashes, K8s creates a new one automatically. This reconciliation loop runs forever.

How the K8s Reconciliation Loop Works

You write YAML → kubectl apply → API Server → Stored in etcd
↓
Controller watches the API Server (not etcd directly) → Compares desired vs actual → Takes action (create/delete Pods)
↓
The loop runs continuously, which is what makes the cluster self-healing

Interview Answer: "Kubernetes is a container orchestration platform that manages containerized workloads across a cluster of machines. Its core principle is declarative: you define desired state in YAML, and controllers continuously reconcile actual state to match. It handles scheduling, self-healing, scaling, service discovery, rolling updates, and storage orchestration. It was created by Google based on their internal Borg system."

Real-World Usage: Spotify runs 100+ K8s clusters serving 400M+ users. Airbnb migrated from EC2 instances to K8s to achieve consistent deployments across 1000+ services. Pinterest uses K8s to handle 1B+ daily API requests with auto-scaling. Most large tech companies now run at least part of their infrastructure on Kubernetes.

Cluster

Simple Definition: A cluster is the entire Kubernetes deployment — all the machines (nodes) working together, managed by a control plane.

Deep Dive:

A cluster has two types of machines:

Control Plane Nodes (Masters):

Run the "brain" of Kubernetes — API Server, etcd, Scheduler, Controller Manager
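Typically 3 or 5 nodes for high availability (an odd count is recommended because etcd needs a majority quorum: 3 members tolerate 1 failure, 5 tolerate 2, and a 4th member adds no extra tolerance)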
Should NOT run application workloads in production
In managed K8s (EKS, GKE, AKS), the cloud provider manages these entirely

Worker Nodes:

Run your actual application Pods
Each runs: kubelet (agent), kube-proxy (networking), container runtime (containerd)
Can be physical servers, VMs, or cloud instances
Can be added/removed dynamically (auto-scaling)
Can have different sizes (mix of large and small instances)

Interview Answer: "A Kubernetes cluster consists of control plane nodes (running API Server, etcd, Scheduler, Controller Manager) and worker nodes (running kubelet, kube-proxy, and containerd). The control plane makes decisions about the cluster, while workers run the actual workloads. For HA, you run 3 or 5 control plane nodes. Managed services like EKS/GKE handle the control plane for you."

Real-World Usage: A typical startup runs 1-3 clusters (dev, staging, prod) on EKS/GKE with 5-50 worker nodes each. Large enterprises like PayPal run hundreds of clusters across multiple regions. The trend is multi-cluster architectures where each cluster has a specific purpose or serves a specific region.

Node

Simple Definition: A node is a single machine (physical or virtual) in a Kubernetes cluster that runs containerized workloads.

Deep Dive:

Every worker node runs three essential components:

kubelet — The agent that communicates with the control plane. It receives Pod specifications and ensures the described containers are running and healthy.
kube-proxy — Maintains network rules for Service routing. Implements iptables or IPVS rules.
Container runtime — The software that actually runs containers (containerd or CRI-O).

Node lifecycle:

Registration — Node joins the cluster and registers with the API server
Heartbeat — kubelet sends heartbeats via Lease objects in kube-node-lease namespace
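NotReady — If heartbeats stop, the node is marked NotReady after the controller manager's node-monitor-grace-period (roughly 40-50s by default, depending on the Kubernetes version)
Eviction — After the node has been NotReady for 5 minutes (the default 300s toleration for the not-ready and unreachable taints), its Pods are evicted and their controllers create replacements on other nodes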
Drain — kubectl drain gracefully evicts all Pods before maintenance
Cordon — kubectl cordon marks a node as unschedulable without evicting existing Pods
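
The 5-minute eviction delay in the lifecycle above comes from tolerations that Kubernetes adds to Pods by default. As a rough sketch, a Pod that should fail over faster could override them. The Pod name, image, and the 30-second value are illustrative assumptions, not recommendations.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: fast-failover-demo        # hypothetical name
spec:
  containers:
  - name: app
    image: nginx:1.27             # placeholder image
  tolerations:
  # The defaults injected by Kubernetes use tolerationSeconds: 300
  - key: node.kubernetes.io/not-ready
    operator: Exists
    effect: NoExecute
    tolerationSeconds: 30         # assumed value for illustration
  - key: node.kubernetes.io/unreachable
    operator: Exists
    effect: NoExecute
    tolerationSeconds: 30
```

Shorter values move Pods off a failed node sooner, at the cost of more churn during brief network blips.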

Interview Answer: "A node is a machine in a K8s cluster running kubelet, kube-proxy, and a container runtime. Kubelet ensures Pods are running as specified, kube-proxy handles networking, and the runtime (containerd) runs containers. Nodes send heartbeats to the control plane. If a node goes down, its Pods are automatically rescheduled. You can drain a node for maintenance or cordon it to prevent new scheduling."

Pod

Simple Definition: A Pod is the smallest deployable unit in Kubernetes. It wraps one or more containers that share the same network (IP address) and storage.

Deep Dive:

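Pod Lifecycle Flow

Startup: Pending → Init Containers → Running
Shutdown (on termination signal): preStop hook → SIGTERM → remaining grace period (30s total by default) → SIGKILL
In parallel with shutdown, the Pod is removed from Service endpoints. Because this is not strictly ordered before SIGTERM, a short preStop sleep is a common way to let in-flight traffic drain first.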

Key properties:

Shared network — All containers in a Pod share one IP address and port space. They talk to each other via localhost
Shared storage — Containers can mount the same volumes to share files
Co-scheduled — All containers in a Pod always run on the same node
Ephemeral — Pods are disposable. Created, destroyed, and replaced, never "repaired"

Multi-container patterns:

Sidecar — Helper container (Envoy proxy, log shipper, monitoring agent)
Init Container — Runs before the main container to set up prerequisites
Ambassador — Proxy for outbound connections
Adapter — Transforms output (log format conversion)
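
A minimal sketch of the init container and sidecar patterns in one Pod, assuming a Service named db exists in the same namespace. All names and images here are placeholders chosen for illustration.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web-with-sidecar          # hypothetical name
spec:
  volumes:
  - name: logs
    emptyDir: {}                  # shared scratch volume, lives as long as the Pod
  initContainers:
  - name: wait-for-db
    image: busybox:1.36
    # Blocks startup until the (assumed) Service name "db" resolves
    command: ["sh", "-c", "until nslookup db; do sleep 2; done"]
  containers:
  - name: app
    image: nginx:1.27
    volumeMounts:
    - { name: logs, mountPath: /var/log/nginx }
  - name: log-shipper             # sidecar reading what the app writes
    image: busybox:1.36
    command: ["sh", "-c", "tail -F /logs/access.log"]
    volumeMounts:
    - { name: logs, mountPath: /logs }
```

Both containers see the same files through the shared volume, and they could also reach each other over localhost because they share one network namespace.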

Interview Answer: "A Pod is the smallest deployable unit in K8s, wrapping one or more tightly coupled containers that share network and storage. Pods are ephemeral — designed to be replaced, not repaired. The most common pattern is one container per Pod. Multi-container Pods are used for sidecars (proxy, logging) and init containers (setup tasks). You manage Pods through higher-level resources like Deployments."

Real-World Usage: At Lyft, each microservice Pod has a main application container plus an Envoy sidecar for service mesh traffic. Init containers wait for database migrations to complete before the app starts. When Istio is enabled, a sidecar proxy is automatically injected into every Pod to handle mTLS and traffic management.

Namespace

Simple Definition: A namespace is a virtual partition within a cluster that isolates resources. It's like a folder that separates different teams, projects, or environments.

Deep Dive:

Default namespaces: default, kube-system, kube-public, kube-node-lease.

What namespaces scope: Pods, Services, Deployments, ConfigMaps, Secrets, Roles, ServiceAccounts, PVCs.

What namespaces DON'T scope (cluster-wide): Nodes, PersistentVolumes, ClusterRoles, StorageClasses, Namespaces themselves.

Common strategies: per-team (team-payments), per-app (app-checkout), per-environment (staging), or hybrid.

Governance tools: ResourceQuota (cap resources), LimitRange (set defaults), NetworkPolicy (firewall), RBAC (access control).
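
As an illustration of the governance tools above, a per-team namespace might be provisioned roughly like this. The namespace name follows the team-payments convention mentioned earlier; the quota and default values are assumptions.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: team-payments
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-quota
  namespace: team-payments
spec:
  hard:                           # illustrative caps for the whole namespace
    requests.cpu: "20"
    requests.memory: 40Gi
    limits.memory: 80Gi
    pods: "100"
---
apiVersion: v1
kind: LimitRange
metadata:
  name: container-defaults
  namespace: team-payments
spec:
  limits:
  - type: Container
    defaultRequest: { cpu: 100m, memory: 128Mi }   # used when a container sets no request
    default: { memory: 512Mi }                     # used when a container sets no limit
```

The LimitRange defaults matter here: once a quota covers a resource, Pods that do not specify it are rejected unless a default fills it in.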

Interview Answer: "Namespaces partition a cluster for resource isolation. They scope most resources (Pods, Services) but not cluster-wide ones (Nodes, PVs). You apply RBAC, ResourceQuotas, NetworkPolicies per namespace. Common strategies: per-team or per-application."

Real-World Usage: At Stripe, each team gets their own namespace with pre-configured RBAC, ResourceQuotas, and NetworkPolicies. A namespace provisioning controller automates this when a new team onboards.

Labels, Selectors & Annotations

Simple Definition: Labels are key-value tags for identifying and grouping resources. Selectors query by labels. Annotations store non-identifying metadata.

Deep Dive:

Labels are the glue of K8s. A Service finds Pods, a Deployment manages ReplicaSets, NetworkPolicies target Pods — all through label selectors.

metadata:
  labels:
    app: payment-service
    team: payments
    env: production

Selector types: Equality-based (app = my-api) and Set-based (env in (production, staging)).

Annotations store larger metadata: build timestamps, Git SHAs, monitoring config (prometheus.io/scrape: "true").

Interview Answer: "Labels are key-value pairs for identification and selection. Services find Pods, Deployments manage ReplicaSets through selectors. Annotations store non-selecting metadata. A consistent labeling convention is essential for cost tracking, monitoring, and governance."

Real-World Usage: Kubecost uses team: payments labels to attribute costs. Prometheus discovers scrape targets via annotations. Kyverno enforces that every Deployment must have team and app labels.

kubectl

Simple Definition: kubectl is the command-line tool for interacting with Kubernetes clusters. It sends requests to the API server.

# Viewing
kubectl get pods -A                       # All namespaces
kubectl describe pod my-pod               # Detailed info
# Debugging
kubectl logs my-pod -f                    # Stream logs
kubectl exec -it my-pod -- /bin/sh        # Shell in
kubectl debug -it my-pod --image=busybox  # Ephemeral debug
# Managing
kubectl apply -f manifest.yaml            # Create/update
kubectl rollout undo deploy/my-app        # Rollback
kubectl scale deploy my-app --replicas=5  # Scale
# Networking
kubectl port-forward svc/my-app 8080:80   # Local tunnel
# Context
kubectl config use-context prod           # Switch cluster

Interview Answer: "kubectl is the K8s CLI. Key commands: get/describe for viewing, logs/exec for debugging, apply/delete for managing, rollout for deployments, port-forward for networking. In production, direct kubectl should be limited — GitOps tools handle deployments."

Workloads

The resources that run your applications — Deployments, StatefulSets, DaemonSets, Jobs, and CronJobs.

Deployment & ReplicaSet

Simple Definition: A Deployment manages identical Pod replicas with rolling updates, rollbacks, and scaling. ReplicaSet is the mechanism underneath.

Deployment → ReplicaSet → Pods Chain

Before update: Deployment → ReplicaSet (v2) → Pods
On update: Deployment → ReplicaSet (v3, scaling up) → New Pods

The old ReplicaSet scales down while the new one scales up; that is a rolling update.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment-api
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  selector:
    matchLabels:
      app: payment-api
  template:
    metadata:
      labels:
        app: payment-api
    spec:
      containers:
      - name: api
        image: payment-api:2.1.0
        resources:
          requests: { cpu: "250m", memory: "256Mi" }
          limits: { cpu: "1", memory: "1Gi" }

Rolling update: New ReplicaSet created → new Pods start → pass readiness probes → old Pods terminated. With replicas: 3, maxSurge: 1 means at most 4 Pods exist at once, and maxUnavailable: 0 means at least 3 are always ready.

Rollback: kubectl rollout undo deployment/payment-api (K8s keeps 10 revisions by default).

Interview Answer: "A Deployment manages ReplicaSets which manage Pods. Rolling updates are controlled by maxSurge/maxUnavailable. New Pods must pass readiness probes before old ones terminate. Keeps revision history for rollbacks. Apps must be stateless — any replica handles any request."

Real-World Usage: Every stateless microservice runs as a Deployment. Typical config: 3-10 replicas, maxSurge=25% maxUnavailable=25%, VPA-recommended requests, readiness probes on /health.

StatefulSet

Simple Definition: Like a Deployment but for apps needing stable identity and persistent storage — databases, Kafka, ZooKeeper.

What makes it special:

Stable Pod names — mysql-0, mysql-1, mysql-2
Stable DNS — mysql-0.mysql-headless.default.svc.cluster.local
Persistent storage — Each Pod gets its own PVC that survives restarts
Ordered operations — Created 0→1→2, deleted 2→1→0

Requires a headless Service (clusterIP: None) and volumeClaimTemplates for per-Pod storage.
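
A sketch of how the headless Service and volumeClaimTemplates fit together, reusing the mysql naming from the list above. The image tag, storage size, and the mysql-root Secret are assumptions, and this does not configure MySQL replication.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: mysql-headless
spec:
  clusterIP: None                 # headless: DNS returns per-Pod records
  selector: { app: mysql }
  ports:
  - { name: mysql, port: 3306 }
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
spec:
  serviceName: mysql-headless     # ties Pod DNS names to the headless Service
  replicas: 3
  selector:
    matchLabels: { app: mysql }
  template:
    metadata:
      labels: { app: mysql }
    spec:
      containers:
      - name: mysql
        image: mysql:8.4          # assumed tag
        env:
        - name: MYSQL_ROOT_PASSWORD
          valueFrom:
            secretKeyRef: { name: mysql-root, key: password }   # assumed Secret
        volumeMounts:
        - { name: data, mountPath: /var/lib/mysql }
  volumeClaimTemplates:           # one PVC per Pod: data-mysql-0, data-mysql-1, ...
  - metadata:
      name: data
    spec:
      accessModes: [ReadWriteOnce]
      resources:
        requests: { storage: 10Gi }
```

Deleting the StatefulSet leaves the PVCs in place by default, which is what lets mysql-0 come back with the same data.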

Interview Answer: "StatefulSets provide stable names (ordinal indices), stable DNS (via headless Service), persistent storage (PVC per Pod), and ordered deployment. Used for databases, Kafka, ZooKeeper. Each Pod gets its own PVC through volumeClaimTemplates."

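Real-World Usage: Kafka, ZooKeeper, and many databases are commonly run on StatefulSets, where each broker (kafka-0, kafka-1, kafka-2) keeps a stable identity and its own persistent volume. Not every operator uses them, though: CloudNativePG manages PostgreSQL Pods and PVCs directly, and recent Strimzi releases use their own StrimziPodSet resource instead of StatefulSets.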

DaemonSet

Simple Definition: Ensures a Pod runs on every node (or a selected subset). When new nodes join, Pods are auto-added.

What runs as DaemonSets: Log collection (Fluent Bit), monitoring (Node Exporter, Datadog), network plugins (Calico, Cilium), storage drivers (CSI), security agents (Falco).

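Use nodeSelector or node affinity to restrict a DaemonSet to specific node types (e.g., GPU nodes only), and tolerations to let its Pods run on tainted nodes such as control plane or dedicated nodes.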
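
For illustration, a log-collection DaemonSet could look roughly like the following. The name, image tag, and mounted path are assumptions rather than a recommended Fluent Bit setup.

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: log-agent                 # hypothetical name
  namespace: kube-system
spec:
  selector:
    matchLabels: { app: log-agent }
  template:
    metadata:
      labels: { app: log-agent }
    spec:
      nodeSelector:
        kubernetes.io/os: linux   # restricts which nodes get a Pod
      tolerations:
      # Allows scheduling onto control plane nodes despite their taint
      - key: node-role.kubernetes.io/control-plane
        operator: Exists
        effect: NoSchedule
      containers:
      - name: fluent-bit
        image: fluent/fluent-bit:3.0      # assumed tag
        volumeMounts:
        - { name: varlog, mountPath: /var/log, readOnly: true }
      volumes:
      - name: varlog
        hostPath: { path: /var/log }
```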

Interview Answer: "DaemonSets ensure one Pod per node for infrastructure plumbing: log collectors, monitoring agents, CNI plugins, storage drivers. Use nodeSelector and tolerations for targeting."

Job & CronJob

Simple Definition: Jobs run Pods to completion (one-time tasks). CronJobs run Jobs on a schedule.

Key settings: backoffLimit (retries), activeDeadlineSeconds (timeout), concurrencyPolicy (Allow/Forbid/Replace for CronJobs), restartPolicy (Never or OnFailure).

apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-report
spec:
  schedule: "0 2 * * *"           # 2 AM daily
  concurrencyPolicy: Forbid       # Skip if previous still running
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: report
            image: report-gen:1.0

Interview Answer: "Jobs run to completion (migrations, batch). Key settings: backoffLimit, activeDeadlineSeconds, restartPolicy=Never|OnFailure. CronJobs are scheduled Jobs with concurrencyPolicy controlling overlap. For complex DAGs, use Argo Workflows."

ReplicaSet

Simple Definition: Maintains a specified number of identical Pod replicas. Managed by Deployments — you almost never create one directly.

Deployments create new ReplicaSets on updates and scale the old one down. Old ReplicaSets are kept (default 10) for rollback capability.

Interview Answer: "ReplicaSet ensures the right number of Pods. It's the backend for Deployments and is rarely created directly. On update, a new RS scales up while the old one scales down. Old ReplicaSets are retained for rollbacks."

Networking

How Pods communicate — Services, Ingress, Gateway API, DNS, CNI, Network Policies, and Service Mesh.

Service

Simple Definition: A Service provides a stable network endpoint (DNS name + IP) for ephemeral Pods. Since Pods get new IPs when recreated, a Service gives a permanent address.

Service Routing Flow

Client request → Service IP (ClusterIP) → kube-proxy rules (iptables/IPVS)
↓ routes to ready Pods
Pod A (10.0.1.5) | Pod B (10.0.2.3) | Pod C (not ready, skipped)

Types: ClusterIP (internal, default), NodePort (expose on node ports 30000-32767), LoadBalancer (cloud LB, billed per load balancer), ExternalName (CNAME alias). Headless is not a separate type but a ClusterIP Service with clusterIP: None; DNS returns the Pod IPs directly, which StatefulSets rely on.
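
A minimal ClusterIP Service, sketched against the payment-api labels used in the Deployment example. The targetPort of 8080 is an assumption about the container.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: payment-api
spec:
  type: ClusterIP                 # the default; shown for clarity
  selector:
    app: payment-api              # matches the Pod labels, not the Deployment name
  ports:
  - name: http
    port: 80                      # port clients use: http://payment-api
    targetPort: 8080              # assumed container port
```

Only Pods that carry the selector labels and pass their readiness probes receive traffic.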

Interview Answer: "Services give stable DNS/IP to ephemeral Pods. ClusterIP=internal, NodePort=node exposure, LoadBalancer=cloud LB, Headless=direct Pod IPs. Routing via kube-proxy iptables/IPVS. Use Ingress to consolidate external traffic through one LB."

Real-World Usage: Every microservice has a ClusterIP Service (http://payment-api/charge). One or two LoadBalancer Services for the Ingress Controller. ExternalName Services wrap external databases so app code uses K8s DNS everywhere.

Ingress & Ingress Controller

Simple Definition: Ingress routes external HTTP/HTTPS traffic to internal Services based on hostname and URL path. An Ingress Controller (NGINX, Traefik) implements these rules.

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: app-ingress
spec:
  ingressClassName: nginx
  tls:
  - hosts: [api.myapp.com]
    secretName: tls-secret
  rules:
  - host: api.myapp.com
    http:
      paths:
      - path: /v1
        pathType: Prefix
        backend:
          service: { name: api-v1, port: { number: 80 } }
      - path: /
        pathType: Prefix
        backend:
          service: { name: frontend, port: { number: 80 } }

Controllers: NGINX (most popular), Traefik, HAProxy, AWS ALB, Istio Gateway.

Interview Answer: "Ingress manages external HTTP/HTTPS with host/path routing, TLS termination. Requires an Ingress Controller (NGINX, Traefik). One Ingress Controller behind a single LoadBalancer replaces many individual LB Services, saving cost. Gateway API is the newer replacement."

Gateway API

Simple Definition: Next-gen Ingress replacement. Role-based separation: GatewayClass (infra), Gateway (ops), HTTPRoute (devs). Supports TCP, gRPC, traffic splitting natively.

apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: payment-route
spec:
  parentRefs:
  - name: production-gateway
  hostnames: ["api.myapp.com"]
  rules:
  - matches:
    - path: { type: PathPrefix, value: /payments }
    backendRefs:
    - name: payment-v2
      port: 80
      weight: 90          # 90% to v2
    - name: payment-v3
      port: 80
      weight: 10          # 10% canary

Interview Answer: "Gateway API succeeds Ingress with role-based resources: GatewayClass (infra), Gateway (ops), HTTPRoute (devs). It supports HTTP and gRPC routing, traffic splitting, and header-based routing natively, with TCP/UDP route types available but less mature. Portable across implementations (Istio, Cilium, NGINX). New projects should generally prefer Gateway API over Ingress."

DNS (CoreDNS) & kube-proxy

Simple Definition: CoreDNS is the cluster's DNS server (Services reachable by name). kube-proxy handles the actual network routing from Service IPs to Pod IPs.

DNS format: my-service.my-namespace.svc.cluster.local (or just my-service within same namespace).

kube-proxy modes: iptables (O(n), default), IPVS (O(1), better at scale), eBPF/Cilium (highest performance, replaces kube-proxy).

East-West = service-to-service within cluster. North-South = external traffic.

Interview Answer: "CoreDNS provides service discovery by name. kube-proxy implements routing via iptables (O(n)), IPVS (O(1)), or eBPF (Cilium, fastest). East-west is internal, north-south is external."

CNI (Container Network Interface)

Simple Definition: CNI is the plugin standard for Pod networking. K8s doesn't do networking itself — it delegates to a CNI plugin.

Major plugins:

Cilium — eBPF-based, highest performance, built-in observability (Hubble), service mesh. The rising star. CNCF graduated
Calico — Most widely deployed. BGP routing, excellent Network Policy support
Flannel — Simple overlay with no Network Policy enforcement of its own. Best suited to dev and small clusters
AWS VPC CNI — Real VPC IPs on EKS. Limited by ENI capacity per node

Interview Answer: "CNI is the plugin standard. Key choices: Cilium (eBPF, highest performance, CNCF graduated), Calico (BGP, mature), AWS VPC CNI (real VPC IPs). Choice affects performance, security features, and cloud integration."

Network Policy

Simple Definition: Firewall rules for Pods. Controls which Pods can talk to which. Default is allow-all; once a policy selects a Pod, traffic in that policy's direction (ingress and/or egress) becomes default-deny for that Pod, except for what the policy allows.

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: api-allow-frontend
spec:
  podSelector:
    matchLabels: { app: api }
  policyTypes: [Ingress]
  ingress:
  - from:
    - podSelector:
        matchLabels: { app: frontend }
    ports:
    - port: 8080

Important: Requires a CNI that supports it. Calico, Cilium = yes. Flannel = no.

Interview Answer: "Network Policies are Pod-level firewall rules using label selectors. Default is allow-all; applying a policy makes that direction default-deny for the selected Pods. Best practice: default-deny per namespace, then explicit allows. Requires a CNI that enforces policies, such as Calico or Cilium. Essential for micro-segmentation and compliance (PCI-DSS)."

Service Mesh (Istio, Linkerd)

Simple Definition: Infrastructure layer handling service-to-service communication transparently: mTLS encryption, traffic management, observability, and authorization — without code changes.

Capabilities: mTLS (zero-trust), retries/timeouts/circuit breaking, traffic splitting (canary), request-level metrics and traces, fine-grained AuthorizationPolicies.

Options: Istio (most features, CNCF graduated, Ambient mode = sidecar-less), Linkerd (simpler, Rust proxy), Cilium (eBPF-based, no sidecars).

When to adopt: 10+ microservices AND you need mandatory mTLS, advanced traffic management, or request-level observability. Don't adopt for 2-3 services.

Interview Answer: "Service mesh handles service-to-service communication via sidecar proxies (Envoy). Provides mTLS, traffic management, observability, authorization. Istio=most features, Linkerd=simpler, Cilium=eBPF no sidecars. Adopt when microservices complexity justifies the overhead."

Configuration & Storage

ConfigMaps, Secrets, Volumes, PV/PVC, StorageClasses, and CSI.

ConfigMap

Simple Definition: Stores non-sensitive configuration as key-value pairs, separate from images. Consumed as env vars or mounted as files.

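Volume-mounted ConfigMaps are refreshed in place by the kubelet (typically within about a minute), but the application must re-read the files, and subPath mounts are not updated. Env vars do NOT update; the Pod must be restarted. Setting immutable: true reduces API server load at scale. The size limit is 1 MiB.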
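
The two consumption styles can be sketched side by side. The ConfigMap keys, Pod name, and mount path are hypothetical.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config                # hypothetical name
data:
  LOG_LEVEL: info
  app.properties: |
    feature.checkout=true
---
apiVersion: v1
kind: Pod
metadata:
  name: config-demo
spec:
  containers:
  - name: app
    image: busybox:1.36
    command: ["sh", "-c", "echo $LOG_LEVEL; cat /etc/app/app.properties; sleep 3600"]
    env:
    - name: LOG_LEVEL             # fixed at container start
      valueFrom:
        configMapKeyRef: { name: app-config, key: LOG_LEVEL }
    volumeMounts:
    - { name: config, mountPath: /etc/app }   # each key becomes a file here
  volumes:
  - name: config
    configMap: { name: app-config }
```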

Interview Answer: "ConfigMaps decouple config from images. Consumed as env vars or files. Volume mounts support hot-reload; env vars need restart. Set immutable:true for performance. Use Secrets for sensitive data."

Secret

Simple Definition: Stores sensitive data (passwords, API keys, TLS certs). Base64-encoded (NOT encrypted by default).

Types: Opaque, TLS, dockerconfigjson, service-account-token, basic-auth.

Security: Base64 ≠ encryption! For production: enable etcd encryption at rest, use External Secrets Operator (Vault, AWS Secrets Manager), restrict RBAC, use Sealed Secrets for GitOps.
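
For reference, a basic Opaque Secret and one way to consume it might look like this. The names and values are placeholders; in practice the Secret would usually be created by a tool such as External Secrets Operator rather than committed as plain YAML.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: db-credentials            # hypothetical name
type: Opaque
stringData:                       # plain text here; stored base64-encoded under data
  username: app_user
  password: change-me             # placeholder, never commit real values
---
apiVersion: v1
kind: Pod
metadata:
  name: secret-demo
spec:
  containers:
  - name: app
    image: busybox:1.36
    command: ["sleep", "3600"]
    env:
    - name: DB_PASSWORD
      valueFrom:
        secretKeyRef: { name: db-credentials, key: password }
```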

Interview Answer: "Secrets store sensitive data, base64-encoded (not encrypted). Types: Opaque, TLS, docker-registry. For production: encrypt etcd at rest, use External Secrets Operator to sync from Vault/AWS, restrict RBAC, use Sealed Secrets for GitOps."

PV, PVC & StorageClass

Simple Definition: PV = provisioned storage (disk). PVC = request for storage. StorageClass = category/provisioner enabling dynamic provisioning via CSI drivers.

Storage Provisioning Flow

Pod needs storage → PVC created → StorageClass → CSI Driver
↓
Cloud disk provisioned (EBS) → PV bound to PVC → Mounted in Pod

Access Modes: RWO (single node), ROX (read-only many), RWX (read-write many, needs EFS/NFS), RWOP (single Pod).

Reclaim: Delete (default for dynamically provisioned volumes) or Retain (keep data; the default for manually created PVs). VolumeSnapshots for backups.
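
A sketch of dynamic provisioning, assuming the AWS EBS CSI driver is installed. The class name, volume type, and size are illustrative.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd                  # hypothetical name
provisioner: ebs.csi.aws.com      # assumes the AWS EBS CSI driver
parameters:
  type: gp3
reclaimPolicy: Retain             # keep the disk after the PVC is deleted
volumeBindingMode: WaitForFirstConsumer   # provision in the zone where the Pod lands
allowVolumeExpansion: true
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data
spec:
  storageClassName: fast-ssd
  accessModes: [ReadWriteOnce]
  resources:
    requests:
      storage: 20Gi
```

With WaitForFirstConsumer the PVC stays Pending until a Pod that uses it is scheduled, which is expected behaviour rather than an error.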

Interview Answer: "PV=storage, PVC=request, StorageClass=provisioner. Dynamic provisioning: PVC triggers CSI driver to create disk. Access modes: RWO (most common), RWX (shared, needs EFS). WaitForFirstConsumer binding ensures correct AZ. Volume Snapshots for backups."

emptyDir, hostPath & Ephemeral Volumes

Simple Definition: emptyDir = temp storage tied to Pod lifetime (shared between containers). hostPath = mounts host filesystem (security risk, DaemonSets only).

Interview Answer: "emptyDir is Pod-lifetime temp storage for scratch space and inter-container sharing; its contents are deleted with the Pod. hostPath mounts the host filesystem, so data stays on that node but does not follow the Pod elsewhere. It is a security risk and best reserved for DaemonSets."

Scheduling & Scaling

How K8s places Pods, and how it auto-scales workloads and infrastructure.

Resource Requests & Limits

Simple Definition: Requests = guaranteed minimum (scheduler uses for placement). Limits = max allowed (CPU throttled, memory OOMKilled).

QoS Classes: Guaranteed (every container has CPU and memory requests equal to limits; last to be evicted), Burstable (at least one request or limit set, but not Guaranteed), BestEffort (none set, first evicted).

Units: CPU: 1=1 core, 100m=0.1 core. Memory: 128Mi, 1Gi.

Tip: Many teams remove CPU limits to avoid CFS throttling. Always set memory limits to prevent node-level OOM.

Interview Answer: "Requests=scheduler input (guaranteed min). Limits=max: CPU throttles, memory OOMKills. QoS: Guaranteed (requests==limits), Burstable, BestEffort. Many teams omit CPU limits to avoid throttling but always set memory limits. VPA helps right-size."

Node Affinity, Pod Affinity & Anti-Affinity

Simple Definition: Node Affinity = which nodes a Pod can run on. Pod Anti-Affinity = spread replicas apart. Pod Affinity = co-locate for performance. Topology Spread = distribute evenly across zones.

Hard vs Soft: requiredDuring... (must) vs preferredDuring... (try).
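
The hard and soft rules can be combined in one Pod template. The following is a sketch with assumed labels and a placeholder image: a hard node affinity, a soft anti-affinity across nodes, and a zone spread constraint.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web                       # hypothetical name
spec:
  replicas: 3
  selector:
    matchLabels: { app: web }
  template:
    metadata:
      labels: { app: web }
    spec:
      affinity:
        nodeAffinity:             # hard rule: only amd64 nodes
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - { key: kubernetes.io/arch, operator: In, values: [amd64] }
        podAntiAffinity:          # soft rule: prefer one replica per node
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              topologyKey: kubernetes.io/hostname
              labelSelector:
                matchLabels: { app: web }
      topologySpreadConstraints:  # keep zone counts within 1 of each other
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: ScheduleAnyway
        labelSelector:
          matchLabels: { app: web }
      containers:
      - name: web
        image: nginx:1.27
```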

Interview Answer: "Node Affinity uses node labels for placement (required=hard, preferred=soft). Pod Anti-Affinity spreads replicas across nodes/zones for HA. Pod Affinity co-locates for low latency. Topology Spread Constraints distribute evenly across zones with maxSkew."

Taints & Tolerations

Simple Definition: Taints repel Pods from nodes. Tolerations let specific Pods ignore taints. Used to dedicate nodes (GPU, Spot).

Effects: NoSchedule, PreferNoSchedule, NoExecute (evict existing).
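
As an illustration of dedicating nodes, the Pod below tolerates an assumed dedicated=gpu taint. The taint key, node label, and Pod name are hypothetical.

```yaml
# Assumes a node was tainted and labeled beforehand, for example:
#   kubectl taint nodes gpu-node-1 dedicated=gpu:NoSchedule
#   kubectl label nodes gpu-node-1 dedicated=gpu
apiVersion: v1
kind: Pod
metadata:
  name: training-job
spec:
  tolerations:
  - key: dedicated
    operator: Equal
    value: gpu
    effect: NoSchedule            # lets this Pod onto the tainted node
  nodeSelector:
    dedicated: gpu                # a toleration alone does not attract the Pod
  containers:
  - name: trainer
    image: busybox:1.36
    command: ["sleep", "3600"]
```

The taint keeps other Pods off the node, while the nodeSelector keeps this Pod from landing anywhere else.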

Interview Answer: "Taints repel Pods; Tolerations let specific Pods ignore taints. Effects: NoSchedule, PreferNoSchedule, NoExecute. Use case: dedicate GPU nodes for ML, Spot nodes for batch jobs. K8s auto-taints unhealthy nodes."

HPA (Horizontal Pod Autoscaler)

Simple Definition: Automatically scales Pod replicas based on CPU, memory, or custom metrics.

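v2 supports multiple metrics. The behavior section controls scaling speed (scale up fast, scale down slowly to prevent flapping). CPU and memory metrics require Metrics Server; custom and external metrics require an adapter such as Prometheus Adapter.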
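
A sketch of an autoscaling/v2 HPA for the payment-api Deployment, showing the fast-up, slow-down behavior described above. The thresholds and replica bounds are assumptions.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: payment-api
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: payment-api
  minReplicas: 3
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target: { type: Utilization, averageUtilization: 70 }   # percent of requests
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 0          # react immediately
      policies:
      - { type: Percent, value: 100, periodSeconds: 60 }   # at most double per minute
    scaleDown:
      stabilizationWindowSeconds: 300        # wait 5 minutes before shrinking
      policies:
      - { type: Pods, value: 1, periodSeconds: 60 }        # remove 1 Pod per minute
```

Utilization is measured against the container's CPU request, so the target Pods need requests set for this to work.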

Interview Answer: "HPA scales replicas on metrics: CPU, memory, custom (Prometheus Adapter). v2 supports multiple metrics. Behavior section controls speed — fast up, slow down. Needs Metrics Server. Can't share CPU metric with VPA."

VPA, KEDA & Cluster Autoscaler / Karpenter

Simple Definition: VPA = right-size resource requests. KEDA = scale on external events + scale to zero. Karpenter = smart node provisioning, a faster alternative to Cluster Autoscaler.

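# KEDA: Scale Kafka consumer to zero
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: kafka-consumer-scaler                   # hypothetical name
spec:
  scaleTargetRef: { name: kafka-consumer }      # Deployment to scale
  minReplicaCount: 0
  maxReplicaCount: 50
  triggers:
  - type: kafka
    metadata:
      bootstrapServers: kafka.default.svc:9092  # assumed broker address
      consumerGroup: orders-consumer            # assumed consumer group
      topic: orders
      lagThreshold: "100"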

Interview Answer: "VPA right-sizes requests (recommender mode=safe). KEDA scales on 60+ event sources (Kafka lag, SQS, Prometheus) with scale-to-zero. Karpenter replaces Cluster Autoscaler: faster provisioning, auto instance-type selection, Spot handling, node consolidation."

PDB (Pod Disruption Budget)

Simple Definition: Limits how many Pods can be down during voluntary disruptions (node drains, upgrades). Specifies minAvailable or maxUnavailable.

Applies to voluntary disruptions (drain, autoscaler), NOT involuntary (node crash, OOM).
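
A minimal PDB for the payment-api Deployment could look like this. The minAvailable value is an assumption that suits a 3-replica service.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: payment-api
spec:
  minAvailable: 2                 # a drain may evict only while 2 Pods stay available
  selector:
    matchLabels:
      app: payment-api
```

Setting minAvailable equal to the replica count would block node drains entirely, so leave room for at least one eviction.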

Interview Answer: "PDB sets minimum available or maximum unavailable Pods during voluntary disruptions. Protects against all-replicas-down during node drains and upgrades. Every production Deployment should have one."

Security

RBAC, Pod Security, admission control, policy engines, secrets management, and runtime protection.

RBAC (Role-Based Access Control)

Simple Definition: Controls who can do what. Role (namespace permissions) + ClusterRole (cluster-wide) bound to users/groups/ServiceAccounts via RoleBinding/ClusterRoleBinding.

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
  namespace: dev
rules:
- apiGroups: [""]                 # "" = core API group
  resources: ["pods", "pods/log"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: dev-team-reader
  namespace: dev
subjects:
- kind: Group
  name: dev-team
  apiGroup: rbac.authorization.k8s.io
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: pod-reader

Test: kubectl auth can-i create pods --namespace dev

Interview Answer: "RBAC: Role (namespace) + ClusterRole (global), bound via RoleBinding/ClusterRoleBinding to Users, Groups, or ServiceAccounts. Best practice: least privilege, namespace-scoped Roles, dedicated ServiceAccounts per workload, audit with 'kubectl auth can-i'."
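
The "dedicated ServiceAccount per workload" practice can be sketched as follows. It reuses the pod-reader Role from above; the ServiceAccount name log-collector is hypothetical.

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: log-collector      # hypothetical workload identity
  namespace: dev
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: log-collector-reader
  namespace: dev
subjects:
- kind: ServiceAccount
  name: log-collector
  namespace: dev
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: pod-reader         # Role defined in the example above
```

The binding can be audited with: kubectl auth can-i list pods --namespace dev --as system:serviceaccount:dev:log-collector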

SecurityContext & Pod Security Standards

Simple Definition: SecurityContext sets container security (runAsNonRoot, readOnlyRootFilesystem, drop capabilities). PSS defines three levels: Privileged, Baseline, Restricted.

Pod Security Admission enforces PSS per namespace via labels: enforce, audit, warn.

Interview Answer: "SecurityContext: runAsNonRoot, readOnlyRootFilesystem, drop ALL capabilities, allowPrivilegeEscalation=false. PSS levels: Privileged (system), Baseline (minimum), Restricted (target for apps). Enforced via namespace labels."
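
A sketch of how the two pieces fit together: a namespace labeled for the Restricted level, and a Pod whose SecurityContext satisfies it. The namespace name, Pod name, and image are placeholders, and the image is assumed to run as a non-root user.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: apps
  labels:
    pod-security.kubernetes.io/enforce: restricted   # reject violating Pods
    pod-security.kubernetes.io/audit: restricted     # record violations in audit logs
    pod-security.kubernetes.io/warn: restricted      # warn the client on violations
---
apiVersion: v1
kind: Pod
metadata:
  name: hardened
  namespace: apps
spec:
  securityContext:
    runAsNonRoot: true
    seccompProfile:
      type: RuntimeDefault
  containers:
  - name: app
    image: registry.example.com/app:1.0.0   # placeholder image
    securityContext:
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true
      capabilities:
        drop: ["ALL"]
```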

Admission Controllers & Webhooks

Simple Definition: Plugins intercepting API requests before storage. Mutating webhooks modify requests (inject sidecars). Validating webhooks accept/reject (enforce policies).

API Server Request Flow

API Request → Authentication → Authorization (RBAC) → Mutating Admission → Schema Validation → Validating Admission → Persist to etcd

Interview Answer: "Admission controllers intercept after auth, before persist. Mutating runs first (inject sidecars, add labels), then Validating (enforce policies, reject violations). Policy engines (Kyverno, OPA) work via these webhooks."
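
For illustration, this is roughly how a validating webhook is registered with the API server. Every name here (pod-policy.example.com, the policy-system namespace, the pod-policy-webhook Service, the /validate path) is a hypothetical placeholder, and the webhook server itself is assumed to exist.

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: pod-policy.example.com
webhooks:
- name: pod-policy.example.com
  admissionReviewVersions: ["v1"]
  sideEffects: None
  failurePolicy: Fail              # reject requests if the webhook is unreachable
  rules:
  - apiGroups: [""]
    apiVersions: ["v1"]
    operations: ["CREATE", "UPDATE"]
    resources: ["pods"]
  clientConfig:
    service:
      namespace: policy-system
      name: pod-policy-webhook
      path: /validate
    # caBundle: <base64-encoded CA that signed the webhook's TLS certificate>
```

failurePolicy is the main design choice: Fail keeps policy airtight but makes the webhook a dependency of every matching API call, while Ignore trades enforcement for availability.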

OPA/Gatekeeper & Kyverno

Simple Definition: Policy engines that auto-enforce rules. Kyverno = YAML policies (simple). OPA/Gatekeeper = Rego language (powerful but complex).

# Kyverno: Block :latest tag
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: disallow-latest
spec:
  validationFailureAction: Enforce
  rules:
  - name: check-tag
    match:
      any:
      - resources: { kinds: [Pod] }
    validate:
      message: "Using ':latest' is not allowed."
      pattern:
        spec:
          containers:
          - image: "!*:latest"

Interview Answer: "Kyverno: K8s-native YAML policies, can validate, mutate, generate. OPA/Gatekeeper: Rego language, more powerful. Common policies: require labels, block latest tag, enforce resource limits, restrict registries, mandate security contexts."
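
As a second sketch in the same style as the policy above, "enforce resource limits" could look like this in Kyverno. The policy and rule names are illustrative; "?*" is Kyverno's wildcard for "any non-empty value".

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-requests-limits
spec:
  validationFailureAction: Enforce
  rules:
  - name: check-resources
    match:
      any:
      - resources: { kinds: [Pod] }
    validate:
      message: "CPU and memory requests and a memory limit are required."
      pattern:
        spec:
          containers:
          - resources:
              requests:
                cpu: "?*"
                memory: "?*"
              limits:
                memory: "?*"
```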

Image Security & Runtime Protection

Simple Definition: Scanning (Trivy for CVEs), signing (Cosign for provenance), SBOM generation, and runtime monitoring (Falco for anomalous behavior).

Interview Answer: "Defense-in-depth: scan in CI (Trivy blocks critical CVEs), sign images (Cosign/Sigstore), verify at admission (Kyverno checks signatures), monitor runtime (Falco detects unexpected processes/network). SBOM for regulatory compliance."

Observability

Metrics, logging, tracing, probes, alerting, and the three pillars of understanding your systems.

Prometheus & Grafana

Simple Definition: Prometheus collects metrics (pull-based, PromQL queries). Grafana visualizes dashboards. Together = standard K8s monitoring stack.

Key sources: kube-state-metrics (K8s object state), Node Exporter (node hardware), Metrics Server (HPA/kubectl top), app /metrics endpoints.

# PromQL examples
sum(rate(http_requests_total{status=~"5.."}[5m]))     # 5xx errors per second
histogram_quantile(0.99, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))  # P99 latency
sum by (service)(rate(http_requests_total[5m]))  # RPS per service

Long-term: Thanos or Grafana Mimir for multi-cluster, long-retention storage.

Interview Answer: "Prometheus: pull-based metrics with PromQL. Alertmanager for routing alerts. kube-state-metrics + Node Exporter for infrastructure. Grafana for dashboards. Thanos/Mimir for long-term multi-cluster storage. CNCF graduated, industry standard."
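
The pull model can be sketched with a Prometheus scrape configuration that discovers Pods through the Kubernetes API. It assumes the common (but not built-in) convention of annotating Pods with prometheus.io/scrape: "true"; the job name is arbitrary.

```yaml
scrape_configs:
- job_name: kubernetes-pods
  kubernetes_sd_configs:
  - role: pod                      # discover every Pod via the API server
  relabel_configs:
  # Keep only Pods that opt in through the annotation.
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
    action: keep
    regex: "true"
  # Copy namespace and Pod name onto each scraped series.
  - source_labels: [__meta_kubernetes_namespace]
    target_label: namespace
  - source_labels: [__meta_kubernetes_pod_name]
    target_label: pod
```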

Logging (Loki, EFK, Fluent Bit)

Simple Definition: Containers write to stdout/stderr. A DaemonSet collector (Fluent Bit) ships logs to a backend (Loki or Elasticsearch) for search and analysis.

Loki (Grafana) = label-based, cost-effective, Grafana integration. Elasticsearch = full-text search, powerful but heavy. Loki increasingly replacing EFK.

Interview Answer: "Containers → stdout/stderr → node files → Fluent Bit DaemonSet → backend. Loki: label-based, cheap. Elasticsearch: full-text, heavy. Structured JSON logging enables better filtering. Correlate logs with metrics on same Grafana dashboard."

Distributed Tracing & OpenTelemetry

Simple Definition: Tracing follows a request across microservices showing where time is spent. OpenTelemetry (OTel) is the CNCF standard unifying metrics, logs, and traces.

How: Request gets trace ID → each service creates a span → propagated via headers → assembled into full trace.

Backends: Jaeger (CNCF), Grafana Tempo, Zipkin. OTel Collector = vendor-neutral pipeline.

Interview Answer: "Tracing uses trace IDs and spans across services. Backends: Jaeger, Tempo. OpenTelemetry = CNCF standard unifying metrics/logs/traces with vendor-neutral SDKs and Collector. Auto-instrumentation adds tracing without code changes."
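
A minimal OTel Collector pipeline might look like the sketch below: receive OTLP from applications, batch, and forward traces to a backend. The exporter endpoint is a placeholder for a Tempo or Jaeger OTLP endpoint, and disabling TLS verification is only reasonable for an in-cluster demo.

```yaml
receivers:
  otlp:
    protocols:
      grpc:
      http:
processors:
  batch:
exporters:
  otlp:
    endpoint: tempo.observability.svc:4317   # placeholder backend address
    tls:
      insecure: true                         # demo only; use TLS in production
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]
```

Because applications only speak OTLP to the Collector, switching backends means editing the exporters section rather than redeploying services.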

Probes: Liveness, Readiness & Startup

Simple Definition: Liveness = alive? (kill if not). Readiness = ready for traffic? (remove from endpoints if not). Startup = still booting? (gate other probes).

Methods: HTTP GET, TCP Socket, gRPC, Exec command.

Common mistake: Aggressive liveness probes restart healthy containers under load, causing cascading failures. Keep liveness lenient, readiness responsive.

Interview Answer: "Liveness: restart dead containers. Readiness: remove unready from traffic. Startup: gate probes for slow starters. Methods: HTTP, TCP, gRPC, exec. Key: lenient liveness, responsive readiness. Every production service needs at least liveness and readiness probes."
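
The "lenient liveness, responsive readiness" advice can be sketched as a container spec fragment. The image, port, and the /healthz and /ready paths are assumptions about the application, and the thresholds are starting points rather than recommendations.

```yaml
containers:
- name: app
  image: registry.example.com/app:1.0.0   # placeholder image
  ports:
  - containerPort: 8080
  startupProbe:                 # gates the other probes while booting
    httpGet: { path: /healthz, port: 8080 }
    periodSeconds: 5
    failureThreshold: 30        # allows up to 150s to start
  livenessProbe:                # lenient: restart only after sustained failure
    httpGet: { path: /healthz, port: 8080 }
    periodSeconds: 10
    timeoutSeconds: 3
    failureThreshold: 6         # about 60s of consecutive failures
  readinessProbe:               # responsive: leave endpoints quickly
    httpGet: { path: /ready, port: 8080 }
    periodSeconds: 5
    failureThreshold: 2
```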

SLI, SLO & SLA

Simple Definition: SLI = metric (availability, latency). SLO = target (99.9% availability). SLA = customer contract with penalties. Error budget = 100% - SLO.

Four Golden Signals: Latency, Traffic, Errors, Saturation.

SLO 99.9% = 43 min downtime/month. When error budget exhausted → freeze features, fix reliability.

Interview Answer: "SLI=measurable indicator, SLO=internal target, SLA=external contract. Error budget (100%-SLO) drives prioritization. Four golden signals: latency, traffic, errors, saturation. SLOs bridge platform and product teams."
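
The 43-minute figure follows from 0.1% of a 30-day month: 0.001 × 30 × 24 × 60 = 43.2 minutes. One way to act on the budget is a burn-rate alert, sketched here as a Prometheus rule file. It assumes the http_requests_total metric and status label from the PromQL examples, and a 14.4x threshold, which corresponds to spending 2% of a 30-day budget in one hour; treat both as assumptions to tune.

```yaml
groups:
- name: slo-availability
  rules:
  - alert: ErrorBudgetFastBurn
    # 1h error ratio compared with 14.4x the 0.1% budget of a 99.9% SLO.
    expr: |
      sum(rate(http_requests_total{status=~"5.."}[1h]))
        /
      sum(rate(http_requests_total[1h]))
        > (14.4 * 0.001)
    for: 5m
    labels:
      severity: page
    annotations:
      summary: "Burning the 99.9% availability error budget too fast"
```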

Deployment Strategies & GitOps

Rolling, blue-green, canary, Argo CD, Flux, Helm, and Kustomize.

Rolling Update

Simple Definition: K8s default. Replaces Pods incrementally via maxSurge/maxUnavailable. Both versions run simultaneously during transition.

Interview Answer: "Rolling update replaces Pods gradually. maxSurge = extra Pods allowed above the desired count, maxUnavailable = Pods that may be unavailable during the update. New Pods must pass readiness before old ones are terminated. If new Pods never become ready, the rollout stalls and the remaining old Pods keep serving traffic."
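
A sketch of a conservative rolling update configuration. The Deployment name, labels, image, and readiness path are placeholders.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 4
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1            # at most 5 Pods exist during the rollout
      maxUnavailable: 0      # never drop below 4 available Pods
  minReadySeconds: 10        # a new Pod must stay ready 10s before it counts
  selector:
    matchLabels: { app: web }
  template:
    metadata:
      labels: { app: web }
    spec:
      containers:
      - name: web
        image: registry.example.com/web:1.4.2   # placeholder image
        readinessProbe:
          httpGet: { path: /ready, port: 8080 }
```

kubectl rollout status deployment/web follows progress, and kubectl rollout undo deployment/web returns to the previous ReplicaSet.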

Blue-Green Deployment

Simple Definition: Two environments (Blue=old, Green=new). Switch all traffic at once by changing Service selector. Instant rollback = switch back.

Pros: Atomic switchover, instant rollback. Cons: Double resources, DB schema compatibility needed.

Interview Answer: "Blue-green: two identical environments, atomic traffic switch via Service selector. Instant rollback. Cons: double infrastructure, DB compatibility. Argo Rollouts automates with health analysis."
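
Without extra tooling, the switch can be sketched as a Service whose selector includes a version label. It assumes two Deployments labeled version: blue and version: green; all names are illustrative.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  selector:
    app: web
    version: blue      # change to "green" to cut all traffic over
  ports:
  - port: 80
    targetPort: 8080
```

The cutover is then a single patch, for example kubectl patch service web -p '{"spec":{"selector":{"version":"green"}}}', and rollback is the same command with "blue".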

Canary Deployment

Simple Definition: Route small % of traffic to new version, monitor, gradually increase. Gold standard for risk-sensitive deployments.

# Argo Rollouts canary (fragment of a Rollout spec; selector and Pod template omitted)
spec:
  strategy:
    canary:
      steps:
      - setWeight: 5
      - pause: {duration: 5m}
      - analysis:
          templates: [{ templateName: success-rate }]
      - setWeight: 25
      - pause: {duration: 10m}
      - setWeight: 100

Interview Answer: "Canary routes small traffic % to new version with automated analysis (Prometheus error rate/latency). Auto-rollback on failure. Argo Rollouts or Gateway API traffic splitting. Safest strategy — catches issues before affecting all users."
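
The success-rate template referenced by the canary steps is not shown above; a possible sketch follows. The Prometheus address, the http_requests_total metric, and the 99% threshold are assumptions.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: success-rate
spec:
  metrics:
  - name: success-rate
    interval: 1m
    count: 5                               # take five measurements
    successCondition: result[0] >= 0.99    # share of non-5xx responses
    failureLimit: 1                        # abort when failures exceed this limit
    provider:
      prometheus:
        address: http://prometheus.monitoring.svc:9090   # placeholder address
        query: |
          sum(rate(http_requests_total{status!~"5.."}[2m]))
            /
          sum(rate(http_requests_total[2m]))
```

In practice the query would also filter on the canary's Pods or service so that stable traffic does not mask canary errors.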

GitOps (Argo CD & Flux)

Simple Definition: Git as single source of truth. Agents in-cluster pull state from Git and reconcile continuously. All changes via PRs.

GitOps CI/CD Pipeline Flow

Code Commit → CI: Build & Test → Build Image → Scan (Trivy) → Push to Registry → Update GitOps Repo → Argo CD detects → Deploy to Cluster

Rollback = revert the Git commit

Argo CD: UI, SSO, ApplicationSets, auto-sync, self-healing. Flux: Toolkit approach, deeper Helm/Kustomize integration, no built-in UI.

Interview Answer: "GitOps: Git=source of truth, agents pull+reconcile (not push). Benefits: audit trail, easy rollback (revert commit), no cluster creds externally, drift correction. Argo CD has UI+ApplicationSets. Flux has deeper Helm/Kustomize integration. Industry standard."
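
In Argo CD, the pull-and-reconcile loop is configured with an Application resource. This is a sketch: the repository URL, path, and namespaces are placeholders.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: web
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://git.example.com/platform/gitops.git   # placeholder repo
    targetRevision: main
    path: apps/web/overlays/prod
  destination:
    server: https://kubernetes.default.svc   # the cluster Argo CD runs in
    namespace: prod
  syncPolicy:
    automated:
      prune: true      # delete resources that were removed from Git
      selfHeal: true   # revert manual changes made in the cluster
```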

Helm & Kustomize

Simple Definition: Helm = package manager (templated Charts + Values). Kustomize = YAML overlays without templates (built into kubectl).

Use Helm for: third-party packages, complex parameterization. Use Kustomize for: your own apps, simple environment overrides. Many teams use both.

Interview Answer: "Helm: Charts with Go templates and Values for parameterization. Kustomize: overlays and patches, no template language, built into kubectl. Helm for external tools (Prometheus, NGINX), Kustomize for internal apps. Most teams use both."
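
A Kustomize overlay can be sketched as below. The directory layout (a shared base plus overlays/prod), the image name, and the patch file are hypothetical.

```yaml
# overlays/prod/kustomization.yaml (hypothetical layout)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: prod
resources:
- ../../base                 # shared manifests, unchanged
images:
- name: example/web          # image name as written in the base
  newTag: "1.4.2"            # tag pinned for this environment
patches:
- path: replica-patch.yaml   # e.g. raises spec.replicas for prod
  target:
    kind: Deployment
    name: web
```

kubectl apply -k overlays/prod renders and applies the result; the Helm equivalent would be a values-prod.yaml passed with -f.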

Cluster Architecture

Control plane, etcd, multi-cluster, multi-tenancy, managed vs. self-managed.

Control Plane Components

Simple Definition: The brain: API Server (front door), etcd (memory/state), Scheduler (Pod placement), Controller Manager (reconciliation loops).

API Server: Auth → AuthZ → Admission → Validation → Persist. Stateless, run multiple replicas.

etcd: Raft consensus, 3 or 5 nodes (an odd count keeps a majority quorum: 3 tolerates 1 failure, 5 tolerates 2). Fast SSDs required. Losing etcd without backup = losing everything.

Scheduler: Filter (which nodes CAN) → Score (which is BEST). Considers resources, affinity, taints, topology.

Controller Manager: Runs Deployment, ReplicaSet, Node, Job, Endpoint controllers. Watch → Compare → Act.

Interview Answer: "API Server: RESTful front door (auth/authz/admission). etcd: distributed KV store (needs backup). Scheduler: filter→score for Pod placement. Controller Manager: reconciliation loops. For HA: multiple API servers, 3-5 etcd nodes, leader election."
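
The scheduler's inputs are visible in the Pod spec. In this sketch (names, image, and the dedicated=web taint are hypothetical), the resource requests and toleration affect which nodes pass the filter phase, while the soft topology spread constraint influences scoring.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web
  labels: { app: web }
spec:
  containers:
  - name: web
    image: registry.example.com/web:1.4.2   # placeholder image
    resources:
      requests: { cpu: 250m, memory: 256Mi }  # nodes without this free capacity are filtered out
  tolerations:
  - key: dedicated             # allows nodes tainted dedicated=web:NoSchedule
    operator: Equal
    value: web
    effect: NoSchedule
  topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: ScheduleAnyway   # soft preference, used for scoring
    labelSelector:
      matchLabels: { app: web }
```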

Multi-Cluster & Multi-Tenancy

Simple Definition: Multi-cluster = separate clusters for blast radius, compliance, scale. Multi-tenancy = sharing: namespace-per-tenant (cheap), vCluster (strong isolation), cluster-per-tenant (strongest).

Management: Cluster API (declarative lifecycle), Rancher (UI), Argo CD ApplicationSets (multi-cluster deploys).

Interview Answer: "Multi-cluster for blast radius, compliance, scale limits. Tenancy: namespace-per-tenant (RBAC+quotas), vCluster (virtual clusters), cluster-per-tenant (full isolation). Managed via Cluster API and Argo CD ApplicationSets."
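
For the namespace-per-tenant model, the quota half of "RBAC+quotas" can be sketched as follows. The tenant name and all limits are illustrative.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: tenant-a
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: tenant-a-quota
  namespace: tenant-a
spec:
  hard:
    requests.cpu: "8"
    requests.memory: 16Gi
    limits.cpu: "16"
    limits.memory: 32Gi
    pods: "50"
```

Once a quota covers CPU or memory, Pods in that namespace must declare the corresponding requests and limits, or be given defaults through a LimitRange.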

Managed vs. Self-Managed Kubernetes

Simple Definition: Managed (EKS/GKE/AKS) = cloud handles control plane. Self-managed (kubeadm, k3s, RKE2) = you handle everything.

Managed: EKS (AWS, most popular), GKE (Google, most mature), AKS (Azure, free control plane on the Free tier).

Self-managed: kubeadm (official), k3s (edge/IoT, lightweight), RKE2 (FIPS, government), OpenShift (enterprise).

Interview Answer: "Managed: cloud handles control plane + etcd. Most orgs should default to managed. Self-managed for edge (k3s), regulated (RKE2), on-prem, or cost optimization at massive scale. Cluster API standardizes management across both."

Operators & CRDs

Extending Kubernetes with custom resources and the Operator pattern.

CRD (Custom Resource Definition)

Simple Definition: Extends the K8s API with your own resource types. After creating a CRD, manage custom resources with kubectl like built-in ones.

Examples: Certificate (cert-manager), VirtualService (Istio), Cluster (CloudNativePG).

A CRD alone just stores data. A Custom Controller watches it and takes action = the Operator pattern.

Interview Answer: "CRDs extend the K8s API with custom types. Managed via kubectl like built-in resources. CRD alone is just storage — a Custom Controller makes it actionable (Operator pattern). Key concepts: Finalizers (cleanup before delete), Owner References (cascading delete)."
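
A minimal CRD sketch, using a made-up Backup type in the example.com group with a single required field:

```yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: backups.example.com        # must be <plural>.<group>
spec:
  group: example.com
  scope: Namespaced
  names:
    plural: backups
    singular: backup
    kind: Backup
  versions:
  - name: v1
    served: true
    storage: true
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            required: ["schedule"]
            properties:
              schedule:
                type: string       # e.g. a cron expression
```

After applying it, kubectl get backups works like any built-in resource, but nothing acts on a Backup object until a controller watches it.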

Operator Pattern

Simple Definition: CRDs + Custom Controllers encoding human operational knowledge. Automates Day 2 ops: deploy, upgrade, backup, scale, failover for complex applications.

Major Operators: cert-manager (TLS), Prometheus Operator (monitoring), CloudNativePG (PostgreSQL), Strimzi (Kafka), Crossplane (cloud infra), External Secrets Operator (secrets sync).

Build with: Kubebuilder (Go), Operator SDK (Go/Ansible/Helm), Metacontroller (any language).

Core pattern: Watch → Compare desired vs actual → Act. Runs continuously. Idempotent.

Interview Answer: "Operator = CRDs + Controller encoding ops knowledge. Automates Day 2: deploy, upgrade, backup, failover. Key operators: cert-manager, Prometheus Operator, CloudNativePG, Strimzi, Crossplane. Built with Kubebuilder/Operator SDK. Core: reconciliation loop (watch→compare→act)."

Real-World Usage: Instead of a 47-step runbook for PostgreSQL HA, apply a 20-line Cluster CR. CloudNativePG handles StatefulSet creation, replication, automated backup to S3, and automatic failover. What took a DBA hours now takes seconds.
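
The Cluster CR mentioned above might start as small as this sketch. It assumes the CloudNativePG operator is installed; the name and sizes are placeholders, and backup configuration is left out.

```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: orders-db        # hypothetical cluster name
spec:
  instances: 3           # one primary plus two replicas
  storage:
    size: 20Gi
```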

Platform Engineering, Cost & Strategy

IDPs, FinOps, DR, cloud strategy, DORA metrics, and compliance.

Internal Developer Platform (IDP)

Simple Definition: Self-service layer on K8s. Developers deploy without deep K8s knowledge. Golden paths, templates, guardrails.

Tools: Backstage (CNCF portal), Crossplane (cloud infra as CRDs), Tilt/Skaffold (inner-loop dev).

Interview Answer: "IDP abstracts K8s complexity for developers. Backstage (service catalog+templates), Crossplane (cloud infra as K8s resources), golden paths (opinionated templates with CI/CD, monitoring pre-configured). Requires a platform team. Goal: developers focus on code."

FinOps & Cost Optimization

Simple Definition: Managing K8s spend. Levers: right-sizing (VPA), Spot instances, KEDA scale-to-zero, Karpenter, eliminating idle resources.

Tools: Kubecost, OpenCost (CNCF), Goldilocks (VPA dashboard). Chargeback via labels attributes costs to teams.

Interview Answer: "Right-size with VPA (30-50% savings), Spot for stateless (60-90%), KEDA scale-to-zero, Karpenter node consolidation. Kubecost/OpenCost for visibility. ResourceQuotas cap namespaces. Chargeback via labels. Most clusters are 60-80% over-provisioned."

Real-World Usage: Company's $180K/mo K8s bill dropped to $70K (61% reduction): right-sizing saved $50K, Karpenter+Spot saved $35K, KEDA scale-to-zero saved $15K, shutdown dev off-hours saved $10K.

Disaster Recovery & Chaos Engineering

Simple Definition: DR: Velero (backup/restore), etcd snapshots, Volume Snapshots. Chaos engineering (Chaos Mesh, Litmus) validates resilience by injecting failures.

RTO = max downtime. RPO = max data loss. Lower values = higher cost (multi-region).

Patterns: Multi-AZ (minimum), Active-Active (lowest RTO), Active-Passive (simpler).

Interview Answer: "DR: Velero for K8s+volume backup, etcd snapshots for cluster state. RTO/RPO drive architecture and cost. Multi-AZ minimum for prod. Chaos engineering (Chaos Mesh, Litmus) validates DR before real incidents."
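
Scheduled backups with Velero can be sketched as a Schedule resource. This assumes Velero is installed in the velero namespace with a backup storage location already configured; the schedule name, target namespace, and retention are illustrative.

```yaml
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: nightly-prod
  namespace: velero
spec:
  schedule: "0 2 * * *"            # every day at 02:00
  template:
    includedNamespaces: ["prod"]
    snapshotVolumes: true          # also snapshot persistent volumes
    ttl: 720h0m0s                  # keep each backup for 30 days
```

The backup interval bounds the achievable RPO: a nightly schedule means up to 24 hours of data loss for anything not protected another way.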

Cloud Strategy & Vendor Lock-In

Simple Definition: K8s provides compute portability. Real lock-in: managed services (RDS, BigQuery), data gravity, IAM. Hybrid via Anthos, Azure Arc, EKS Anywhere.

Interview Answer: "K8s gives compute portability. Real lock-in: managed services, data gravity, IAM. Strategy: abstract where practical (PostgreSQL over Aurora), accept lock-in where value is high. Multi-cloud is expensive — most orgs are hybrid by circumstance."

DORA Metrics, Conway's Law & Organizational Impact

Simple Definition: DORA = deployment frequency, lead time, MTTR, change failure rate. Conway's Law = systems mirror team structure. Platform teams enable autonomous product teams.

Team Topologies: Platform team (builds K8s platform), Stream-aligned (product features), Enabling (adoption help).

TCO: Compute + Storage + Networking + Licensing + People (biggest cost) + Opportunity cost.

Interview Answer: "DORA metrics measure engineering effectiveness. K8s+GitOps should improve all four. Conway's Law: architecture mirrors teams. Inverse Conway Maneuver: shape teams for desired architecture. Platform team of 5 can support 80+ developers. TCO: people cost often exceeds infrastructure."

Compliance, CIS Benchmarks & Zero Trust

Simple Definition: CIS Benchmarks (kube-bench) for hardening. Compliance (SOC2/PCI/HIPAA) via RBAC + NetworkPolicy + mTLS + audit logs + policy engines. Zero Trust = never trust, always verify.

Zero Trust stack: mTLS (service mesh), least-privilege RBAC, default-deny NetworkPolicies, PSS Restricted, short-lived credentials (Vault/IRSA), signed images only.

Interview Answer: "Compliance: RBAC (access), NetworkPolicy (segmentation), mTLS (encryption), audit logging (traceability), policy engines (automated enforcement). CIS benchmarks via kube-bench. Zero Trust: mTLS, least-privilege, default-deny, short-lived creds, signed images. Compliance-as-code = continuous enforcement."
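
The default-deny building block of the Zero Trust stack is short enough to show in full. The namespace is a placeholder, and enforcement assumes a CNI plugin that implements NetworkPolicy.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: prod
spec:
  podSelector: {}                      # selects every Pod in the namespace
  policyTypes: ["Ingress", "Egress"]   # no rules listed, so all traffic is denied
```

Because egress is denied too, this also blocks DNS lookups; explicit allow policies (DNS first, then each required flow) are layered on top.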