5 Kubernetes lessons I wish someone had told me earlier
After running Kubernetes in production for years, here are 5 lessons that took me way too long to learn.
1. Set resource limits on everything
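I know it's tempting to skip this during development. Don't. One runaway pod without a memory limit can exhaust the node's memory, and the kernel OOM killer or kubelet eviction that follows can take down unrelated workloads with it.
Set requests along with limits, start conservative, and adjust based on actual usage. If you're unsure, run the Vertical Pod Autoscaler (VPA) in recommendation-only mode and use its suggestions as a starting point.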
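As a rough sketch of what that looks like in a manifest, here is a Deployment with requests and limits set, plus a VPA object that only produces recommendations. The name `example-api`, the image, and every number are placeholders I made up for illustration, and the VPA part assumes the Vertical Pod Autoscaler is installed in your cluster (it doesn't ship with Kubernetes).

```yaml
# Hypothetical Deployment: name, image, and all values are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-api
spec:
  replicas: 2
  selector:
    matchLabels:
      app: example-api
  template:
    metadata:
      labels:
        app: example-api
    spec:
      containers:
        - name: api
          image: registry.example.com/example-api:1.0.0
          resources:
            requests:        # what the scheduler reserves for the container
              cpu: 100m
              memory: 128Mi
            limits:          # hard ceiling; exceeding memory gets the container OOM-killed
              cpu: 500m
              memory: 256Mi
---
# Assumes the VPA custom resource is installed in the cluster.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: example-api
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: example-api
  updatePolicy:
    updateMode: "Off"        # recommend only, never evict or resize pods
```

With `updateMode: "Off"`, `kubectl describe vpa example-api` shows the recommended values once the VPA has collected enough usage data, and you decide whether to copy them into the Deployment.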
2. Liveness probes should be dead simple
Your liveness probe should check "is this process alive?" — not "can this process serve traffic?" That's what readiness probes are for.
I've seen teams put database connectivity checks in their liveness probe. When the database goes down, every probe fails, Kubernetes restarts every pod, and the restarted pods all hit the database at once while it's trying to recover. That thundering herd makes the outage far worse.
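One way to keep the two probes separate is shown in this fragment of a pod spec. It's a sketch, not a recommendation for specific values: the `/healthz` and `/readyz` paths, the port, and the timings are assumptions, and your application has to actually serve those endpoints.

```yaml
# Fragment of a pod spec. Paths, port, and timings are hypothetical.
containers:
  - name: api
    image: registry.example.com/example-api:1.0.0
    ports:
      - containerPort: 8080
    livenessProbe:
      httpGet:
        path: /healthz       # should only answer "is the process responsive?"
        port: 8080
      periodSeconds: 10
      failureThreshold: 3    # restart the container after 3 consecutive failures
    readinessProbe:
      httpGet:
        path: /readyz        # may check dependencies the service needs to serve traffic
        port: 8080
      periodSeconds: 5
      failureThreshold: 2    # stop routing traffic after 2 failures; no restart
```

The point of the split is that a failing readiness probe only removes the pod from Service endpoints, while a failing liveness probe restarts the container.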
3. Pod Disruption Budgets aren't optional
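If you're running more than one replica of anything, you need a PDB. Without one, a node drain during a cluster upgrade can evict every replica at the same time if they happen to sit on the nodes being drained. Keep in mind that a PDB only covers voluntary disruptions like drains. It won't help when a node crashes.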
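A minimal PDB is only a few lines. This sketch assumes your pods carry the label `app: example-api`, which is a placeholder. Swap in whatever labels your workload's pods actually use.

```yaml
# Hypothetical PDB: the name and label selector are placeholders.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: example-api
spec:
  maxUnavailable: 1          # a drain may evict at most one matching pod at a time
  selector:
    matchLabels:
      app: example-api
```

I used `maxUnavailable` here rather than `minAvailable` because it keeps working as you scale the replica count up or down, but either field is valid. Just don't set a budget that allows zero disruptions, or node drains will block indefinitely.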
4. Use namespaces for more than just organization
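Namespaces are the scope your resource quotas, RBAC rules, and network policies attach to, which makes them your best tool for isolation. A bare namespace enforces nothing on its own, so pair each one with those policies and then treat it as a hard boundary between teams and environments, not just a folder.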
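Here's a sketch of a namespace that actually enforces something: a quota on total resources and a default-deny rule for inbound traffic. The namespace name `team-a` and the quota numbers are made up, and the NetworkPolicy only takes effect if your cluster's network plugin enforces network policies.

```yaml
# Hypothetical namespace setup: names and quota values are placeholders.
apiVersion: v1
kind: Namespace
metadata:
  name: team-a
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-a-quota
  namespace: team-a
spec:
  hard:                      # caps on the sum across all pods in the namespace
    requests.cpu: "4"
    requests.memory: 8Gi
    limits.cpu: "8"
    limits.memory: 16Gi
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: team-a
spec:
  podSelector: {}            # empty selector matches every pod in the namespace
  policyTypes:
    - Ingress                # no ingress rules listed, so all inbound traffic is denied
```

From there you add narrower NetworkPolicies to allow the traffic each service needs. Note that once a quota covers CPU and memory, pods in that namespace that don't specify requests and limits for them are rejected, which also helps enforce lesson 1.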
5. Invest in local development early
If your developers can't run and test their services locally against a realistic Kubernetes environment, they'll push bugs to staging constantly. Tools like Tilt, Skaffold, or even a lightweight k3d cluster pay for themselves in the first week.