Kubernetes Cluster Architecture

It Works on My Machine... But What About Prod?

Kubernetes (K8s) has indisputably won the container orchestration war. It is the operating system of the cloud. But while spinning up a minikube cluster on a laptop is easy, managing a fleet of 50+ production clusters across 3 regions is a different beast entirely.

At Sentrix, we manage Kubernetes infrastructure for large enterprises. Here are the hard-won lessons we've learned in the trenches of Day 2 operations.

Lesson 1: Governance is Not Optional

In a small cluster, you might trust everyone with cluster-admin. At scale, that is reckless: one mistyped command or leaked credential can affect every workload in the cluster.

Policy as Code: We use OPA Gatekeeper or Kyverno to enforce rules programmatically.
- Rule: No container can run as root.
- Rule: All images must come from the internal trusted registry (no random Docker Hub images).
- Rule: All pods must have CPU/Memory limits defined.

If a developer tries to apply a manifest that violates these rules, the cluster rejects it with a helpful error message.
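
To make the three rules concrete, here is a minimal Python sketch of the kind of check an admission policy performs. It is not Gatekeeper or Kyverno syntax (Gatekeeper policies are written in Rego, Kyverno policies in YAML). The registry host `registry.internal.example.com` and the function name `violations` are made up for illustration, and the "no root" rule is approximated by requiring `runAsNonRoot: true`.

```python
# Hypothetical registry host; substitute your own trusted registry.
TRUSTED_REGISTRY = "registry.internal.example.com/"


def violations(pod: dict) -> list[str]:
    """Return human-readable policy violations for a Pod manifest given as a dict."""
    problems = []
    spec = pod.get("spec", {})
    pod_ctx = spec.get("securityContext", {})
    for c in spec.get("containers", []):
        name = c.get("name", "<unnamed>")
        ctx = c.get("securityContext", {})
        # Rule 1: no root. A container-level setting overrides the pod-level one.
        if not ctx.get("runAsNonRoot", pod_ctx.get("runAsNonRoot", False)):
            problems.append(f"{name}: must set runAsNonRoot: true")
        # Rule 2: images only from the trusted registry.
        if not c.get("image", "").startswith(TRUSTED_REGISTRY):
            problems.append(f"{name}: image must come from {TRUSTED_REGISTRY}")
        # Rule 3: CPU and memory limits are mandatory.
        limits = c.get("resources", {}).get("limits", {})
        for resource in ("cpu", "memory"):
            if resource not in limits:
                problems.append(f"{name}: missing {resource} limit")
    return problems


bad_pod = {"spec": {"containers": [{"name": "web", "image": "nginx:latest"}]}}
good_pod = {
    "spec": {
        "securityContext": {"runAsNonRoot": True},
        "containers": [
            {
                "name": "web",
                "image": TRUSTED_REGISTRY + "web:1.4.0",
                "resources": {"limits": {"cpu": "500m", "memory": "256Mi"}},
            }
        ],
    }
}

for problem in violations(bad_pod):
    print("rejected:", problem)
```

Returning every violation at once, rather than stopping at the first, is what lets the rejection message be genuinely helpful: the developer can fix the manifest in one pass.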

Lesson 2: GitOps is the Only Way

If you are running kubectl apply -f from your laptop in production, you are doing it wrong. The state of your cluster is now dependent on your laptop's filesystem.

We use ArgoCD for 100% of our deployments.

- Developer merges code to main.
- CI builds the image.
- CI updates the deployment.yaml in the Git repo with the new image tag.
- ArgoCD detects the change in Git and syncs the cluster.
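
The last two steps can be sketched in a few lines of Python. This is only an illustration of the idea (Git holds the desired state, and a controller compares it with the live state); it is not how ArgoCD computes diffs internally. The function names `bump_image` and `out_of_sync`, the image names, and the manifest contents are all hypothetical.

```python
import copy


def bump_image(deployment: dict, container: str, image: str) -> dict:
    """Return a copy of a Deployment manifest with one container's image replaced."""
    updated = copy.deepcopy(deployment)
    for c in updated["spec"]["template"]["spec"]["containers"]:
        if c["name"] == container:
            c["image"] = image
            return updated
    raise KeyError(f"container {container!r} not found")


def out_of_sync(desired: object, live: object, path: str = "") -> list[str]:
    """List dotted paths where the desired (Git) manifest differs from the live one."""
    if isinstance(desired, dict) and isinstance(live, dict):
        diffs = []
        for key in sorted(desired.keys() | live.keys()):
            child = f"{path}.{key}" if path else key
            if key not in desired or key not in live:
                diffs.append(child)
            else:
                diffs.extend(out_of_sync(desired[key], live[key], child))
        return diffs
    # Non-dict values (strings, numbers, lists) are compared as a whole.
    return [] if desired == live else [path]


# What is currently running in the cluster.
live = {
    "kind": "Deployment",
    "metadata": {"name": "web"},
    "spec": {
        "replicas": 3,
        "template": {
            "spec": {
                "containers": [
                    {"name": "web", "image": "registry.internal.example.com/web:1.4.0"}
                ]
            }
        },
    },
}

# What CI commits to Git after building the new image.
desired = bump_image(live, "web", "registry.internal.example.com/web:1.5.0")
print(out_of_sync(desired, live))
```

A non-empty result is the signal to sync. A real controller also has to ignore fields the API server fills in by default, which this sketch does not attempt.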

This provides an audit trail for every change. "Who changed the replica count?" Check the Git log. It also makes rollback straightforward: git revert the offending commit and let ArgoCD sync the cluster back to the previous state.

Lesson 3: The "Goldilocks" Problem of Resources

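Developers notoriously struggle to set correct resource requests and limits. Set a memory limit too low? The container is OOMKilled (killed for exceeding its memory limit). Set a CPU limit too low? The container is throttled. Set requests too high? You are wasting money on reserved capacity that sits idle.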
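
One common way to find the middle ground is to derive the request from observed usage: take a high percentile and add some headroom. The Python sketch below illustrates that idea only. It is not the algorithm any particular autoscaler uses, and the function names, the default percentile, and the 25% headroom are assumptions chosen for the example.

```python
import math


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples <= it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def recommend_memory_mib(
    usage_mib: list[float], pct: float = 95, headroom: float = 0.25
) -> int:
    """Suggest a memory request: a high percentile of observed usage plus headroom."""
    return math.ceil(percentile(usage_mib, pct) * (1 + headroom))


# Ten made-up usage samples in MiB, including one short-lived spike.
usage = [200, 210, 220, 230, 240, 250, 260, 270, 280, 600]
print(recommend_memory_mib(usage, pct=90))
```

The choice of percentile is the trade-off in miniature: a lower percentile ignores rare spikes and saves money, while a higher one covers them at the cost of idle reserved capacity.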

VPA (Vertical Pod Autoscaler): We run VPA in "recommendation mode" to analyze usage over time and suggest optimal requests.
Karpenter: We replaced the standard Cluster Autoscaler with Karpenter. It provisions nodes just in time, sized to closely fit the pending pods, rather than relying on rigid node groups. For us, this reduced compute bills by ~20%.
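
The "fit the node to the pending pods" idea can be illustrated with a deliberately simplified Python sketch: sum what the pending pods request and pick the cheapest instance type that can hold them. Karpenter's actual scheduling is far more involved (multiple nodes, constraints, consolidation). The instance names, sizes, and prices below are invented, as are the names `InstanceType`, `PodRequest`, and `cheapest_fit`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class InstanceType:
    name: str
    cpu_millis: int
    memory_mib: int
    hourly_cost: float


@dataclass(frozen=True)
class PodRequest:
    cpu_millis: int
    memory_mib: int


def cheapest_fit(
    pending: list[PodRequest], catalog: list[InstanceType]
) -> Optional[InstanceType]:
    """Return the cheapest instance type that can hold all pending pods, or None."""
    need_cpu = sum(p.cpu_millis for p in pending)
    need_mem = sum(p.memory_mib for p in pending)
    candidates = [
        t for t in catalog if t.cpu_millis >= need_cpu and t.memory_mib >= need_mem
    ]
    return min(candidates, key=lambda t: t.hourly_cost, default=None)


# Invented catalog; real instance types and prices will differ.
catalog = [
    InstanceType("small", cpu_millis=2000, memory_mib=2048, hourly_cost=0.05),
    InstanceType("medium", cpu_millis=2000, memory_mib=4096, hourly_cost=0.08),
    InstanceType("large", cpu_millis=4000, memory_mib=8192, hourly_cost=0.16),
]
pending = [PodRequest(500, 1024), PodRequest(1500, 2048)]

choice = cheapest_fit(pending, catalog)
print(choice.name if choice else "no single instance type fits")
```

With a fixed node group, the same pods would land on whatever size the group was configured with, even when a smaller type would do.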

Lesson 4: Developer Experience (Internal Platform)

Kubernetes is complex. Ingress, ServiceAccount, PersistentVolumeClaim—developers shouldn't need to be K8s certified to ship a feature.

We advocate for building an Internal Developer Platform (IDP) using tools like Backstage. The developer clicks "Create New Service". The platform:

- Scaffolds a repo.
- Sets up CI/CD pipelines.
- Provisions a namespace with default quotas.
- Grants the team access.
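
As a rough illustration of the last two steps, the Python sketch below builds the Kubernetes objects a platform might create for a new service: a Namespace, a ResourceQuota, and a RoleBinding to the built-in `edit` ClusterRole. The function name `namespace_bundle`, the naming scheme, the quota values, and the assumption that each team maps to an identity-provider group of the same name are all hypothetical.

```python
def namespace_bundle(
    team: str, service: str, cpu: str = "4", memory: str = "8Gi"
) -> list[dict]:
    """Build the manifests for a new service namespace with default quotas and access."""
    ns = f"{team}-{service}"  # assumed naming scheme: <team>-<service>
    return [
        {
            "apiVersion": "v1",
            "kind": "Namespace",
            "metadata": {"name": ns, "labels": {"team": team}},
        },
        {
            "apiVersion": "v1",
            "kind": "ResourceQuota",
            "metadata": {"name": "default-quota", "namespace": ns},
            # Cap the total requests and limits across all pods in the namespace.
            "spec": {
                "hard": {
                    "requests.cpu": cpu,
                    "requests.memory": memory,
                    "limits.cpu": cpu,
                    "limits.memory": memory,
                }
            },
        },
        {
            "apiVersion": "rbac.authorization.k8s.io/v1",
            "kind": "RoleBinding",
            "metadata": {"name": f"{team}-edit", "namespace": ns},
            # Bind the team's group to the built-in "edit" ClusterRole,
            # scoped to this namespace only.
            "roleRef": {
                "apiGroup": "rbac.authorization.k8s.io",
                "kind": "ClusterRole",
                "name": "edit",
            },
            "subjects": [
                {
                    "apiGroup": "rbac.authorization.k8s.io",
                    "kind": "Group",
                    "name": team,
                }
            ],
        },
    ]


bundle = namespace_bundle("payments", "checkout")
for manifest in bundle:
    print(manifest["kind"], manifest["metadata"]["name"])
```

Generating these objects from one function is what keeps defaults consistent: every new service starts with the same quotas and the same access model, and changing the default means changing one place.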

This is "Platform Engineering"—treating your infrastructure as a product, and your developers as the customers.

Summary

Scaling Kubernetes is less about technology and more about process. It is about building guardrails that make it easy to do the right thing and hard to break the system. Automation, Policy as Code, and GitOps are the keys to sleeping soundly at night while running thousands of pods.

Our Infrastructure Services

Need help scaling your Kubernetes infrastructure? Our Cloud Solutions team has managed 50+ production clusters. We also offer Managed Services for 24/7 monitoring and incident response.

Let's talk infrastructure. Contact us for a free consultation.

Kubernetes at Scale: Lessons from 50+ Clusters

It Works on My Machine... But What About Prod?

Lesson 1: Governance is Not Optional

Lesson 2: GitOps is the Only Way

Lesson 3: The "Goldilocks" Problem of Resources

Lesson 4: Developer Experience (Internal Platform)

Summary

Related Reading

Our Infrastructure Services
