OpenShift Deployment Best Practices for Reliable Workload Rollouts
Overview
OpenShift deployment best practices start where Kubernetes tutorials end: your Deployment or StatefulSet must coexist with Security Context Constraints, project quotas, image pull secrets, and OpenShift Routes — not just pass kubectl apply on a laptop. Platform teams that skip these constraints discover them in production when pods sit Pending, SCC admission rejects containers, or ingress returns 503s because readiness probes never matched real startup behavior.
OpenShift Container Platform (OCP) adds opinionated defaults, including an integrated image registry with mirroring support, built-in Routes, integrated OAuth, and the Developer perspective, but those conveniences do not replace engineering discipline. Every workload needs explicit CPU and memory requests and limits, liveness and readiness probes aligned with application warm-up, graceful termination hooks, and labels that selectors and NetworkPolicies can target consistently.
This article documents the deployment patterns we enforce on managed and advisory engagements: rollout configuration, autoscaling boundaries, config separation, secret handling, and the handoff points between CI pipelines and cluster reconciliation. Apply them whether you deploy with oc, Helm, Kustomize, or a GitOps controller.
Rollout Strategies and Deployment Controller Behavior
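RollingUpdate is the default Deployment strategy on OpenShift, but maxSurge and maxUnavailable must reflect real capacity. Both default to 25% and are resolved against the replica count, with maxSurge rounded up and maxUnavailable rounded down, so an explicit maxUnavailable of 1 on a single-replica Deployment takes the only pod down during every rollout. For zero-downtime services, run at least two replicas, set maxUnavailable to 0 and maxSurge to 1, and use a preStop hook to stop accepting new connections and drain in-flight requests; the hook runs before the container receives SIGTERM. StatefulSets roll out in order (highest ordinal first under the default RollingUpdate update strategy), so validate persistent volume reattachment and identity stability across pod restarts.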
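To sanity-check rollout settings before they reach a cluster, the capacity arithmetic can be sketched in Python. This is a minimal model, assuming the documented Deployment rounding rules (percentage maxSurge rounds up, percentage maxUnavailable rounds down); the function names are hypothetical and the model ignores pods that are already unhealthy when the rollout starts.

```python
import math


def resolve(value, replicas, round_up):
    """Resolve an absolute count or a percentage string such as '25%'."""
    if isinstance(value, str) and value.endswith("%"):
        scaled = replicas * int(value[:-1]) / 100
        return math.ceil(scaled) if round_up else math.floor(scaled)
    return int(value)


def rollout_bounds(replicas, max_surge="25%", max_unavailable="25%"):
    """Return the minimum available and maximum total pods during a RollingUpdate."""
    surge = resolve(max_surge, replicas, round_up=True)
    unavailable = resolve(max_unavailable, replicas, round_up=False)
    if surge == 0 and unavailable == 0:
        # The API server rejects this combination: the rollout could never progress.
        raise ValueError("maxSurge and maxUnavailable cannot both be 0")
    return {
        "min_available": max(0, replicas - unavailable),
        "max_total": replicas + surge,
    }


if __name__ == "__main__":
    print(rollout_bounds(1, max_surge=0, max_unavailable=1))  # single replica: outage
    print(rollout_bounds(2, max_surge=1, max_unavailable=0))  # zero-downtime settings
```

A min_available of 0 is the signal to fix: it means the Service briefly has no ready endpoints during the rollout.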
OpenShift does not ship built-in blue/green or canary controllers in the core platform. Route weights across alternateBackends allow manual traffic splitting, but automated progressive delivery needs Argo Rollouts, Flagger, or service mesh traffic splitting. For batch and CronJob workloads, set activeDeadlineSeconds and backoffLimit so a stuck Job does not hold cluster resources indefinitely. DaemonSets on infra nodes need documented taints and tolerations; observability agents and log forwarders are common examples.
Use oc rollout status as a mandatory CI gate (it exits non-zero when a Deployment exceeds its progress deadline) and keep oc rollout history for audit and rollback. Pin image digests in manifests rather than floating :latest tags; ImageStreams track digests natively when teams adopt OpenShift-native workflows. Record change metadata in annotations (git commit, pipeline ID) for traceability during incident response.
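Digest pinning is easy to enforce as a lint step before oc apply. The sketch below assumes manifests have already been rendered to Python dicts (for example, parsed from YAML) and that only sha256 digests are in use; the registry names in the usage block are hypothetical.

```python
import re

# A digest reference ends in @sha256: followed by 64 lowercase hex characters.
DIGEST_RE = re.compile(r"@sha256:[0-9a-f]{64}$")


def is_digest_pinned(image):
    return bool(DIGEST_RE.search(image))


def unpinned_images(pod_spec):
    """List container images in a pod spec that are not pinned by digest."""
    containers = pod_spec.get("initContainers", []) + pod_spec.get("containers", [])
    return [c["image"] for c in containers if not is_digest_pinned(c["image"])]


if __name__ == "__main__":
    spec = {
        "containers": [
            {"name": "app", "image": "registry.example.com/team/app@sha256:" + "a" * 64},
            {"name": "proxy", "image": "registry.example.com/team/proxy:latest"},
        ]
    }
    print(unpinned_images(spec))  # a non-empty list should fail the pipeline
```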
When applications do not hot-reload configuration, restart pods deliberately after ConfigMap changes (for example with oc rollout restart). Mounted ConfigMap volumes are refreshed eventually while environment variables are not, so replicas can end up running different configuration. Checksum annotations on the pod template make a ConfigMap change trigger a rolling update automatically. For StatefulSets, ordered rollout means config updates take longer; schedule them during maintenance windows.
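Helm's documentation shows the checksum technique with sha256sum inside a chart template; for pipelines that render manifests outside Helm, a minimal Python equivalent might look like this. The checksum/config key is a common convention rather than a reserved name, and the functions are hypothetical helpers.

```python
import hashlib
import json


def config_checksum(configmap_data):
    """SHA-256 over a canonical JSON encoding, so key order does not change the hash."""
    canonical = json.dumps(configmap_data, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def stamp_pod_template(deployment, configmap_data, key="checksum/config"):
    """Write the checksum into pod template annotations; a new value triggers a rollout."""
    template = deployment.setdefault("spec", {}).setdefault("template", {})
    annotations = template.setdefault("metadata", {}).setdefault("annotations", {})
    annotations[key] = config_checksum(configmap_data)
    return deployment
```

Because the annotation lives on the pod template, any change to the hash changes the template and the Deployment controller rolls pods as it would for a new image.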
Health Probes, Resources, and Quality of Service
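Readiness probes gate Service endpoints and Route backends; liveness probes restart unhealthy containers. An HTTP probe against /healthz that only proves the process is listening is insufficient when the app cannot yet serve traffic, so make the readiness check confirm that critical dependencies respond within a short timeout. Keep those dependency checks out of liveness probes, or a downstream outage turns into a restart storm. Startup probes protect slow-init containers from premature liveness kills during JVM or cache warm-up; size failureThreshold × periodSeconds to exceed worst-case cold start.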
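Sizing a startup probe is simple arithmetic. This sketch assumes the worst-case start time comes from load tests and applies a 1.5x safety margin, which is an illustrative choice rather than a platform default; real probe timing also includes per-attempt timeouts, so treat the budget as approximate.

```python
import math


def startup_budget_seconds(initial_delay, period, failure_threshold):
    """Approximate time a startup probe allows before the kubelet restarts the container."""
    return initial_delay + period * failure_threshold


def min_failure_threshold(worst_case_start_s, period, initial_delay=0, margin=1.5):
    """Smallest failureThreshold whose budget covers worst-case start plus a safety margin."""
    needed = worst_case_start_s * margin - initial_delay
    return max(1, math.ceil(needed / period))


if __name__ == "__main__":
    threshold = min_failure_threshold(worst_case_start_s=120, period=10)
    print(threshold, startup_budget_seconds(0, 10, threshold))
```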
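Requests and limits define Quality of Service classes. Under node memory pressure the kubelet evicts BestEffort pods (no requests or limits) first, then Burstable pods whose usage exceeds their requests; Guaranteed pods (requests equal to limits for CPU and memory in every container) are evicted last but are still OOMKilled if a container exceeds its memory limit. Set LimitRanges at the namespace level so developers cannot omit requests. ResourceQuota caps a single project and ClusterResourceQuota caps a group of projects, so no tenant exhausts node capacity; pair them with PriorityClasses for platform-critical workloads.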
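A pre-merge check can predict which QoS class a pod will get. This is a simplified sketch of the Kubernetes rules, assuming init and app containers are passed together and that quantities are compared as literal strings ("500m" and "0.5" count as different here, unlike in the real API).

```python
RESOURCES = ("cpu", "memory")


def qos_class(containers):
    """Approximate the QoS class Kubernetes assigns to a pod from its container specs."""
    any_set = False
    guaranteed = True
    for container in containers:
        resources = container.get("resources", {})
        limits = resources.get("limits", {})
        # The API server defaults a missing request to the matching limit.
        requests = {**limits, **resources.get("requests", {})}
        if any(r in requests for r in RESOURCES):
            any_set = True
        for r in RESOURCES:
            if r not in limits or requests.get(r) != limits[r]:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"
```

Note how a single sidecar without resources demotes an otherwise Guaranteed pod to Burstable, which is a common surprise when mesh or logging sidecars are injected.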
The Horizontal Pod Autoscaler is built into Kubernetes, and the Vertical Pod Autoscaler is installed through its operator from OperatorHub; enable both deliberately with bounds (minReplicas and maxReplicas for HPA, minAllowed and maxAllowed in the VPA resource policy). Without bounds, a misconfigured VPA recommendation can request eight CPUs for a sidecar. Monitor quota utilization in Prometheus and alert before projects hit hard limits during deploy windows.
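The effect of bounds is easiest to see in the core HPA formula, desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), followed by clamping. This sketch models only that rule and the default 10% tolerance; stabilization windows, scaling policies, and missing-metric handling are omitted, and the VPA helper assumes CPU values in integer millicores.

```python
import math


def hpa_desired_replicas(current, current_metric, target_metric,
                         min_replicas, max_replicas, tolerance=0.1):
    """Core HPA scaling rule: ceil(current * current_metric / target_metric), then clamp."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        desired = current  # Within tolerance: the HPA leaves the replica count alone.
    else:
        desired = math.ceil(current * ratio)
    return max(min_replicas, min(max_replicas, desired))


def clamp_vpa_cpu_millicores(recommended, min_allowed, max_allowed):
    """Bound a VPA CPU recommendation the way minAllowed/maxAllowed do."""
    return max(min_allowed, min(max_allowed, recommended))
```

With maxAllowed at 500m, an 8000m recommendation for a sidecar is capped at 500m instead of reshaping the pod.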
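Init containers should complete quickly and declare realistic requests: the scheduler reserves the larger of the biggest init container request and the sum of app container requests, and long init chains can push a rollout past the Deployment's progressDeadlineSeconds. Sidecars for logging and service mesh inflate resource totals; account for them in namespace quotas. Tune PodDisruptionBudget minAvailable to the replica count so voluntary disruptions during node drains do not violate availability targets, but keep it below the replica count: a minAvailable equal to replicas blocks drains entirely.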
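Both rules in the paragraph above can be checked in CI. This is a minimal sketch: the PDB helper assumes all expected pods are healthy and handles only minAvailable (percentages round up), and the request helper ignores pod overhead and native sidecar init containers. Function names are hypothetical.

```python
import math


def pdb_allowed_disruptions(healthy, expected, min_available):
    """Voluntary evictions a PDB permits right now, for an int or percentage minAvailable."""
    if isinstance(min_available, str) and min_available.endswith("%"):
        desired = math.ceil(expected * int(min_available[:-1]) / 100)
    else:
        desired = int(min_available)
    return max(0, healthy - desired)


def effective_pod_request(init_requests, app_requests):
    """Request the scheduler reserves: larger of the biggest init request and the app sum.

    Values are in one unit (e.g. millicores). Pod overhead and native sidecar
    init containers are not modeled here.
    """
    return max(max(init_requests, default=0), sum(app_requests))
```

A result of 0 from pdb_allowed_disruptions on a fully healthy workload means node drains will stall on that PDB.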
Networking, Routes, and Ingress Patterns
OpenShift Routes are first-class ingress objects backed by the HAProxy-based ingress controller. Set TLS termination (edge, passthrough, or reencrypt) in spec.tls and wildcard behavior in spec.wildcardPolicy, and use annotations such as haproxy.router.openshift.io/timeout for per-route timeouts. Shard public and private traffic onto separate ingress controllers when DMZ and intranet traffic must not share a controller. For gRPC or WebSocket-heavy apps, raise route timeouts and confirm that the termination type and ingress controller HTTP/2 settings support the backend protocol end to end.
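To show where each setting lives, here is a minimal Route manifest builder. Certificate fields are omitted (an edge Route without them falls back to the ingress controller's default certificate), the 60s timeout is an arbitrary example, and the names in the usage block are hypothetical. Note that the API spells the termination value reencrypt.

```python
import json

TERMINATIONS = {"edge", "passthrough", "reencrypt"}


def build_route(name, namespace, service, host, termination="edge",
                timeout="60s", target_port="http"):
    """Return a Route manifest as a dict; TLS and wildcard settings live in spec."""
    if termination not in TERMINATIONS:
        raise ValueError(f"unsupported termination: {termination}")
    return {
        "apiVersion": "route.openshift.io/v1",
        "kind": "Route",
        "metadata": {
            "name": name,
            "namespace": namespace,
            # The per-route HAProxy timeout is expressed as an annotation.
            "annotations": {"haproxy.router.openshift.io/timeout": timeout},
        },
        "spec": {
            "host": host,
            "to": {"kind": "Service", "name": service, "weight": 100},
            "port": {"targetPort": target_port},
            "tls": {"termination": termination, "insecureEdgeTerminationPolicy": "Redirect"},
            "wildcardPolicy": "None",
        },
    }


if __name__ == "__main__":
    route = build_route("orders", "shop", "orders-svc", "orders.apps.example.com")
    print(json.dumps(route, indent=2))
```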
NetworkPolicies should default-deny east-west traffic in sensitive namespaces, then allow explicit label-selected flows. Know which network plugin your cluster runs: the legacy OpenShift SDN offered a multitenant mode that isolated projects by default, while OVN-Kubernetes, the default plugin in current releases, allows all pod traffic until NetworkPolicies restrict it. Service mesh (Istio via the OpenShift Service Mesh operator) adds mTLS and traffic management at the cost of operational complexity; adopt it when observability and canary routing justify the sidecar overhead.
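The default-deny-then-allow pattern maps to two standard NetworkPolicy objects. This sketch covers ingress only: denying egress as well would also require explicit allowances for cluster DNS, and Routes need a separate allow for traffic from the ingress controller. The namespace, labels, and port in the tests are hypothetical.

```python
def default_deny_ingress(namespace):
    """Select every pod in the namespace and allow no ingress."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-ingress", "namespace": namespace},
        "spec": {"podSelector": {}, "policyTypes": ["Ingress"]},
    }


def allow_from_pods(namespace, name, target_labels, source_labels, port):
    """Allow TCP on one port from pods matching source_labels in the same namespace."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "podSelector": {"matchLabels": target_labels},
            "policyTypes": ["Ingress"],
            "ingress": [{
                "from": [{"podSelector": {"matchLabels": source_labels}}],
                "ports": [{"protocol": "TCP", "port": port}],
            }],
        },
    }
```

Policies are additive, so the allow rule punches a hole in the default deny only for pods that match both selectors.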
DNS for services follows <name>.<namespace>.svc.cluster.local. ExternalName and headless services behave as in upstream Kubernetes; test cross-namespace resolution from a debug pod before declaring connectivity issues closed. Document allowed egress with EgressFirewall (OVN-Kubernetes), EgressNetworkPolicy (OpenShift SDN), or firewall integrations when workloads call SaaS APIs outside the cluster.
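On OVN-Kubernetes clusters, an allow-list for SaaS egress might be generated like this. The sketch assumes the k8s.ovn.org/v1 EgressFirewall API, where rules are evaluated in order and the first match wins; the deny rule covers IPv4 only, and the SaaS hostname in the tests is hypothetical. Check the field names against the documentation for your OCP release before relying on it.

```python
def egress_firewall(namespace, allowed_dns_names, allowed_cidrs=()):
    """Build an EgressFirewall that allows listed destinations, then denies all IPv4 egress."""
    rules = [{"type": "Allow", "to": {"dnsName": name}} for name in allowed_dns_names]
    rules += [{"type": "Allow", "to": {"cidrSelector": cidr}} for cidr in allowed_cidrs]
    # First match wins, so the catch-all deny must come last.
    rules.append({"type": "Deny", "to": {"cidrSelector": "0.0.0.0/0"}})
    return {
        "apiVersion": "k8s.ovn.org/v1",
        "kind": "EgressFirewall",
        # OVN-Kubernetes expects a single EgressFirewall per namespace, named "default".
        "metadata": {"name": "default", "namespace": namespace},
        "spec": {"egress": rules},
    }
```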
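Ingress controllers select Routes through routeSelector and namespaceSelector label selectors, so a mislabeled Route can be admitted by the public shard, where the corporate certificates it expects are missing. Use oc describe route to verify admitted status and backend endpoints. For high-connection services, tune connection limits (spec.tuningOptions.maxConnections on the IngressController, the haproxy.router.openshift.io/pod-concurrent-connections annotation per Route) and monitor ingress controller pod CPU, because the router often becomes the bottleneck before application pods.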
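A quick way to reason about sharding is to simulate selector matching. This sketch models only matchLabels (matchExpressions is not handled) and assumes an unsharded default controller alongside a hypothetical internal shard selecting type=internal.

```python
def matches(selector, labels):
    """matchLabels semantics only; an empty selector matches everything."""
    return all(labels.get(k) == v for k, v in selector.get("matchLabels", {}).items())


def admitting_controllers(route_labels, namespace_labels, controllers):
    """Names of ingress controllers whose route and namespace selectors both match."""
    return [
        name
        for name, spec in controllers.items()
        if matches(spec.get("routeSelector", {}), route_labels)
        and matches(spec.get("namespaceSelector", {}), namespace_labels)
    ]


if __name__ == "__main__":
    controllers = {
        "default": {},  # unsharded public controller: admits every Route
        "internal": {"routeSelector": {"matchLabels": {"type": "internal"}}},
    }
    print(admitting_controllers({"type": "internal"}, {}, controllers))
    print(admitting_controllers({"tier": "internal"}, {}, controllers))  # wrong label key
```

The second call shows the failure mode from the paragraph above: a typo in the label key leaves the Route on the public controller only.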
Configuration, Secrets, and Image Pull Governance
Separate config from images: use ConfigMaps for non-sensitive data and Secrets for credentials. Secret values are only base64-encoded, so even with etcd encryption at rest and tight RBAC, treat Secret objects, and anyone who can read them, as sensitive. Prefer external secret stores (Vault, cloud KMS integrations) synced via operators over long-lived Secrets in Git. Sealed Secrets or SOPS-encrypted manifests are acceptable GitOps patterns when external stores are unavailable.
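The point about base64 is worth demonstrating: anyone allowed to get a Secret, or to read a manifest committed to Git, recovers the plaintext with one standard-library call. The Secret content below is a made-up example.

```python
import base64


def decode_secret_data(secret):
    """Return the plaintext of every key in a Secret's data field."""
    return {
        key: base64.b64decode(value).decode("utf-8")
        for key, value in secret.get("data", {}).items()
    }


if __name__ == "__main__":
    secret = {"data": {"password": base64.b64encode(b"s3cr3t").decode("ascii")}}
    print(decode_secret_data(secret))
```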
Reference imagePullSecrets from ServiceAccounts (or directly from pod specs) and automate their creation in CI so pipelines do not embed registry passwords in Jenkinsfiles. Mirror required images to the integrated registry or a trusted corporate mirror to survive upstream registry outages and air-gap requirements. Scan images with Clair or Trivy in CI before promotion, and enforce admission policies with add-on controllers such as Kyverno or OPA Gatekeeper.
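For pipelines that create pull secrets programmatically, the kubernetes.io/dockerconfigjson format is a base64-encoded JSON document with an auths map. This sketch assumes the username and token are injected by the CI secret store at runtime, never hard-coded; the registry and secret names are hypothetical. After applying the Secret, oc secrets link default corp-registry --for=pull attaches it to the default ServiceAccount.

```python
import base64
import json


def pull_secret(name, namespace, registry, username, password):
    """Build a kubernetes.io/dockerconfigjson Secret for one registry."""
    auth = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    config = {"auths": {registry: {"auth": auth}}}
    encoded = base64.b64encode(json.dumps(config).encode("utf-8")).decode("ascii")
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "type": "kubernetes.io/dockerconfigjson",
        "metadata": {"name": name, "namespace": namespace},
        "data": {".dockerconfigjson": encoded},
    }
```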
Environment-specific overlays belong in Kustomize bases or Helm values files per stage (dev, staging, prod). Never fork entire manifests per environment; drift becomes un-auditable. Use the platform namespace pattern for shared operators and confine application Deployments to tenant projects with RBAC scoped to edit, not admin.
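The overlay idea can be illustrated with a small deep merge that approximates how Helm layers several -f values files: nested maps merge, scalars and lists from the later file replace earlier ones. This is an approximation for illustration only; it does not model Helm's null-deletion behavior or Kustomize strategic merge patches, which treat some lists differently.

```python
import copy


def deep_merge(base, overlay):
    """Merge overlay onto base: nested dicts merge, everything else (including lists) replaces."""
    merged = copy.deepcopy(base)
    for key, value in overlay.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


if __name__ == "__main__":
    base = {"replicaCount": 2, "resources": {"requests": {"cpu": "250m", "memory": "256Mi"}}}
    prod = {"replicaCount": 4, "resources": {"requests": {"memory": "512Mi"}}}
    print(deep_merge(base, prod))
```

The prod overlay carries only the differences, which keeps environment drift visible in review.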
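Validate Helm charts against OCP SCC defaults before production; charts requesting runAsUser 0 fail restricted-v2 admission. In CI, pipe helm template to kubeconform for schema checks or run oc apply --dry-run=server against the target cluster API; because SCC admission evaluates pods rather than the Deployments that create them, also lint pod security contexts explicitly. Document required SCC grants in the chart README so security review happens at onboarding, not deploy night.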
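A lightweight pre-flight lint can catch the most common restricted-v2 rejections before pods are ever created. This is a partial approximation, not the SCC admission logic: it flags host namespaces, hostPath volumes, runAsUser 0, privileged containers, explicit privilege escalation, and added capabilities other than NET_BIND_SERVICE. A fixed non-zero UID outside the project's assigned range would also be rejected but is not checked here.

```python
ALLOWED_ADDED_CAPS = {"NET_BIND_SERVICE"}


def restricted_v2_findings(pod_spec):
    """Flag common pod settings that restricted-v2 rejects (partial, not exhaustive)."""
    findings = []
    for field in ("hostNetwork", "hostPID", "hostIPC"):
        if pod_spec.get(field):
            findings.append(f"pod: {field} true")
    for volume in pod_spec.get("volumes", []):
        if "hostPath" in volume:
            findings.append(f"pod: hostPath volume {volume.get('name')}")
    pod_run_as = pod_spec.get("securityContext", {}).get("runAsUser")
    for c in pod_spec.get("initContainers", []) + pod_spec.get("containers", []):
        sc = c.get("securityContext", {})
        name = c.get("name", "<unnamed>")
        # A container-level runAsUser overrides the pod-level value.
        if sc.get("runAsUser", pod_run_as) == 0:
            findings.append(f"{name}: runAsUser 0")
        if sc.get("privileged"):
            findings.append(f"{name}: privileged")
        if sc.get("allowPrivilegeEscalation") is True:
            findings.append(f"{name}: allowPrivilegeEscalation true")
        extra = set(sc.get("capabilities", {}).get("add", [])) - ALLOWED_ADDED_CAPS
        if extra:
            findings.append(f"{name}: adds capabilities {sorted(extra)}")
    return findings
```

Run it over the pod templates from helm template output; a non-empty result is a prompt to fix the chart or document the SCC grant it needs.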
CI/CD Integration and OpenShift Deployment Best Practices at Scale
Pipelines should build once, promote immutable artifacts, and deploy with the same manifest SHA across stages. OpenShift Pipelines (Tekton) runs natively on cluster; Jenkins agents on OCP are still common in enterprises migrating from VM-based CI. Whichever engine you use, enforce policy gates: unit tests, image scan, signed-image verification, and oc apply --dry-run=server or helm template validation against the target cluster API.
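"Build once, promote immutable artifacts" can be enforced with a promotion gate that compares the images rendered for the source and target stages. This sketch assumes Deployment-shaped manifests already parsed into dicts and treats any difference, including a container present in only one stage, as a failure; the helper names are hypothetical.

```python
def container_images(manifest):
    """Map container name to image for a Deployment-like manifest."""
    pod_spec = manifest["spec"]["template"]["spec"]
    containers = pod_spec.get("initContainers", []) + pod_spec.get("containers", [])
    return {c["name"]: c["image"] for c in containers}


def promotion_mismatches(source, target):
    """Container names whose image differs between two stages (or exists in only one)."""
    a, b = container_images(source), container_images(target)
    return sorted(name for name in a.keys() | b.keys() if a.get(name) != b.get(name))
```

An empty list means production will run exactly the digests that passed staging gates.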
GitOps, covered in depth in our OpenShift GitOps article, is the end state for most platform teams: merge to main triggers reconciliation, drift is visible, rollbacks are git revert. Until GitOps is universal, protect production namespaces with RBAC so only the automation ServiceAccount can apply changes. Manual oc or kubectl apply against production should be break-glass only, logged and ticketed.
These OpenShift deployment best practices compound: reliable rollouts, enforced resources, segmented networking, and pipeline discipline reduce pager noise and make upgrades survivable. Pair this operational baseline with cluster monitoring and upgrade planning so deployments stay healthy across OCP minor version bumps.
Platform SREs should publish golden Deployment templates — probes, resources, labels, topology spread constraints — as internal Helm charts or Kustomize bases. Copy-paste from Stack Overflow produces inconsistent production behavior. Review deployment manifests in PRs with the same rigor as application code; the manifest is part of the system.
Stateful Workloads and OpenShift Deployment Best Practices
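StatefulSets on OCP need storage classes whose reclaimPolicy matches data retention requirements: Delete vs Retain decides whether the PersistentVolume and its backing storage survive once the PVC is deleted. PVCs created from volumeClaimTemplates are kept by default when a StatefulSet is deleted or scaled down (configurable through persistentVolumeClaimRetentionPolicy on Kubernetes versions that support it). Use volumeClaimTemplates with an explicit size; expansion may require offline operations depending on the CSI driver. Headless Services provide stable network identity for clustered databases; verify DNS SRV records from client pods.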
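When verifying stable identity from a client pod, it helps to know exactly which names to query. This sketch assumes the StatefulSet's serviceName points at the headless Service and the default cluster.local domain; the StatefulSet, Service, namespace, and port names are hypothetical. The SRV name can then be checked with a DNS tool such as dig from a debug pod.

```python
def statefulset_pod_fqdns(sts_name, headless_service, namespace, replicas,
                          cluster_domain="cluster.local"):
    """Stable per-pod DNS names: <sts>-<ordinal>.<service>.<namespace>.svc.<domain>."""
    return [
        f"{sts_name}-{ordinal}.{headless_service}.{namespace}.svc.{cluster_domain}"
        for ordinal in range(replicas)
    ]


def srv_query_name(port_name, protocol, service, namespace, cluster_domain="cluster.local"):
    """SRV record name for a named port on a headless Service."""
    return f"_{port_name}._{protocol.lower()}.{service}.{namespace}.svc.{cluster_domain}"
```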
Operators from OperatorHub often manage StatefulSets on your behalf — etcd for custom apps, Kafka, databases. Understand upgrade ordering: operator CSV upgrades may restart pods cluster-wide. Read operator documentation for supported upgrade paths before bumping OCP minor versions.
Attach backup hooks and OADP (OpenShift API for Data Protection) schedules to stateful namespaces before go-live. Crash-consistent snapshots taken without application quiesce risk corrupt backups. Document RPO per StatefulSet tier and test restores into isolated namespaces quarterly.
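Topology spread constraints distribute pods across failure domains only when node labels are correct; verify the topology.kubernetes.io/region and topology.kubernetes.io/zone labels on nodes before relying on spread in production manifests. The scheduler does not count nodes that lack the topology key: with whenUnsatisfiable: DoNotSchedule, pods will not land on those nodes and may sit Pending, and with ScheduleAnyway pods can still land on unlabeled nodes where they do nothing for spread.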
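An audit script can surface both problems: zone skew among labeled nodes and nodes missing the label entirely. This is an approximation for auditing, not the scheduler's calculation (which also filters by the constraint's labelSelector and node affinity); the input mappings are assumed to be extracted from oc get pods -o json and oc get nodes -o json.

```python
from collections import Counter

ZONE_KEY = "topology.kubernetes.io/zone"


def zone_skew(pod_to_node, node_labels, key=ZONE_KEY):
    """Return (skew across labeled zones, sorted nodes missing the topology key)."""
    unlabeled = sorted(n for n, labels in node_labels.items() if key not in labels)
    # Seed every known zone with 0 so an empty zone still counts toward skew.
    counts = Counter({labels[key]: 0 for labels in node_labels.values() if key in labels})
    for node in pod_to_node.values():
        zone = node_labels.get(node, {}).get(key)
        if zone is not None:
            counts[zone] += 1
    skew = max(counts.values()) - min(counts.values()) if counts else 0
    return skew, unlabeled
```

Compare the skew against the constraint's maxSkew, and treat any unlabeled node as a fix for the machine configuration rather than the manifest.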
Windows Containers and Legacy Workload Considerations
Windows node pools require separate machine sets, distinct OS images, and networking validated for hybrid pod communication. Linux-only NetworkPolicies do not govern Windows pods identically — test east-west paths explicitly.
Legacy .NET Framework workloads may need Windows containers before replatforming to Linux .NET Core. SCC and user mappings differ on Windows nodes — involve security review early.
These OpenShift deployment best practices extend to heterogeneous fleets: one GitOps repo, separate overlays per OS, unified monitoring and routes where supported.
Platform engineering golden paths should include Deployment, Route, NetworkPolicy, and ResourceQuota templates per tier — bronze, silver, gold — so teams pick SLA-appropriate defaults instead of inventing manifests per project.
Review deployment frequency and failure rate metrics quarterly — teams with high rollout failure rates usually lack probe tuning, quota headroom, or SCC-compatible images. Fix platform gates before blaming application code.
