OpenShift Installation Guide for Production-Ready Clusters
Overview
This OpenShift installation guide is written for platform engineers who must deliver a production-ready cluster on the first attempt — not a lab that collapses under real ingress, registry, and etcd load. Whether you are deploying on bare metal with the Agent-based installer, standing up IPI on vSphere, or evaluating Red Hat OpenShift Service on AWS (ROSA), the same fundamentals apply: correct DNS and load-balancer design, a supported etcd member topology, and Red Hat subscription alignment before you run openshift-install.
OpenShift Container Platform (OCP) is not vanilla Kubernetes with a console bolted on. The installer provisions an integrated control plane — API server, scheduler, controller manager, OpenShift OAuth, internal registry, routes, and the Operator Lifecycle Manager (OLM) — as a single opinionated stack. Skipping prerequisite validation (forward/reverse DNS, NTP, MTU consistency, storage classes) is the most common reason installs succeed on paper but fail during the first application rollout.
The sections below walk through deployment-model selection, infrastructure prerequisites, installer configuration patterns, and the post-install checks we run on every engagement before handing a cluster to application teams. Treat this as a companion to Red Hat documentation: it emphasizes the decisions that are expensive to reverse after day 0.
We also cover disconnected mirror registries, air-gapped constraints, and the validation gates that separate a demo cluster from one that survives its first Black Friday traffic. Skipping any prerequisite on this list typically costs a full reinstall or a multi-day outage during the first production push.
Platform engineers should treat this OpenShift installation guide as a checklist with sign-offs from network, storage, security, and identity teams, not a solo weekend project. Cross-functional gates prevent the classic pattern of a cluster that boots but cannot pass a corporate penetration test or a storage performance review.
Choosing an OpenShift Installation Model
OpenShift supports three broad installation postures. Installer-provisioned infrastructure (IPI) lets openshift-install create VPCs, machines, and load balancers on supported clouds (AWS, Azure, GCP) or vSphere when you supply adequate permissions. User-provisioned infrastructure (UPI) assumes you provision VMs, DNS records, and load balancers yourself; the installer only lays down the control plane and workers. Agent-based installation targets bare metal and disconnected environments where PXE or cloud APIs are unavailable — agents boot from ISO, discover hardware, and join an assisted-service workflow.
For regulated enterprises in India and the Gulf, UPI and agent-based paths dominate because networking, firewall, and storage teams retain control of every hop. IPI accelerates time-to-cluster in ROSA or public-cloud landing zones where IAM roles and pre-approved VPC templates already exist. The wrong choice is usually obvious: if your security team will not grant the installer cloud-admin rights, IPI is off the table regardless of how fast it looks on a slide.
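The rules of thumb above can be written down as a small decision helper. This is only a sketch for structuring the conversation with security and network teams: the Environment fields and the suggest_install_model function are hypothetical names invented for illustration, not installer options.

```python
from dataclasses import dataclass


@dataclass
class Environment:
    # Illustrative inputs only; none of these are openshift-install settings.
    wants_managed_control_plane: bool
    bare_metal_or_disconnected: bool
    infrastructure_api_available: bool
    installer_may_hold_admin_rights: bool


def suggest_install_model(env: Environment) -> str:
    """Return a starting point for discussion, following the text's rules of thumb."""
    if env.wants_managed_control_plane:
        return "managed (ROSA, ARO, or OpenShift Dedicated)"
    if env.bare_metal_or_disconnected:
        return "agent-based"
    # Without an infrastructure API or admin rights, the installer cannot provision.
    if not (env.infrastructure_api_available and env.installer_may_hold_admin_rights):
        return "UPI"
    return "IPI"
```

A security team that refuses installer admin rights, for example, pushes a cloud deployment from IPI to UPI regardless of the other inputs.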
Managed offerings — ROSA, Azure Red Hat OpenShift (ARO), and OpenShift Dedicated — outsource control-plane lifecycle while you still own worker pools, namespaces, and RBAC. They are not shortcuts around architecture review; you still need private-link design, egress controls, and identity federation identical to self-managed OCP. Map each workload class (stateful, GPU, Windows containers) to a model before procurement, not after the purchase order is signed.
Hybrid topologies mixing on-prem UPI control planes with cloud burst workers multiply operational complexity — document network latency between API server and kubelets, and label machine config pools so OS updates do not drain an entire region simultaneously. Proof-of-concept clusters should mirror production topology even when scaled down; a three-node all-in-one lab will not surface etcd or ingress bottlenecks visible at production scale.
Infrastructure Prerequisites Before openshift-install
DNS is the silent killer of OpenShift installs. You need resolvable records for api.<cluster>.<base_domain>, api-int.<cluster>.<base_domain>, and *.apps.<cluster>.<base_domain>, plus a record for the bootstrap machine and every control-plane and compute node when running UPI (dedicated etcd member records were required only on early 4.x releases). Forward and reverse lookups must agree, because nodes that do not receive a hostname from DHCP take it from the PTR record during bootstrap. Load balancers must expose 6443 (API), 22623 (machine config server, on the internal API endpoint only, used whenever a node joins), and 80/443 for the ingress router, with backend health checks that survive control-plane rolling restarts.
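A pre-flight check for the forward/reverse agreement described above might look like the sketch below. The helper names are hypothetical, the "dns-check" label is an arbitrary host used to exercise the wildcard record, and api-int is included on the assumption that your release requires the internal API record as current UPI documentation does. The resolver functions are injectable so the logic can be exercised without touching real DNS.

```python
import socket
from typing import Callable


def required_names(cluster: str, base_domain: str) -> list[str]:
    """Names that should resolve before install (wildcard probed via a sample host)."""
    suffix = f"{cluster}.{base_domain}"
    return [f"api.{suffix}", f"api-int.{suffix}", f"dns-check.apps.{suffix}"]


def forward_reverse_agree(
    name: str,
    forward: Callable[[str], str] = socket.gethostbyname,
    reverse: Callable[[str], str] = lambda ip: socket.gethostbyaddr(ip)[0],
) -> bool:
    """True when name -> IP -> PTR name round-trips to the same host name."""
    try:
        ip = forward(name)
        ptr = reverse(ip)
    except OSError:  # covers socket.gaierror and socket.herror
        return False
    return ptr.rstrip(".").lower() == name.rstrip(".").lower()
```

The round-trip test is meant for the API and node records; a wildcard *.apps name normally has no matching PTR record, so only its forward lookup is meaningful.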
Storage must be decided before install when using IPI on platforms that provision default storage classes. On vSphere, assign a datastore and storage policy; on bare metal, deploy OpenShift Data Foundation (ODF) or a supported CSI driver before stateful workloads land. etcd performance is non-negotiable: use SSD-backed volumes that keep 99th-percentile WAL fsync latency under 10 ms, and place etcd members in isolated failure domains (racks, AZs). Plan for three etcd members as the standard topology, or five where your release and platform support it; never two, never six.
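The two etcd gates above reduce to simple checks. The following sketch assumes you have already collected fsync latency samples in milliseconds from a disk benchmark; reading the 10 ms limit as a 99th-percentile bound is an assumption, and the function names are illustrative.

```python
def member_count_ok(members: int) -> bool:
    # The guide names three or five members as the acceptable sizes.
    return members in (3, 5)


def p99(samples_ms: list[float]) -> float:
    """Nearest-rank 99th percentile of latency samples in milliseconds."""
    if not samples_ms:
        raise ValueError("no samples")
    ordered = sorted(samples_ms)
    # Integer ceiling of 0.99 * n, avoiding floating-point rounding.
    rank = (99 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def fsync_ok(samples_ms: list[float], limit_ms: float = 10.0) -> bool:
    """True when the 99th-percentile fsync latency is under the limit."""
    return p99(samples_ms) < limit_ms
```

Using a percentile rather than the mean keeps a handful of slow writes from hiding behind an otherwise fast disk.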
Pull secrets, Red Hat subscriptions, and mirror registries belong in the prerequisite checklist for disconnected installs. Use oc mirror to populate a local registry, patch the ClusterVersion and ImageDigestMirrorSet resources, and verify the OperatorHub catalog sources resolve before declaring success. NTP skew beyond a few hundred milliseconds will eventually break TLS and etcd quorum — validate chrony or systemd-timesyncd on every node pre-flight.
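Clock skew can be checked per node by parsing the "System time" line that chronyc tracking prints. The sketch below assumes chrony is the time source and that you capture the command output yourself; the 0.3 second limit is a placeholder for "a few hundred milliseconds", not a documented threshold.

```python
import re

# Matches e.g. "System time     : 0.000012345 seconds fast of NTP time"
_SYSTEM_TIME = re.compile(
    r"^System time\s*:\s*([0-9.]+) seconds (fast|slow) of NTP time", re.MULTILINE
)


def clock_offset_seconds(tracking_output: str) -> float:
    """Signed offset from NTP time: positive when the local clock runs fast."""
    match = _SYSTEM_TIME.search(tracking_output)
    if match is None:
        raise ValueError("no 'System time' line found")
    value = float(match.group(1))
    return value if match.group(2) == "fast" else -value


def skew_ok(tracking_output: str, limit_s: float = 0.3) -> bool:
    """True when the absolute offset is within the chosen limit."""
    return abs(clock_offset_seconds(tracking_output)) <= limit_s
```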
Firewall change windows must precede install, not follow it. Document required egress to quay.io, registry.redhat.io, and your mirror endpoints; bootstrap pulls dozens of images before the cluster reports healthy. MTU mismatches between overlay networks and physical switches cause intermittent TLS failures that look like application bugs — validate end-to-end with ping -M do and iperf before scheduling install weekend.
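The ping -M do test only proves something when the payload size is derived from the MTU under test. A minimal sketch of that arithmetic for IPv4 follows; the 100-byte overlay overhead is the figure commonly documented for OVN-Kubernetes and should be treated as an assumption to confirm for your release, and the function names are illustrative.

```python
IPV4_HEADER_BYTES = 20
ICMP_HEADER_BYTES = 8
OVERLAY_OVERHEAD_BYTES = 100  # assumed OVN-Kubernetes (Geneve) overhead; verify per release


def ping_payload(path_mtu: int) -> int:
    """Largest ICMP payload that fits in one unfragmented IPv4 packet."""
    return path_mtu - IPV4_HEADER_BYTES - ICMP_HEADER_BYTES


def ping_command(host: str, path_mtu: int, count: int = 3) -> list[str]:
    """Argument list for a don't-fragment ping sized to the given MTU."""
    return ["ping", "-M", "do", "-s", str(ping_payload(path_mtu)), "-c", str(count), host]


def overlay_mtu(physical_mtu: int) -> int:
    """Cluster network MTU left after subtracting the overlay encapsulation."""
    return physical_mtu - OVERLAY_OVERHEAD_BYTES
```

For a standard 1500-byte path, ping_payload returns 1472; a ping with that payload and the don't-fragment flag set fails if any hop along the path has a smaller MTU.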
Installer Configuration and OpenShift Installation Guide Patterns
The install-config.yaml is the contract between your architecture and openshift-install. Key fields (compute and control-plane replicas, networking CIDRs, platform credentials, and pullSecret) must match what network and IAM teams approved. For UPI, you generate Kubernetes manifests with openshift-install create manifests and Ignition configs with openshift-install create ignition-configs rather than running create cluster; the Ignition configs then drive bootstrap. Keep install-config.yaml and any manifest customizations in Git with the same rigor as application manifests.
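One way to enforce that contract is to compare the parsed install-config against the approved values before the installer ever runs. The sketch below assumes the YAML has already been loaded into a dict by whatever parser you use, and that the approved figures live in a hypothetical dict with controlPlaneReplicas and computeReplicas keys.

```python
import json


def install_config_problems(cfg: dict, approved: dict) -> list[str]:
    """List mismatches between a parsed install-config and approved values."""
    problems = []
    for key in ("baseDomain", "controlPlane", "compute", "networking", "platform", "pullSecret"):
        if key not in cfg:
            problems.append(f"missing field: {key}")

    control_plane = cfg.get("controlPlane", {}).get("replicas")
    if control_plane != approved.get("controlPlaneReplicas"):
        problems.append(f"controlPlane.replicas is {control_plane}, not the approved value")

    workers = sum(pool.get("replicas", 0) for pool in cfg.get("compute", []))
    if workers != approved.get("computeReplicas"):
        problems.append(f"compute replicas total {workers}, not the approved value")

    # The pull secret is a JSON document with a top-level "auths" map.
    try:
        auths = json.loads(cfg.get("pullSecret", "")).get("auths")
    except (ValueError, TypeError, AttributeError):
        auths = None
    if not auths:
        problems.append("pullSecret is not JSON with an 'auths' map")
    return problems
```

An empty list means the file matches what was signed off; anything else is a reason to stop before consuming the install-config.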
Network mode matters for the entire cluster lifetime. OVN-Kubernetes is the default CNI on modern OCP; configure join/subnet CIDRs that do not overlap with corporate LANs, pod networks on peered clusters, or VPN ranges. Enable network policies on day 0; retro-fitting segmentation after hundreds of namespaces exist is painful. For vSphere UPI, specify the correct platform fields (vCenter, datacenter, datastore, resource pool) — a typo here produces machines in the wrong VLAN with no obvious installer error until kubelet registration fails.
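CIDR overlap is easy to test mechanically before the ranges are committed to install-config.yaml. This sketch uses the standard ipaddress module; the labels are whatever names your network team uses for each range, and the example values in the usage note are illustrative only.

```python
import ipaddress
from itertools import combinations


def overlapping_ranges(ranges: dict[str, str]) -> list[tuple[str, str]]:
    """Return every pair of labelled CIDR ranges that overlap."""
    networks = {label: ipaddress.ip_network(cidr) for label, cidr in ranges.items()}
    return [
        (first, second)
        for (first, net_a), (second, net_b) in combinations(networks.items(), 2)
        # Only compare ranges of the same IP version.
        if net_a.version == net_b.version and net_a.overlaps(net_b)
    ]
```

Feeding it a pod network of 10.128.0.0/14, a service network of 172.30.0.0/16, and a VPN range of 10.130.0.0/16 reports the pod network and the VPN as a conflicting pair.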
Post-install, the Cluster Version Operator (CVO) owns channel and upgrade graph alignment. Pin to a stable-4.x channel matching your support contract; document the desired initial version in change records. Keep the Insights Operator enabled for remote health reporting if policy allows, since it surfaces certificate expiry and etcd alarm conditions early. Capture the kubeadmin password rotation or removal plan immediately; integrate cluster-admin access with your corporate IdP through an LDAP or OpenID Connect identity provider in the cluster OAuth configuration before developers receive kubeconfig files.
Machine config pools separate control-plane, general worker, and infra nodes. install-config.yaml only sizes the control-plane and compute machine pools, so define a custom infra pool through additional manifests or immediately after install when you know ingress, monitoring, and registry components will land on dedicated nodes. Infra nodes carry router and registry pods; starving them on undersized VMs produces ingress latency under moderate RPS. Record install-config.yaml and generated manifests in version control, with secrets redacted through sealed-secret patterns or external vault references.
Day-0 Security: SCCs, RBAC, and Registry Defaults
OpenShift ships Security Context Constraints (SCCs) that gate pod security before workloads run. The restricted-v2 SCC is the default target for non-privileged workloads; any chart that demands anyuid or privileged must pass security review. Audit SCC usage with oc adm policy who-can use scc anyuid (and the same for privileged) and remediate before production traffic. Platform engineers should create dedicated service accounts per namespace with least-privilege roles, not hand out cluster-admin for convenience.
The integrated image registry is managed by the Image Registry Operator; on platforms without default shareable storage, such as bare metal, it starts in the Removed state until you assign storage. Configure image pruners and resource quotas so a single team's CI pipeline cannot fill etcd with image metadata. Enable image signature verification via sigstore or the container signature policy (policy.json) when compliance requires provenance. Routes and ingress controllers terminate TLS at the edge: provision corporate CA or Let's Encrypt certificates via the cert-manager operator from OperatorHub, and segregate public and internal ingress controller shards for DMZ patterns.
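Besides asking who may use an SCC, it helps to see which SCC each running pod was actually admitted under. Admitted pods carry an openshift.io/scc annotation, so a sketch like the one below can scan the parsed output of oc get pods -A -o json. Capturing that JSON is left to you, and the function name is illustrative.

```python
def pods_outside_scc(pod_list: dict, expected: str = "restricted-v2") -> list[tuple[str, str, str]]:
    """Return (namespace, pod, scc) for pods not admitted under the expected SCC."""
    findings = []
    for pod in pod_list.get("items", []):
        meta = pod.get("metadata", {})
        annotations = meta.get("annotations") or {}
        scc = annotations.get("openshift.io/scc", "<none>")
        if scc != expected:
            findings.append((meta.get("namespace", ""), meta.get("name", ""), scc))
    return findings
```

Platform namespaces will legitimately appear in the findings, so in practice you would filter the result down to application namespaces before raising it in a security review.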
Audit logging via the cluster-logging operator (or LokiStack on newer releases) should be architected at install time: designate log forwarding endpoints, storage retention, and RBAC for who can read infrastructure logs. FIPS mode, if required, must be enabled during install — toggling later is unsupported. Document every deviation from the reference architecture in a runbook your on-call team can execute without opening Red Hat cases at 2 a.m.
Default OAuth and kubeadmin access should be restricted before any developer onboarding. Create break-glass accounts with MFA, disable kubeadmin after IdP integration is verified, and enable audit logging for authentication failures. SCC defaults and namespace template objects can pre-provision LimitRanges and NetworkPolicy skeletons so new projects inherit secure baselines without manual ticket requests.
Post-Install Validation and Production Handover
A successful install ends with a repeatable validation suite, not a green check on the console. Run openshift-install wait-for install-complete, then verify all cluster operators report Available: oc get clusteroperators. Confirm DNS resolution from a corporate workstation matches in-cluster service discovery. Deploy a sample application with a route, persistent volume, and network policy — if any step fails, fix platform gaps before onboarding product teams.
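The operator check can be made repeatable by parsing oc get clusteroperators -o json instead of reading the table by eye. This sketch takes the already-parsed JSON; also requiring Progressing=False and Degraded=False goes slightly beyond the Available check named above and is an assumption about what "healthy" should mean for handover.

```python
WANTED = {"Available": "True", "Progressing": "False", "Degraded": "False"}


def unhealthy_operators(operator_list: dict) -> dict[str, list[str]]:
    """Map each unhealthy cluster operator to the conditions that are off target."""
    report = {}
    for operator in operator_list.get("items", []):
        conditions = {
            cond["type"]: cond["status"]
            for cond in operator.get("status", {}).get("conditions", [])
        }
        off_target = [
            f"{kind}={conditions.get(kind, 'Unknown')}"
            for kind, want in WANTED.items()
            if conditions.get(kind) != want
        ]
        if off_target:
            report[operator["metadata"]["name"]] = off_target
    return report
```

An empty dict is the pass condition; anything else names the operator and the condition to investigate.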
Backup etcd on day 1 even if disaster recovery is phase two. Schedule automated etcd snapshots to object storage with encryption at rest, and test restore on a non-production lab quarterly. Capture must-gather archives after install for baseline support cases: oc adm must-gather. Label nodes with topology labels (region, zone, rack) that match your failure-domain strategy for pod anti-affinity and storage replication.
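Topology labelling is easy to verify from oc get nodes -o json. The sketch below checks the well-known topology.kubernetes.io region and zone labels; a rack label has no standard key, so pass your own key in the required tuple if you use one.

```python
REQUIRED_LABELS = ("topology.kubernetes.io/region", "topology.kubernetes.io/zone")


def nodes_missing_labels(
    node_list: dict, required: tuple[str, ...] = REQUIRED_LABELS
) -> dict[str, list[str]]:
    """Map each node name to the required topology labels it lacks."""
    missing = {}
    for node in node_list.get("items", []):
        meta = node.get("metadata", {})
        labels = meta.get("labels") or {}
        absent = [key for key in required if not labels.get(key)]
        if absent:
            missing[meta.get("name", "")] = absent
    return missing
```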
Handover documentation should include cluster ID, ingress domains, identity provider configuration, storage class matrix, upgrade channel, and support escalation paths to Red Hat TAM or a qualified partner. Platform engineering teams use this OpenShift installation guide as the first chapter; the next chapters are upgrade planning, GitOps bootstrap, and observability — all of which assume a cluster built to these standards.
Run a controlled failure injection before sign-off: cordon and drain a worker, verify pods reschedule; simulate DNS outage for external dependencies and confirm application degradation matches SLO expectations. Capture baseline Prometheus metrics for API latency, etcd disk fsync, and ingress connection counts — future incidents compare against this golden week-one snapshot.
Disconnected and Air-Gapped OpenShift Installation Guide Notes
Air-gapped installs require a mirror registry populated before bootstrap. Use oc adm release mirror and oc mirror with ImageContentSourcePolicy or ImageDigestMirrorSet objects so nodes pull from internal hostnames only. Catalog sources for OperatorHub must be mirrored separately — platform operators installed on day 2 will fail if their bundle images are unreachable.
Assisted Installer for disconnected bare metal still needs intermittent connectivity or a fully local assisted-service instance. Plan USB or sneakernet transfer sizes for release images — multi-gigabyte payloads are normal. Validate checksums on every transfer; corrupted release payloads produce cryptic bootstrap errors hours into install.
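Checksum validation after each transfer can be as small as the sketch below, which streams the file so multi-gigabyte payloads do not need to fit in memory. It assumes the sending side publishes a SHA-256 hex digest alongside each archive; the function names are illustrative.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def transfer_intact(path: Path, expected_hex: str) -> bool:
    """True when the file's digest matches the published one."""
    return sha256_of(path) == expected_hex.strip().lower()
```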
Document the update path for disconnected clusters before go-live: each OCP upgrade requires mirroring new release images and updated operator catalogs. Teams that nail install but neglect mirror automation defer upgrades until they are unsupported. Treat the mirror registry as tier-zero infrastructure with the same backup and HA expectations as etcd.
Configure ImageDigestMirrorSet and CatalogSource objects together. Mismatched mirror paths break operator installs silently until someone subscribes to a failing catalog. Test an operator install from the mirrored catalog in a lab before declaring the disconnected install complete.
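A quick consistency check is to confirm that every catalog image lives on a registry host that also appears in the mirror set. The sketch below assumes the catalog objects are OLM CatalogSource resources and that both inputs are already-parsed manifests; it compares registry hosts only, not full repository paths.

```python
def registry_host(image_ref: str) -> str:
    """Registry host (with port, if any) from an image reference."""
    return image_ref.split("/", 1)[0]


def catalogs_off_mirror(idms: dict, catalog_sources: list[dict]) -> list[str]:
    """Names of CatalogSources whose image is not hosted on a mirror from the IDMS."""
    mirror_hosts = {
        registry_host(mirror)
        for entry in idms.get("spec", {}).get("imageDigestMirrors", [])
        for mirror in entry.get("mirrors", [])
    }
    return [
        source["metadata"]["name"]
        for source in catalog_sources
        if registry_host(source.get("spec", {}).get("image", "")) not in mirror_hosts
    ]
```

A non-empty result points at catalogs that would still try to pull from an external registry, which is exactly the failure that otherwise stays hidden until the first subscription.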
