RKE2 High Availability Architecture: A Blueprint for On-Prem Resilience
- Onur Rıdvanoğlu
- Aug 7
- 6 min read
Updated: Aug 20
Hey everyone, and welcome back to DevOps Footprints! I've been away for a while, working through some changes on the job front, and some new things are on the way :) That means more use cases and more new technologies to dig into. I can't wait to share them all with you.
For those of us operating in the on-prem world, building infrastructure that is both powerful and resilient is the name of the game.
Today, I want to walk you through the architectural blueprint for a highly available Kubernetes cluster I recently designed using RKE2, with a special twist for maximum uptime.
This post will focus on the why and the what of the setup. We'll explore the components, the traffic flow, and the reasoning behind the design. We’ll save the nuts-and-bolts installation guide for a follow-up post.
Why RKE2 for an On-Prem Cluster?
When choosing a Kubernetes distribution, RKE2 (from Rancher/SUSE) stands out for on-premises deployments. It was designed from the ground up for security and compliance, earning it the nickname "RKE Government". For us, the key benefits are its simplicity and robustness. RKE2 bundles the Kubernetes components into a single binary, which drastically simplifies management and reduces the potential for configuration drift. That's why RKE2 is my go-to distribution whenever I stand up a Kubernetes cluster.
Most importantly, it has a straightforward model for creating a High-Availability (HA) cluster, which is non-negotiable for any serious production environment. An HA control plane ensures that the failure of a single master node doesn't bring your entire cluster to a stop.
The Challenge: True High Availability
The standard approach to an HA Kubernetes cluster involves placing a load balancer in front of your server (master) nodes. This provides a "fixed registration address" so that worker nodes and administrators have a stable endpoint to communicate with the Kubernetes API.
But this raises a critical question: what if that load balancer fails? You've just moved your single point of failure from a master node to the load balancer. To build a truly resilient system, we need to eliminate this single point of failure.
Our RKE2 High Availability Architecture Blueprint
To solve this problem, I designed a multi-layered architecture that adds a secondary load-balancing tier. Instead of relying on a single device, this setup distributes the load-balancing responsibility, ensuring there's no single weak link in the chain.
The core components of this architecture are:
1 x Main Load Balancer: This is the primary entry point. It could be a hardware appliance (like an F5 or NetScaler) or a software-based solution, depending on the infrastructure you're working with. Its sole purpose is to perform a simple Layer 4 TCP pass-through to the next layer in the chain.
2 x HAProxy Servers: This is our active-active secondary load balancing layer. These servers receive traffic from the main load balancer and perform the "intelligent" work. They are responsible for running health checks against the RKE2 master nodes and ensuring that traffic is only sent to healthy, responsive members of the control plane.
3 x RKE2 Server (Master) Nodes: This is the heart of the Kubernetes control plane. Three servers is the minimum for an HA cluster: the embedded etcd datastore needs a majority (quorum) of members to stay operational, and an odd member count is what actually buys you fault tolerance. I'll explain the idea behind selecting the right number of master and worker nodes in the next section.
N x RKE2 Agent (Worker) Nodes: These are the workhorses of our cluster. They connect to the control plane via the load-balanced address and are responsible for running our containerized applications.
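To make the HAProxy layer concrete, here's a minimal haproxy.cfg sketch of what each of the two HAProxy servers might run. The hostnames and IP addresses are placeholders for illustration; it balances the Kubernetes API (6443) and the RKE2 registration endpoint (9345) across the three master nodes with TCP health checks:

```haproxy
# Sketch of the secondary LB tier -- IPs and names are placeholders.
frontend k8s_api
    bind *:6443
    mode tcp
    default_backend rke2_servers_api

frontend rke2_registration
    bind *:9345
    mode tcp
    default_backend rke2_servers_registration

backend rke2_servers_api
    mode tcp
    balance roundrobin
    option tcp-check
    server master1 10.0.0.11:6443 check
    server master2 10.0.0.12:6443 check
    server master3 10.0.0.13:6443 check

backend rke2_servers_registration
    mode tcp
    balance roundrobin
    option tcp-check
    server master1 10.0.0.11:9345 check
    server master2 10.0.0.12:9345 check
    server master3 10.0.0.13:9345 check
```

Because both HAProxy instances carry identical configuration and are stateless at Layer 4, either one can serve the full load on its own if the other fails.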
Sizing Your Cluster: How Many Nodes and Why?
A common question when designing a cluster is, "How many master and worker nodes do I need?" The answer is rooted in balancing availability, performance, and cost.
Sizing the Control Plane: Why Three (or Five) Master Nodes?
The number of master nodes is dictated by the needs of etcd, the key-value store that acts as the cluster's brain. It holds all the configuration and state data for your entire Kubernetes environment.
To prevent data corruption in a distributed system (a "split-brain" scenario), etcd requires a quorum, a majority of nodes, to be operational before it accepts any changes. The formula for quorum is (n/2) + 1 using integer division, where n is the number of master nodes.
1 Node: No high availability. If this node fails, the cluster is gone.
2 Nodes: Not recommended. Quorum is (2/2) + 1 = 2. If one node fails, the single remaining node cannot form a majority. The cluster becomes read-only and cannot make changes. A 2-node setup has no better fault tolerance than a 1-node setup.
3 Nodes: The ideal starting point for HA. Quorum is (3/2) + 1 = 2. This means the cluster can tolerate the loss of one master node and still function perfectly.
5 Nodes: For mission-critical workloads. Quorum is (5/2) + 1 = 3. This setup can tolerate the loss of two master nodes, providing an even higher degree of resilience.
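The quorum arithmetic above is easy to sanity-check with a few lines of Python (a toy illustration, not part of any RKE2 tooling):

```python
def quorum(n: int) -> int:
    """Majority needed for an n-member etcd cluster: floor(n/2) + 1."""
    return n // 2 + 1

def fault_tolerance(n: int) -> int:
    """How many members can fail while a majority still survives."""
    return n - quorum(n)

for n in (1, 2, 3, 5):
    print(f"{n} node(s): quorum={quorum(n)}, tolerates {fault_tolerance(n)} failure(s)")
# 1 node(s): quorum=1, tolerates 0 failure(s)
# 2 node(s): quorum=2, tolerates 0 failure(s)
# 3 node(s): quorum=2, tolerates 1 failure(s)
# 5 node(s): quorum=3, tolerates 2 failure(s)
```

Notice that going from one node to two raises the quorum requirement without adding any fault tolerance, which is exactly why even-sized control planes are discouraged.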
Sizing the Workforce: How Many Worker Nodes?
The number of worker (agent) nodes is much more flexible and is driven entirely by your application workloads. There is no quorum requirement here. Instead, you should consider:
Resource Needs: How much total CPU, memory, and storage do your applications require? Sum up the needs of all the services you plan to run and provision nodes to meet that demand.
Redundancy: You need a minimum of two worker nodes to have any form of application-level high availability. If one worker node goes down for maintenance or fails, Kubernetes can reschedule your application pods to the other healthy node. For production, three or more is even better, so rolling updates and node drains don't run into capacity constraints.
Scalability: Start with what you need now, but keep future growth in mind. One of the beauties of Kubernetes is the ease with which you can add more worker nodes to the cluster as your needs expand.
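As a back-of-the-envelope illustration of the sizing logic above, here's a small Python sketch. All the numbers (workload totals, node sizes, 20% headroom) are hypothetical assumptions, not recommendations:

```python
import math

# Hypothetical totals: the sum of resource requests across all workloads.
total_cpu_cores = 24
total_mem_gib = 96

# Allocatable capacity per worker node, after reserving ~20% headroom
# for system daemons, kubelet overhead, and rolling updates.
node_cpu_cores = 8 * 0.8   # 8-core nodes
node_mem_gib = 32 * 0.8    # 32 GiB nodes

nodes_for_cpu = math.ceil(total_cpu_cores / node_cpu_cores)
nodes_for_mem = math.ceil(total_mem_gib / node_mem_gib)

# Take the larger of the two requirements, and never go below
# two nodes, the floor for application-level redundancy.
workers = max(nodes_for_cpu, nodes_for_mem, 2)
print(f"Provision at least {workers} worker nodes")
# Provision at least 4 worker nodes
```

The same exercise, rerun as workloads grow, tells you when it's time to add nodes.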

Tracing the Flow of a Request
To understand the resilience of this RKE2 high availability architecture, let's follow a typical request.
An administrator runs a kubectl command, or a new worker node tries to join the cluster. The request is sent to the stable address of the Main Load Balancer.
The Main Load Balancer, configured for simple round-robin or least-connection balancing, forwards the TCP connection to one of the two HAProxy Servers. If one HAProxy server is down, the main load balancer naturally routes traffic to the healthy one.
The receiving HAProxy server inspects the request. It knows which RKE2 Server Nodes are currently healthy because it's constantly running health checks. It forwards the request to an available server node on the appropriate port (typically 6443 for the API or 9345 for initial registration).
The RKE2 Server Node processes the request, and the response flows back through the same path.
This layered approach means that the failure of a single RKE2 Server (Master) Node or a single HAProxy instance is handled gracefully, with no interruption to the cluster's availability.
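This is also why every agent points at the load-balanced address rather than at any individual master. As a sketch, an agent's RKE2 configuration might look like this (the hostname is a placeholder, and the join token stays whatever your cluster issued):

```yaml
# /etc/rancher/rke2/config.yaml on an agent (worker) node.
# The server URL targets the load balancer's fixed registration
# address on port 9345 -- never a specific master node.
server: https://rke2-lb.example.internal:9345
token: <cluster-join-token>
```

With this in place, a master node can be lost or drained and the agents keep talking to the control plane through the same stable endpoint.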
Key Benefits of This Architectural Design
Eliminates Single Points of Failure: By using two HAProxy servers behind a primary load balancer, we remove the load-balancing layer itself as a single point of failure. This is a significant step up in resilience compared to standard HA guides.
Enhanced Control Plane Stability: The health checks managed by HAProxy ensure that traffic is never routed to a master node that is down for maintenance, experiencing a failure, or is otherwise unresponsive. This protects the integrity of the control plane.
Scalability and Maintainability: This architecture is clean and scalable. Adding more worker nodes is trivial, and the control plane itself is managed by the robust, self-healing properties of RKE2 and our redundant proxy layer.
What's Next?
I hope this overview has shed some light on the "why" behind this highly available on-prem architecture. By thinking in layers and planning for failure at every step, we can build systems that are truly production-grade.
In my next post, I'll dive into the technical details. We'll roll up our sleeves and walk through the complete installation process, including the HAProxy configuration, the RKE2 setup on each node, and how to verify that everything is working as expected.
Stay tuned, and happy building!!