How to Set Up Your Own RPC Cluster [Step-by-Step]
An RPC cluster takes what a single node can do and scales it across multiple machines behind a load balancer, giving our applications a resilient, high-throughput gateway to a blockchain. Instead of one node becoming the bottleneck when traffic spikes, we distribute requests across a pool of synchronized nodes.
In this guide, I’ll walk you through everything needed to set up your own RPC cluster. By the end of this tutorial, you'll be able to operate a production-grade RPC cluster.
#What is an RPC Cluster?
An RPC cluster is a group of synchronized blockchain nodes sitting behind a load balancer that routes incoming JSON-RPC requests across them. Each backend node is a full dedicated RPC node running the same client software, synced to the same chain head. The load balancer, usually NGINX or HAProxy, acts as a single entry point and decides which backend handles each request based on health checks and current load.
The value comes down to three things: redundancy, throughput, and maintainability. If one node crashes or falls behind the chain tip, the load balancer stops sending it traffic and our application keeps running. When request volume spikes, we’re spreading the load rather than overwhelming a single machine. And when it's time to upgrade a node, we can take it out of rotation without user-facing downtime.
This architecture is chain-agnostic. The same principles apply whether we're running Ethereum (Geth, Erigon, Besu), Polygon (Bor), or any other EVM-compatible RPC-serving client.
#RPC Cluster vs. Single Node
It’s worth knowing when a cluster is actually worth the operational overhead.
A single RPC node works fine for development, low-traffic internal tools, and hobbyist projects. A cluster becomes the right choice when uptime matters, when your application serves real users, or when you’re routinely hitting the request limits of a single node. DeFi frontends, trading bots at scale, analytics platforms, and any service with paying customers typically need one.
If our applications meaningfully suffer from a 30-minute outage while we reboot or resync a node, that’s the signal we’ve outgrown the single-node setup.
#RPC Cluster Requirements
Before we start provisioning servers, let’s map out what the full cluster needs. The exact specs depend on which chain we’re running, but the shape of the infrastructure is consistent.
#Cluster Hardware Requirements
At minimum, a production RPC cluster needs:
- 2–3 Backend RPC Nodes
  - CPU: 8–16 cores (chain-dependent)
  - RAM: 32–64 GB
  - Storage: NVMe SSD sized for your chain (Ethereum mainnet needs 4–8 TB NVMe, Polygon needs 6–8 TB)
  - Network: 1 Gbit/s symmetric
- 1–2 Load Balancer Instances
  - CPU: 4–8 cores
  - RAM: 8–16 GB
  - Storage: 100 GB SSD
  - Network: 1 Gbit/s symmetric
Dedicated bare metal is almost always the right call for the backend nodes. Blockchain clients are punishing on disk I/O, and shared virtualized environments introduce latency that slows sync and query response times. For load balancers, virtualized instances are usually fine, though a second instance running in active-passive mode is essential to avoid turning the balancer itself into a single point of failure.
For this tutorial, we used Cherry Servers' AMD Ryzen 7700X dedicated bare metal configuration as our backend RPC node baseline. With 8 cores / 16 threads at up to 5.4 GHz, 64 GB RAM, and 2x 1 TB NVMe drives, it provides a solid baseline for Ethereum pruned-node RPC workloads. For heavier chains like Polygon, archive setups, or higher-throughput clusters, the AMD EPYC 9375F steps up to 32 cores, 384 GB RAM, and 10 TB of NVMe storage across separate drives.
#Cluster Software Requirements
- Ubuntu 22.04 LTS or newer on every machine
- SSH access with a sudo user
- The RPC client for your target chain (Geth, Erigon, Bor, etc.)
- NGINX or HAProxy for load balancing
- Prometheus and Grafana for monitoring (strongly recommended)
#Ports to Open
The port layout is straightforward. Backend nodes expose their RPC ports *only* to the load balancer, not to the public internet.
Backend Nodes
| Port | Protocol | Purpose |
|---|---|---|
| 8545 | TCP | JSON-RPC HTTP (to load balancer only) |
| 8546 | TCP | JSON-RPC WebSocket (to load balancer only) |
| 30303 | TCP/UDP | P2P (public) |
Load Balancer
| Port | Protocol | Purpose |
|---|---|---|
| 443 | TCP | HTTPS (public) |
| 80 | TCP | HTTP (public, for redirects) |
| 9090 | TCP | Prometheus metrics (internal) |
Pro Tip: Use private networking between the load balancer and backend nodes whenever your provider supports it. This keeps RPC traffic off the public internet, reduces bandwidth costs, and minimizes your attack surface.
#How to Set Up an RPC Cluster
We'll build a three-node Ethereum RPC cluster with NGINX as the load balancer. The same pattern works for any chain: just swap the client and adjust the sync requirements.
#Step 1: Provision and Sync the Backend Nodes
Start by provisioning your backend nodes and installing the RPC client for your chain. We won't cover full node setup here since we have dedicated guides for Ethereum and Polygon.
One thing is critical at this stage: every backend must run the same client software, the same version, and be fully synced to the chain tip before we put it behind the load balancer. Mixing versions or adding unsynced nodes causes inconsistent responses, which is worse than having fewer nodes.
Specific to Ethereum: each backend also needs an attached consensus client (Lighthouse, Prysm, Teku, Nimbus, Lodestar, or Grandine) connected to Geth via the authenticated Engine API on port 8551. The consensus client has been required since the Merge; without it, the execution client won't sync. Our Ethereum node guide covers the full execution and consensus client setup if you need a refresher.
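The Engine API connection is authenticated with a shared JWT secret that both clients read from disk. As a minimal sketch, assuming both clients run on the same backend host, the helper below (a hypothetical name) writes a secret in the usual 32-byte hex format; the client flags are shown as comments, with Lighthouse as one example consensus client and /secrets/jwt.hex as a hypothetical path.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Write a 32-byte, hex-encoded JWT secret (64 hex characters, no newline).
# Same format as the commonly used `openssl rand -hex 32`, using only coreutils.
make_jwt_secret() {
  local path="$1"
  # Subshell so the owner-only umask does not leak into the calling shell.
  (
    umask 077
    od -An -vtx1 -N32 /dev/urandom | tr -d ' \n' > "$path"
  )
}

# Execution client: keep the Engine API on localhost and point it at the secret.
#   geth ... --authrpc.addr 127.0.0.1 --authrpc.port 8551 \
#            --authrpc.jwtsecret /secrets/jwt.hex
#
# Consensus client (Lighthouse shown; checkpoint-sync flags omitted, see its docs):
#   lighthouse bn --network mainnet \
#     --execution-endpoint http://127.0.0.1:8551 \
#     --execution-jwt /secrets/jwt.hex
```

Run make_jwt_secret /secrets/jwt.hex once per backend as root, then start both clients with the same path. Each backend should have its own secret; it never needs to leave the host.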
#Step 2: Configure the Backend Nodes for Internal Access
On each backend node, bind the RPC endpoints to the internal network interface rather than localhost or 0.0.0.0. For Geth, this means adjusting the startup flags:
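```shell
# --http.vhosts must accept the Host header NGINX forwards, or Geth rejects proxied requests;
# --ws.origins "*" admits browser WebSocket clients (narrow it to your own domains if you can).
geth --http --http.addr 10.0.0.10 --http.port 8545 \
  --http.api "eth,net,web3,txpool" \
  --http.vhosts "rpc.yourdomain.com" \
  --ws --ws.addr 10.0.0.10 --ws.port 8546 \
  --ws.api "eth,net,web3,txpool" \
  --ws.origins "*"
```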
Replace 10.0.0.10 with each node's private network IP. Make sure the firewall on each backend only accepts connections on ports 8545 and 8546 from the load balancer's IP address.
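```shell
sudo ufw allow from <load-balancer-ip> to any port 8545 proto tcp
sudo ufw allow from <load-balancer-ip> to any port 8546 proto tcp
sudo ufw allow 30303/tcp
sudo ufw allow 30303/udp
# Also open your consensus client's P2P port(s), and keep SSH reachable
# before enabling the firewall so you don't lock yourself out.
sudo ufw allow OpenSSH
sudo ufw enable
```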
If your application requires transaction tracing (for example, dApp debuggers or analytics tools that call debug_traceTransaction from Geth's debug namespace, or trace_call from the trace namespace offered by clients such as Erigon), you can add those namespaces to the API list.
These methods are significantly more resource-intensive than standard RPC calls, so we recommend exposing them only to internal services or protecting them with per-method rate limiting at the load balancer.
Avoid exposing tracing methods on public-facing endpoints without strict access controls.
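To confirm that tracing stays off the public endpoint, you can call a tracing method through the load balancer and check for the JSON-RPC "method not found" error (code -32601), which Geth returns for methods in namespaces it isn't serving. A minimal sketch, assuming the rpc.yourdomain.com placeholder used throughout this guide; both function names are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Succeeds if a JSON-RPC response carries error code -32601 ("Method not found").
is_method_unavailable() {
  grep -Eq '"code" *: *-32601([^0-9]|$)' <<<"$1"
}

# Call METHOD with empty params on URL and report whether it is blocked.
check_method_blocked() {
  local url="$1" method="$2" resp
  resp=$(curl -s --max-time 5 -H 'Content-Type: application/json' \
    -d "{\"jsonrpc\":\"2.0\",\"method\":\"$method\",\"params\":[],\"id\":1}" \
    "$url" || true)
  if is_method_unavailable "$resp"; then
    echo "OK: $method is not exposed at $url"
  else
    echo "WARNING: $method did not return -32601 at $url: $resp" >&2
    return 1
  fi
}
```

For example, check_method_blocked https://rpc.yourdomain.com debug_traceTransaction. If the namespace is exposed, the node typically answers with an invalid-params error instead, which this check correctly reports as a warning.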
#Step 3: Install and Configure NGINX
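Install NGINX on the load balancer from Ubuntu's package repositories (the nginx package), then write the configuration. Two files need editing:
- The main /etc/nginx/nginx.conf file, for the WebSocket upgrade and upstream selector maps.
- A new site configuration for the RPC cluster itself.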
First, add the WebSocket upgrade map and upstream selector map inside the http block of /etc/nginx/nginx.conf:
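```nginx
map $http_upgrade $connection_upgrade {
    default upgrade;
    ''      '';
}

map $http_upgrade $rpc_upstream {
    default rpc_ws_backends;
    ''      rpc_http_backends;
}
```
The first map sets the Connection header sent to the backends: upgrade for WebSocket handshakes, and an empty value for ordinary requests. NGINX omits a header whose value is empty, which lets plain HTTP requests reuse the keepalive connections defined for the HTTP pool (a value of close would force a new backend connection for every request). The second map selects the backend pool based on whether the client sent an Upgrade header.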
Now create the cluster site configuration:
```shell
sudo nano /etc/nginx/sites-available/rpc-cluster
```
Paste the following configuration. It uses a single location / block to handle both standard JSON-RPC and WebSocket subscriptions at the root path (https://rpc.yourdomain.com):
```nginx
upstream rpc_http_backends {
    least_conn;
    server 10.0.0.10:8545 max_fails=2 fail_timeout=30s;
    server 10.0.0.11:8545 max_fails=2 fail_timeout=30s;
    server 10.0.0.12:8545 max_fails=2 fail_timeout=30s;
    keepalive 32;
}

upstream rpc_ws_backends {
    least_conn;
    server 10.0.0.10:8546 max_fails=2 fail_timeout=30s;
    server 10.0.0.11:8546 max_fails=2 fail_timeout=30s;
    server 10.0.0.12:8546 max_fails=2 fail_timeout=30s;
}

server {
    listen 443 ssl http2;
    server_name rpc.yourdomain.com;

    ssl_certificate     /etc/ssl/certs/rpc.crt;
    ssl_certificate_key /etc/ssl/private/rpc.key;

    location / {
        proxy_pass http://$rpc_upstream;
        proxy_http_version 1.1;

        # Headers for WebSocket support
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection $connection_upgrade;

        # Standard proxy headers
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

        # Timeouts: 3600s ensures long-lived WebSocket subscriptions aren't dropped
        proxy_connect_timeout 5s;
        proxy_read_timeout 3600s;
    }
}
```
The $rpc_upstream variable automatically routes traffic to the HTTP pool (port 8545) for standard requests and the WebSocket pool (port 8546) when an Upgrade header is detected. This allows clients to use both:
- https://rpc.yourdomain.com
- wss://rpc.yourdomain.com
without requiring a separate /ws endpoint.
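Enable the configuration, test it, and reload NGINX. The nginx -t check also fails if the files referenced by ssl_certificate and ssl_certificate_key don't exist yet, so make sure a certificate is in place at those paths (or update the paths) before reloading.
```shell
sudo ln -s /etc/nginx/sites-available/rpc-cluster /etc/nginx/sites-enabled/
sudo nginx -t
sudo systemctl reload nginx
```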
#Step 4: Implement Health Checks
The configuration above handles basic connection-level failures, but it won't detect a node that is still responding to TCP requests while lagging behind the chain tip.
The cleanest approach is to use an external health-check script that queries eth_blockNumber on each backend and compares it to the network head. If a node falls more than a few blocks behind, the script can temporarily remove it from the NGINX upstream pool.
Run this script every 15–30 seconds using a cron job or systemd timer.
For teams using NGINX Plus or HAProxy, active health checking is available out of the box. With open-source NGINX, a lightweight shell script combined with nginx -s reload is typically sufficient.
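A minimal sketch of such a script, under several assumptions: the two upstream blocks from Step 3 are moved into their own file (the /etc/nginx/rpc-upstreams.conf path is hypothetical) and pulled back into the site configuration with an include directive; the highest block among the backends stands in for the network head (a production script might compare against an independent reference instead); and it runs as root. All function and variable names are illustrative.

```shell
#!/usr/bin/env bash
set -euo pipefail

BACKENDS=(10.0.0.10 10.0.0.11 10.0.0.12)   # every backend, healthy or not
MAX_LAG=5                                    # blocks a node may trail the best head
UPSTREAM_FILE=/etc/nginx/rpc-upstreams.conf  # hypothetical; must already exist

# "0x15cd1a2" -> 22860194 (bash printf accepts the 0x prefix).
hex_to_dec() { printf '%d\n' "$1"; }

# Print the decimal block number from an eth_blockNumber response, or nothing.
parse_block_number() {
  local hex
  hex=$(grep -oE '"result" *: *"0x[0-9a-fA-F]+"' <<<"$1" | grep -oE '0x[0-9a-fA-F]+' || true)
  if [[ -n "$hex" ]]; then hex_to_dec "$hex"; fi
}

# Query one backend directly (bypassing NGINX); unreachable nodes print nothing.
fetch_height() {
  local resp
  resp=$(curl -s --max-time 3 -H 'Content-Type: application/json' \
    -d '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}' \
    "http://$1:8545" || true)
  parse_block_number "$resp"
}

# Read "ip height" lines on stdin; print IPs within LAG blocks of the highest height.
select_healthy() {
  awk -v lag="$1" '
    $2 != "" { ip[NR] = $1; h[NR] = $2; if ($2 + 0 > best) best = $2 + 0 }
    END { for (i = 1; i <= NR; i++) if ((i in ip) && best - h[i] <= lag) print ip[i] }'
}

# Emit one upstream block; keepalive is only used by the plain HTTP pool.
render_upstream() {
  local name="$1" port="$2" ip; shift 2
  printf 'upstream %s {\n    least_conn;\n' "$name"
  for ip in "$@"; do
    printf '    server %s:%s max_fails=2 fail_timeout=30s;\n' "$ip" "$port"
  done
  if [[ "$name" == rpc_http_backends ]]; then printf '    keepalive 32;\n'; fi
  printf '}\n'
}

# Rewrite the upstream file and reload NGINX only when the healthy set changes.
update_upstreams() {
  local ip healthy tmp
  healthy=$(for ip in "${BACKENDS[@]}"; do echo "$ip $(fetch_height "$ip")"; done \
    | select_healthy "$MAX_LAG")
  if [[ -z "$healthy" ]]; then
    echo "no healthy backends found; leaving $UPSTREAM_FILE unchanged" >&2
    return 1
  fi
  tmp=$(mktemp)
  # $healthy is intentionally unquoted: one argument per IP.
  { render_upstream rpc_http_backends 8545 $healthy
    render_upstream rpc_ws_backends 8546 $healthy; } > "$tmp"
  if ! cmp -s "$tmp" "$UPSTREAM_FILE"; then
    cp "$UPSTREAM_FILE" "$tmp.bak"
    cp "$tmp" "$UPSTREAM_FILE"
    if nginx -t -q; then
      nginx -s reload
    else
      cp "$tmp.bak" "$UPSTREAM_FILE"   # roll back a config NGINX rejects
    fi
  fi
  rm -f "$tmp" "$tmp.bak"
}
```

A cron job or systemd timer would source this file and call update_upstreams on the 15–30 second cadence mentioned above. Comparing against the current file before writing avoids reloading NGINX on every run, and refusing to write an empty pool keeps a network-wide hiccup from taking the whole endpoint down.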
#Step 5: Add TLS and Rate Limiting
Exposing an RPC cluster over plain HTTP is not recommended. Use Let's Encrypt to provision free TLS certificates:
```shell
sudo apt-get install -y certbot python3-certbot-nginx
sudo certbot --nginx -d rpc.yourdomain.com
```
For rate limiting, add the following directive inside the http block of /etc/nginx/nginx.conf:
```nginx
limit_req_zone $binary_remote_addr zone=rpc_limit:10m rate=100r/s;
```
Then add the following inside the location block of the RPC cluster configuration:
```nginx
limit_req zone=rpc_limit burst=200 nodelay;
```
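This limits each client IP to 100 requests per second with a burst allowance of 200 requests; with nodelay, requests within the burst are served immediately instead of being queued. Requests over the limit are rejected with a 503 by default (add limit_req_status 429; to return the more conventional Too Many Requests). Adjust these values to match your workload and traffic profile.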
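A rough way to watch the limiter work is to send a burst of requests from a single machine and tally the status codes. The sketch below assumes curl, seq, and GNU xargs are installed; the function names are hypothetical and the endpoint is this guide's placeholder. Because the probe's requests count against the same per-IP zone as any other traffic from that address, run it from a test machine rather than one serving real users.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Summarize status codes (one per line on stdin) as "count code", most common first.
tally_status_codes() {
  sort | uniq -c | sort -rn | awk '{ print $1, $2 }'
}

# Send N eth_blockNumber requests with P in flight at once; print one status code each.
burst_probe() {
  local url="$1" n="${2:-400}" parallel="${3:-50}"
  seq "$n" | xargs -P "$parallel" -I{} curl -s -o /dev/null -w '%{http_code}\n' \
    -H 'Content-Type: application/json' \
    -d '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}' "$url"
}
```

For example, burst_probe https://rpc.yourdomain.com 400 50 | tally_status_codes. Once the burst allowance is exhausted, the rejected requests show up as a second status code alongside the 200s.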
#Step 6: Set Up Monitoring
Monitoring is essential for a production RPC cluster.
You need visibility into:
- Node synchronization status
- Load balancer health
- Request latency
- Error rates
- Resource utilization
A common monitoring stack consists of:
- Prometheus for metrics collection
- Grafana for visualization
- NGINX status metrics
- Node-level metrics from each backend
At a minimum, monitor:
- Sync status per node
- Requests per second
- Average latency
- P95 latency
- Error rate per backend
- Load balancer CPU usage
- Load balancer memory usage
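For node-level metrics, Geth can expose Prometheus-format metrics itself when started with --metrics --metrics.addr <private-ip> --metrics.port 6060, served at /debug/metrics/prometheus; other clients have their own equivalents. As a sketch assuming Geth on the three backends from this guide, with port 6060 opened only to the Prometheus host, the helper below (a hypothetical name) renders a matching scrape section. On the load balancer, NGINX's stub_status module provides basic connection counters that an exporter such as nginx-prometheus-exporter can translate for Prometheus.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print a Prometheus scrape_configs section for Geth's built-in metrics endpoint.
# Usage: render_geth_scrape_config PORT IP...
render_geth_scrape_config() {
  local port="$1" ip; shift
  cat <<'EOF'
scrape_configs:
  - job_name: geth
    metrics_path: /debug/metrics/prometheus
    static_configs:
      - targets:
EOF
  for ip in "$@"; do
    printf '          - "%s:%s"\n' "$ip" "$port"
  done
}
```

For example, render_geth_scrape_config 6060 10.0.0.10 10.0.0.11 10.0.0.12 prints a section to merge into your existing prometheus.yml rather than replacing it.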
#Step 7: Test the Cluster
Once the cluster is configured, verify that requests are successfully reaching the backends.
Run a simple eth_blockNumber request against the cluster endpoint:
```shell
curl -s https://rpc.yourdomain.com \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}'
```
Expected response (your block number will differ; 0x15cd1a2 is block 22,860,194 in decimal):
```json
{"jsonrpc":"2.0","id":1,"result":"0x15cd1a2"}
```
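To verify load balancing behavior, run a load test with a tool such as wrk or ab and watch the NGINX access log. The default combined log format doesn't record which backend served a request, so add $upstream_addr to a custom log_format for this check.
With least_conn, requests should spread across all backend nodes, with each new request going to the backend that has the fewest active connections.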
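A minimal sketch of tallying requests per backend, assuming a custom log format like the one in the comments below (the rpc_upstream format name and log path are hypothetical); the counting function name is illustrative too.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Assumed (hypothetical) log format in the http block of nginx.conf:
#   log_format rpc_upstream '$remote_addr [$time_local] "$request" '
#                           '$status upstream=$upstream_addr rt=$request_time';
# and in the RPC server block:
#   access_log /var/log/nginx/rpc-access.log rpc_upstream;

# Count requests per backend from log files (or stdin). If NGINX retried a request,
# $upstream_addr lists several addresses separated by ", "; only the first counts.
count_by_upstream() {
  grep -ohE 'upstream=[^ ,]+' "$@" | cut -d= -f2 | sort | uniq -c | sort -rn
}
```

For example, while ab -n 2000 -c 50 -p body.json -T application/json https://rpc.yourdomain.com/ runs (with body.json holding the eth_blockNumber request from above), sudo tail -n 5000 /var/log/nginx/rpc-access.log | count_by_upstream shows how the requests were split.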
#Scaling and Maintenance
Scaling is straightforward once the foundation is in place: add another backend, sync it to the chain tip, add its IP to both upstream blocks (HTTP and WebSocket), and reload NGINX. No downtime, no config migrations.
For maintenance, the same pattern works in reverse. To upgrade a node, drain it by removing it from the upstream pool, wait for existing connections to close, perform the upgrade, verify it’s synced and healthy, then add it back. This rolling-upgrade pattern is one of the main reasons we built the cluster in the first place.
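In open-source NGINX, a lightweight way to take a node out of rotation without deleting its lines is the down parameter on its server entries. A minimal sketch, assuming GNU sed and upstream blocks written one server per line as in Step 3; the function names are hypothetical. After each edit, validate and reload: new worker processes pick up the updated pool, while the old workers finish their in-flight requests and open connections, including WebSocket subscriptions, which with a 3600s read timeout can take a while.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Escape dots so an IP address is matched literally inside a sed regex.
ip_regex() { printf '%s' "$1" | sed 's/\./\\./g'; }

# Mark every server line for IP as `down` (idempotent), so NGINX stops
# sending it new requests after the next reload.
drain_backend() {
  local file="$1" re
  re=$(ip_regex "$2")
  sed -i -E '/server '"$re"':[0-9]+[[:space:];]/{/ down;/!s/;[[:space:]]*$/ down;/;}' "$file"
}

# Remove the `down` flag once the node is upgraded and back at the chain tip.
restore_backend() {
  local file="$1" re
  re=$(ip_regex "$2")
  sed -i -E '/server '"$re"':[0-9]+[[:space:];]/s/ down;/;/' "$file"
}
```

For example, drain_backend /etc/nginx/sites-available/rpc-cluster 10.0.0.11 && sudo nginx -t && sudo nginx -s reload, then the same sequence with restore_backend after the upgrade. Flagging lines instead of deleting them keeps the node's full configuration in place for when it rejoins.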
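As traffic grows, you'll eventually want geographic distribution, a second load balancer in active-passive failover with Keepalived, and possibly a caching layer for frequently-requested static data. The basic cluster we've built here should handle most workloads comfortably up to several thousand requests per second, though the real ceiling depends on the request mix: heavy calls such as eth_getLogs over wide block ranges or tracing methods cost far more than eth_blockNumber.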
#Conclusion
We now have a working RPC cluster with three backend nodes behind NGINX, TLS terminated at the edge, rate limiting in place, and health checks keeping unhealthy nodes out of rotation. For most applications, this is where RPC infrastructure stops needing constant attention and starts quietly doing its job.
Keep all backend nodes on matching client versions, monitor sync status continuously, and treat the load balancer config as code. With this setup, our applications get the reliability and throughput that public endpoints can’t match, on infrastructure we fully control.
FAQs
How many nodes do I need in my RPC cluster?
For production, start with three. Two gives you redundancy, but taking one out for maintenance leaves a single node with no redundancy at all. Three is the minimum that lets you upgrade one node at a time while keeping two healthy.
NGINX or HAProxy for the load balancer?
Either works. NGINX is simpler to configure and has great community resources. HAProxy offers more advanced health checks and slightly better raw performance under heavy load. For most RPC workloads, the difference is negligible.
Should we mix clients (e.g., Geth and Erigon)?
Only if you’re comfortable handling the small response format differences between clients. For most teams, running identical clients is simpler and less error-prone.
Do we need a second load balancer?
For serious production, yes. A single load balancer is a single point of failure. Use Keepalived to run an active-passive pair with a floating virtual IP.
Can we add public cloud nodes as fallbacks?
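Yes, adding an Alchemy or Infura endpoint as a backup upstream is a common pattern. Mark it with NGINX's backup parameter rather than just giving it a lower weight: a lower weight still sends it a share of normal traffic, while a backup server only receives requests when your own nodes are unavailable.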