How to Set Up an Ethereum Archive Node [Step-by-Step]
This guide walks you through setting up a Geth archive node, covering everything from Ethereum hardware and network prerequisites to installation, configuration, and ongoing maintenance.
#How to run an Ethereum archive node?
Running an Ethereum archive node means keeping a full, historical record of the blockchain. Whether you're a developer building decentralized applications, a researcher analyzing historical data, or an organization needing full access to the Ethereum network, an archive node gives you the complete picture. Unlike lighter Ethereum nodes that only store recent states, an archive node holds every transaction, smart contract, and block since the network's genesis in 2015.
The catch? Archive nodes are resource-intensive. You're looking at multiple terabytes of storage, significant bandwidth requirements, and hardware that’s capable of continuous read and write operations. Setting one up incorrectly means wasting days on synchronization only to hit bottlenecks or run out of disk space halfway through.
#Understanding Archive Nodes
Ethereum nodes come in different flavors, each serving specific purposes. Before diving into setup, it’s worth understanding what makes archive nodes different and whether you actually need one. At a high level, Ethereum has three main categories of nodes: full, light, and archive.
#1. What is an Ethereum light node?
A light node stores only block headers. It uses these headers to verify transactions and check balances by connecting to full nodes for accurate data. It is lightweight, fast to sync, and perfect for wallets or applications that just need to verify transactions without holding the entire chain.
#2. What is an Ethereum full node?
A full node goes deeper. It stores the current state of the blockchain, validates new blocks, and can serve light clients. However, as the network grows, a full node periodically prunes older state data to save space. That means it can’t recreate exactly what the blockchain looked like at an arbitrary point in the past.
#3. What is an Ethereum archive node?
That’s where archive nodes come in. They keep everything: every block, every historical state, and every smart contract storage slot since the first block was mined. This makes them massive in size but also incredibly powerful. With an archive node, you can query any past account balance, contract variable, or storage state without relying on third-party services.
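To make the idea of a historical query concrete, here is a minimal sketch that builds an `eth_getBalance` JSON-RPC request for a past block. It assumes the node exposes the `eth` namespace on 127.0.0.1:8545, as configured later in this guide; the function name `balance_request` and the example address and block number are placeholders, not values from a real query.

```shell
#!/bin/sh
# Build a JSON-RPC eth_getBalance request for an address at a given block.
# The block number is passed in decimal and converted to the hex quantity
# format that the JSON-RPC API expects.
balance_request() {
    address="$1"
    block_hex=$(printf '0x%x' "$2")
    printf '{"jsonrpc":"2.0","id":1,"method":"eth_getBalance","params":["%s","%s"]}\n' \
        "$address" "$block_hex"
}

# Example (placeholder address and block; needs a synced archive node):
# balance_request 0x0000000000000000000000000000000000000000 4000000 |
#     curl -s -X POST -H 'Content-Type: application/json' \
#          --data @- http://127.0.0.1:8545
```

A pruned full node would typically reject a request for a block that old because the state is no longer available, whereas an archive node can answer it.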
As of early 2025, a full Ethereum archive node (Geth) requires roughly 14-16 TB of storage, and that figure keeps growing. The initial synchronization can take weeks, depending on your hardware and network connection.
#Ethereum Archive Node Requirements
Running an archive node requires serious infrastructure. You need fast NVMe storage that doesn't throttle under weeks of constant writes, network links that can pull terabytes without choking, and a setup that stays online without shared tenants hogging resources.
Bare Metal vs Virtual Servers
Most cloud providers offer virtual machines that share physical resources with other customers. This works fine for web applications, but archive nodes max out disk I/O for weeks during synchronization. On virtualized infrastructure, you're competing with other tenants for the same underlying drives, creating unpredictable performance. Bare metal servers give you dedicated hardware with no noisy neighbors and full control over the storage stack.
The performance difference becomes especially apparent during the initial sync. Virtualized environments often experience inconsistent I/O speeds as the hypervisor allocates resources dynamically between tenants. This can turn a three-week sync into a two-month ordeal. With bare metal, you get consistent performance throughout the entire process, and you can optimize the storage configuration specifically for blockchain workloads without virtualization overhead eating into your available resources.
Network and Storage
Look for providers with generous bandwidth allocations on dedicated servers to avoid overage charges during the initial multi-terabyte sync. Network infrastructure should be multi-homed with multiple carriers for reliable peering with Ethereum nodes globally. Server configurations should include multiple enterprise-grade NVMe drives with high endurance ratings.
Pricing
Predictable pricing without bandwidth throttling is essential when syncing terabytes of blockchain data. Hidden fees or throttled connections can significantly impact both setup time and ongoing operational costs.
#Prerequisites
Before you start installing software, make sure you have the right foundation. Missing any of these requirements will either block your progress or force you to start over.
#Ethereum Hardware Requirements
Your Ethereum server needs to handle sustained loads for weeks during initial sync, then maintain that capacity indefinitely. Here’s what you should have:
CPU: At least 8 cores. More cores help during sync, but after that, single-thread performance matters more for block processing.
RAM: 64 GB minimum. The execution client alone can use 16-20 GB during sync. Leave headroom for the operating system and consensus client.
Storage: At least 16 TB of NVMe SSD space. As of early 2025, the archive data sits around 14-16 TB and grows constantly, so plan for at least 2 TB of headroom beyond the current data size.
Network: 500 Mbps minimum sustained bandwidth. Expect to download several terabytes during initial sync and maintain connections to 50-100 peers afterward.
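The requirements above can be checked on a freshly provisioned server with a short preflight script. This is a sketch only: the function names `check_min` and `preflight` are made up for illustration, the thresholds simply mirror the list above, and the default path `/` is an assumption you should replace with your NVMe mount point.

```shell
#!/bin/sh
# Thresholds mirror the requirements listed above. Reported sizes are often
# slightly below nominal (filesystem overhead, reserved memory), so you may
# need to relax them a little.
MIN_CORES=8
MIN_RAM_GB=64
MIN_DISK_TB=16

# Print PASS or FAIL for one resource; returns non-zero on FAIL.
check_min() {
    label="$1"; actual="$2"; required="$3"
    if [ "$actual" -ge "$required" ]; then
        echo "PASS $label: $actual (need >= $required)"
    else
        echo "FAIL $label: $actual (need >= $required)"
        return 1
    fi
}

# Gather values from the running system and check each one.
preflight() {
    data_dir="${1:-/}"   # assumed default; pass your NVMe mount point
    rc=0
    cores=$(nproc)
    # MemTotal is reported in KiB; convert to decimal GB.
    ram_gb=$(awk '/^MemTotal:/ {printf "%d", $2 * 1024 / 1e9}' /proc/meminfo)
    # df -Pk reports 1024-byte blocks; convert total size to decimal TB.
    disk_tb=$(df -Pk "$data_dir" | awk 'NR==2 {printf "%d", $2 * 1024 / 1e12}')
    check_min "CPU cores" "$cores" "$MIN_CORES" || rc=1
    check_min "RAM (GB)" "$ram_gb" "$MIN_RAM_GB" || rc=1
    check_min "Disk (TB)" "$disk_tb" "$MIN_DISK_TB" || rc=1
    return "$rc"
}
```

Running `preflight /mnt/ethereum` prints one line per resource and returns a non-zero status if any check fails. Network bandwidth is not covered here because it cannot be read from the local system alone.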
Client Software Choices
Ethereum now requires two pieces of software running together: an execution client and a consensus client. This dual-client architecture became mandatory with The Merge in September 2022.
Popular execution clients include:
- Geth (Go Ethereum)
- Nethermind
- Besu
- Erigon
Popular consensus clients include:
- Lighthouse
- Prysm
- Teku
- Nimbus
For this guide, we'll use Geth for execution and Lighthouse for consensus. Both have proven reliability, extensive documentation, and strong community support. The setup process for other clients follows similar patterns.
#How to Set Up Ethereum Archive Node: Step-by-Step
This section walks through the complete Ethereum archive node setup process, from server provisioning to running a synchronized archive node. We’ll install Geth as the execution client and Lighthouse as the consensus client, configure them properly, and set up monitoring to track progress.
#Step 1: Provision Your Server
Log into your server dashboard and select a bare metal server that meets the archive node requirements. Here’s what to look for:
Recommended Server Configuration:
Look for servers with these minimum specifications:
- CPU: AMD Ryzen or AMD EPYC with at least 8 cores
- RAM: 64 GB or more
- Storage: 2x 8TB NVMe drives (or larger) in a configuration that gives you at least 16TB usable space
- Network: 1 Gbps uplink with unmetered bandwidth
Storage Configuration:
When provisioning, you'll have options for how to configure your drives:
- RAID 0: Combines drives for maximum capacity and speed, but no redundancy. A single drive failure means data loss.
- RAID 1: Mirrors drives for redundancy, but you lose half your capacity.
- No RAID: Use drives independently without any RAID configuration.
- Custom: Allows you to specify a custom RAID level in the configuration comments.
For an archive node, RAID 5 is highly recommended because data integrity and redundancy are paramount. RAID 5 needs at least three drives and gives up one drive's worth of capacity to parity, so three 8TB drives yield 16TB of usable space. To set this up, select Custom during provisioning and specify "RAID 5" in the custom RAID level comment field. While you can resync from the network after a drive failure, the weeks-long process makes RAID 5's fault tolerance worth the modest capacity loss.
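When comparing drive layouts, the usable capacity of each RAID level follows from simple arithmetic: RAID 0 sums all drives, a RAID 1 mirror yields one drive's capacity, and RAID 5 yields the capacity of all drives minus one (and needs at least three drives). The sketch below encodes those rules; `raid_usable_tb` is a hypothetical helper name, and it assumes all drives are the same size.

```shell
#!/bin/sh
# Usable capacity in TB for identical drives.
# Usage: raid_usable_tb <level> <number_of_drives> <drive_size_tb>
raid_usable_tb() {
    level="$1"; drives="$2"; size_tb="$3"
    case "$level" in
        0)  # Striping: all capacity, no redundancy.
            echo $(( drives * size_tb )) ;;
        1)  # Mirroring: capacity of a single drive.
            echo "$size_tb" ;;
        5)  # Striping with parity: one drive's worth goes to parity.
            if [ "$drives" -lt 3 ]; then
                echo "RAID 5 needs at least 3 drives" >&2
                return 1
            fi
            echo $(( (drives - 1) * size_tb )) ;;
        *)
            echo "unsupported RAID level: $level" >&2
            return 1 ;;
    esac
}
```

For example, `raid_usable_tb 5 3 8` prints 16, while `raid_usable_tb 1 2 8` prints 8, which is below the 16TB target.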
Operating System:
Select Ubuntu Server 22.04 LTS or 24.04 LTS as your operating system during provisioning. Choose the minimal installation without any pre-installed control panels or extra services.
Note: Most bare metal servers come with basic storage configurations (often 1TB NVMe), but Cherry Servers allows you to customize storage capacity during the provisioning process. Look for options to upgrade or add additional NVMe drives to meet the 16TB requirement. Don’t settle for the default storage if it’s insufficient: an archive node won’t fit on anything less than what’s specified above.
Post-Provisioning: After provisioning completes (usually within a few hours), you'll receive an email with:
- Server IP address
- Root password
- Any additional network information
Connect to your server via SSH:
ssh root@your-server-ip
Enter the root password when prompted. Once logged in, immediately update the system:
apt update && apt upgrade -y
If you have multiple drives, check your storage configuration:
lsblk
df -h
This shows your block devices and mounted filesystems. Note where your NVMe drives are mounted, because you'll need this path when configuring the Geth and Lighthouse data directories.
#Step 2: Create a Dedicated User
Running blockchain clients as root is a security risk. Create a dedicated user for your Ethereum node:
useradd -m -s /bin/bash ethereum
This creates a user named ethereum with a home directory and bash shell. We’ll run both clients under this user account.
#Step 3: Install Dependencies
Install the required packages:
apt install -y software-properties-common wget curl git build-essential
These packages provide tools needed for downloading, building, and managing the Ethereum clients.
#Step 4: Install Geth (Execution Client)
Since we’re on Ubuntu, we’ll use the official Ethereum PPA repository to install Geth. This ensures you get properly maintained packages with easy updates.
Add the Ethereum PPA repository:
sudo add-apt-repository -y ppa:ethereum/ethereum
Update the package list and install Geth:
sudo apt-get update
sudo apt-get install ethereum
This installs the stable version of Geth along with additional tools like clef, devp2p, abigen, bootnode, evm, and rlpdump.
Verify the installation:
geth version
You should see output showing the installed Geth version, along with Go version and architecture details.
#Step 5: Configure Geth Data Directory
Create a directory for Geth’s blockchain data. Based on the storage configuration you checked in step 1, choose an appropriate location on your NVMe drives:
mkdir -p /mnt/ethereum/geth
chown -R ethereum:ethereum /mnt/ethereum
If your NVMe drives are mounted elsewhere (check with lsblk or df -h), adjust the path accordingly. The important thing is placing the data directory on your high-performance NVMe storage, not on the OS drive.
#Step 6: Create Geth Service File
Create a systemd service file to manage Geth as a background service.
Why is this important? Without a systemd service, you’d have to manually start Geth every time, and if the server reboots or Geth crashes, it won’t restart automatically. Running it directly in a terminal session also means it stops when you disconnect from SSH.
Create the file:
nano /etc/systemd/system/geth.service
Add the following configuration:
[Unit]
Description=Geth Execution Client (archive)
After=network.target
Wants=network.target
[Service]
User=ethereum
Group=ethereum
Type=simple
Restart=always
RestartSec=5
ExecStart=/usr/bin/geth \
--datadir /mnt/ethereum/geth \
--gcmode archive \
--syncmode full \
--http \
--http.api eth,net,web3,txpool \
--http.addr 127.0.0.1 \
--http.port 8545 \
--authrpc.addr 127.0.0.1 \
--authrpc.port 8551 \
--authrpc.vhosts localhost \
--authrpc.jwtsecret /mnt/ethereum/jwt.hex \
--maxpeers 50
[Install]
WantedBy=multi-user.target
Key flags explained:
- --gcmode archive: Keeps all historical state data instead of pruning
- --syncmode full: Executes every block from genesis so that every historical state is built and retained. Snap sync only downloads a recent state, so combining it with archive mode would leave the node without state from before the sync point.
- --http: Enables the HTTP-RPC server for queries
- --http.api: Specifies which API namespaces to expose
- --authrpc.*: Configuration for authenticated communication with the consensus client
- --maxpeers: Limits peer connections to manage bandwidth
Save and exit (Ctrl+X, then Y, then Enter).
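Because a typo in the unit file can silently produce a non-archive or publicly exposed node, it can be worth checking the saved file before starting the service. The sketch below greps the unit file for a few of the settings discussed above; `check_geth_unit` is a hypothetical helper, and the list of settings it looks for is an illustrative selection, not an exhaustive validation.

```shell
#!/bin/sh
# Report whether a Geth unit file contains a few key settings.
# Returns non-zero if any of them is missing.
check_geth_unit() {
    unit_file="$1"
    rc=0
    for setting in "--gcmode archive" "--http.addr 127.0.0.1" "--authrpc.jwtsecret"; do
        # -F: match the setting as a fixed string; --: stop option parsing
        # so the leading dashes are not treated as grep options.
        if grep -Fq -- "$setting" "$unit_file"; then
            echo "ok      $setting"
        else
            echo "MISSING $setting"
            rc=1
        fi
    done
    return "$rc"
}
```

With the path used in this guide, the call would be `check_geth_unit /etc/systemd/system/geth.service`.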
#Step 7: Generate JWT Secret
Both Geth and Lighthouse need a shared JWT (JSON Web Token) secret for secure communication. This token authenticates the connection between the execution and consensus layers.
Generate the JWT secret:
openssl rand -hex 32 | tr -d "\n" > /mnt/ethereum/jwt.hex
Set the correct permissions:
chown ethereum:ethereum /mnt/ethereum/jwt.hex
chmod 600 /mnt/ethereum/jwt.hex
The chmod 600 ensures only the ethereum user can read the file, which is important for security since this token authorizes communication between your clients.
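A malformed secret file is a common reason the two clients fail to authenticate with each other, so a quick sanity check can save debugging time. The sketch below verifies that a file contains exactly 64 hexadecimal characters (32 bytes), which is what the `openssl rand -hex 32` command above produces; `is_valid_jwt_secret` is a hypothetical helper name.

```shell
#!/bin/sh
# Succeeds if the file holds exactly 64 hex characters (ignoring whitespace).
is_valid_jwt_secret() {
    # Read the file and strip any whitespace, including a trailing newline.
    secret=$(tr -d '[:space:]' < "$1") || return 1
    printf '%s' "$secret" | grep -Eq '^[0-9a-fA-F]{64}$'
}
```

For example, `is_valid_jwt_secret /mnt/ethereum/jwt.hex && echo "secret looks fine"` prints the message only when the file has the expected shape.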
#Step 8: Install Lighthouse (Consensus Client)
Lighthouse doesn't have a PPA repository like Geth, so we'll download the pre-compiled binary from the official GitHub releases.
First, navigate to a temporary directory:
cd /tmp
Download the latest Lighthouse release. As of writing, the latest version is v8.0.0, but you should check the Lighthouse releases page for the most recent stable version:
wget https://github.com/sigp/lighthouse/releases/download/v8.0.0/lighthouse-v8.0.0-x86_64-unknown-linux-gnu.tar.gz
Extract the archive:
tar -xzf lighthouse-v8.0.0-x86_64-unknown-linux-gnu.tar.gz
Move the binary to a system directory and make it executable:
mv lighthouse /usr/local/bin/
chmod +x /usr/local/bin/lighthouse
Verify the installation:
lighthouse --version
You should see the Lighthouse version information displayed.
#Step 9: Configure Lighthouse Data Directory
Create a directory for Lighthouse's beacon chain data:
mkdir -p /mnt/ethereum/lighthouse
chown -R ethereum:ethereum /mnt/ethereum/lighthouse
Like with Geth, ensure this directory is on your NVMe storage for optimal performance.
#Step 10: Create Lighthouse Service File
Create a systemd service file for Lighthouse:
nano /etc/systemd/system/lighthouse.service
Add the following configuration:
[Unit]
Description=Lighthouse Consensus Client
After=network.target geth.service
Wants=network.target
Requires=geth.service
[Service]
User=ethereum
Group=ethereum
Type=simple
Restart=always
RestartSec=5
ExecStart=/usr/local/bin/lighthouse bn \
--network mainnet \
--datadir /mnt/ethereum/lighthouse \
--http \
--execution-endpoint http://127.0.0.1:8551 \
--execution-jwt /mnt/ethereum/jwt.hex \
--checkpoint-sync-url https://mainnet.checkpoint.sigp.io
[Install]
WantedBy=multi-user.target
Key flags explained:
- bn: Runs Lighthouse in beacon node mode
- --network mainnet: Connects to Ethereum mainnet
- --execution-endpoint: Points to Geth's authenticated RPC port
- --execution-jwt: Uses the JWT secret we created for authentication
- --checkpoint-sync-url: Enables checkpoint sync, allowing Lighthouse to sync from a recent finalized checkpoint instead of from genesis. This dramatically reduces sync time from weeks to hours.
Save and exit (Ctrl+X, then Y, then Enter).
#Step 11: Start the Services
Now that both clients are configured, reload systemd to recognize the new service files:
systemctl daemon-reload
Enable both services to start automatically on boot:
systemctl enable geth lighthouse
Start Geth first, then Lighthouse:
systemctl start geth
systemctl start lighthouse
Check that both services are running:
systemctl status geth
systemctl status lighthouse
You should see active (running) in green for both services. If either shows an error, check the logs to diagnose the issue (we'll cover monitoring in the next step).
#Step 12: Monitor Sync Progress
Once both services are running, you'll want to monitor their sync progress. The initial sync takes time, and it's helpful to confirm everything is working correctly.
View Live Logs
To see real-time logs from Geth:
journalctl -fu geth
To see real-time logs from Lighthouse:
journalctl -fu lighthouse
Press Ctrl+C to exit the log view.
Check Geth Sync Status
Attach to the Geth console:
geth attach /mnt/ethereum/geth/geth.ipc
Inside the console, check sync status:
eth.syncing
If Geth is still syncing, this returns an object showing:
- currentBlock: The block number Geth has synced to
- highestBlock: The latest block on the network
If fully synced, it returns false. To exit the Geth console, type exit.
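The same two fields are also available over JSON-RPC through the `eth_syncing` method, where they arrive as hex strings rather than decimal numbers. The sketch below converts them and computes a rough percentage; `hex_to_dec` and `sync_percent` are hypothetical helper names. Treat the result as an indication only, since later blocks take far longer to process than early ones.

```shell
#!/bin/sh
# Convert a JSON-RPC hex quantity such as 0x10d4f to decimal.
hex_to_dec() {
    printf '%d\n' "$1"
}

# Integer percentage of blocks processed so far.
# Usage: sync_percent <currentBlock_hex> <highestBlock_hex>
sync_percent() {
    current=$(hex_to_dec "$1")
    highest=$(hex_to_dec "$2")
    if [ "$highest" -eq 0 ]; then
        echo 0
        return
    fi
    echo $(( current * 100 / highest ))
}

# Fetching the raw values (requires the running node from this guide):
# curl -s -X POST -H 'Content-Type: application/json' \
#      --data '{"jsonrpc":"2.0","id":1,"method":"eth_syncing","params":[]}' \
#      http://127.0.0.1:8545
```

For instance, `sync_percent 0x64 0xc8` prints 50, because block 100 is halfway to block 200.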
Check Lighthouse Sync Status
Lighthouse provides an HTTP API for checking sync status:
curl http://localhost:5052/eth/v1/node/syncing
This returns JSON showing whether the beacon node is syncing and its progress.
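For scripting, the response can be reduced to a yes/no answer and a distance from the chain head. The sketch below does this with grep and sed so that it has no extra dependencies; the helper names are hypothetical, the field names `is_syncing` and `sync_distance` are assumed to follow the standard Beacon API response, and the sample values in the usage note are invented.

```shell
#!/bin/sh
# Succeeds when the syncing response reports "is_syncing": false.
beacon_is_synced() {
    printf '%s' "$1" | grep -Eq '"is_syncing"[[:space:]]*:[[:space:]]*false'
}

# Print the number of slots between the node's head and the network head.
beacon_sync_distance() {
    printf '%s' "$1" |
        sed -n 's/.*"sync_distance"[[:space:]]*:[[:space:]]*"\([0-9]*\)".*/\1/p'
}

# Usage against the running node from this guide:
# response=$(curl -s http://localhost:5052/eth/v1/node/syncing)
# beacon_is_synced "$response" && echo "beacon node is synced"
# beacon_sync_distance "$response"
```

Given a response such as `{"data":{"is_syncing":true,"head_slot":"100","sync_distance":"25"}}`, `beacon_sync_distance` prints 25 and `beacon_is_synced` returns a non-zero status. If `jq` is installed, it is a more robust way to parse the same JSON.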
What to Expect During Sync
- Lighthouse will typically finish syncing first, usually within hours thanks to checkpoint sync. You'll see logs indicating slot processing and finalized checkpoints.
- Geth will take significantly longer, potentially 2-4 weeks or more. Early blocks sync quickly, but as you reach blocks from 2020 onwards (DeFi summer era), processing slows down due to increased transaction complexity.
- During sync, expect high disk I/O, fluctuating CPU usage, and constant network activity. This is normal behavior.
Resource Monitoring
Keep an eye on system resources:
htop
And disk usage:
df -h
Make sure you're not running out of disk space during the sync process.
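Checking disk space by hand is easy to forget during a multi-week sync, so a small threshold check that can run from cron may help. In this sketch the helper names and the 85% default threshold are assumptions chosen for illustration; pick a threshold that leaves you enough time to react.

```shell
#!/bin/sh
# Percentage of space used on the filesystem that holds the given path.
disk_used_pct() {
    # -P keeps each filesystem on one line so the columns parse reliably.
    df -P "$1" | awk 'NR==2 {gsub("%", "", $5); print $5}'
}

# Print a status line; returns non-zero when usage reaches the threshold.
# Usage: disk_alert <used_percent> [threshold_percent]
disk_alert() {
    used="$1"; threshold="${2:-85}"   # 85 is an assumed default
    if [ "$used" -ge "$threshold" ]; then
        echo "WARNING: disk ${used}% full (threshold ${threshold}%)"
        return 1
    fi
    echo "OK: disk ${used}% full"
}
```

A typical call would be `disk_alert "$(disk_used_pct /mnt/ethereum)" 90`, which can be wired into whatever alerting you already use.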
#Step 13: Configure Firewall
Your node needs to communicate with other Ethereum nodes across the network. By default, most servers have firewall rules that block incoming connections. We need to open the necessary ports for peer-to-peer communication.
Install UFW (if not already installed)
Ubuntu typically comes with UFW (Uncomplicated Firewall). Check if it's installed:
ufw status
If UFW isn't installed:
apt install ufw
Configure Firewall Rules
Allow SSH (so you don't lock yourself out):
ufw allow 22/tcp
Allow Geth's P2P port (default is 30303 for both TCP and UDP):
ufw allow 30303/tcp
ufw allow 30303/udp
Allow Lighthouse's P2P port (default is 9000 for both TCP and UDP):
ufw allow 9000/tcp
ufw allow 9000/udp
Enable the firewall:
ufw enable
Verify the rules:
ufw status
You should see the ports listed as allowed.
Note on RPC Ports
The HTTP-RPC port (8545) and authenticated RPC port (8551) are bound to 127.0.0.1 (localhost only) in our configuration, so they're not accessible from outside the server. This is intentional for security. If you need to access these APIs remotely, set up an SSH tunnel or reverse proxy instead of exposing them directly to the internet.
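As an illustration of the SSH tunnel approach, the sketch below prints the command you would run on your workstation to forward a local port to the node's HTTP-RPC port. The helper name `rpc_tunnel_cmd` is hypothetical, and it assumes you log in as root as in the earlier steps; substitute your own user and host.

```shell
#!/bin/sh
# Print an ssh command that forwards a local port to Geth's HTTP-RPC port.
# -N: do not run a remote command, just forward.
# -L: forward <local_port> on this machine to 127.0.0.1:8545 on the server.
# Usage: rpc_tunnel_cmd <server_host> [local_port]
rpc_tunnel_cmd() {
    host="$1"; local_port="${2:-8545}"
    echo "ssh -N -L ${local_port}:127.0.0.1:8545 root@${host}"
}
```

While that ssh session is open, requests sent to `http://127.0.0.1:8545` on your workstation reach the node, and port 8545 stays closed in the server's firewall.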
#Optimization and Maintenance Best Practices
Running an archive node isn't a set-it-and-forget-it operation. Regular maintenance and performance optimization ensure your node stays reliable over time.
#Performance Optimization
Disk I/O Monitoring
Archive nodes are heavily disk-bound. Monitor I/O performance regularly with iostat, which is provided by the sysstat package on Ubuntu:
iostat -x 5
This shows disk utilization every 5 seconds. If you consistently see 100% utilization, your storage might be the bottleneck.
Database Compaction
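Over time, Geth's database can become fragmented. If your node has been running for months, consider running an offline compaction to reclaim space and improve performance. Use geth db compact for this, not geth snapshot prune-state: pruning deletes historical state, which is exactly the data an archive node exists to keep. Run the command as the ethereum user so the database files keep their ownership:
systemctl stop geth
sudo -u ethereum geth db compact --datadir /mnt/ethereum/geth
systemctl start geth
On an archive-sized database this process can take many hours, but it can noticeably improve query performance.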
Connection Management
If you're experiencing slow sync or poor peer connectivity, adjust the max peers setting. Edit your Geth service file and increase --maxpeers:
nano /etc/systemd/system/geth.service
Change --maxpeers 50 to --maxpeers 100, then reload and restart:
systemctl daemon-reload
systemctl restart geth
More peers generally means better sync performance, but it also increases bandwidth and memory usage.
#Maintenance Tasks
Regular Updates
Keep both clients updated. New releases include bug fixes, performance improvements, and security patches.
For Geth:
systemctl stop geth
apt update
apt install --only-upgrade geth
systemctl start geth
For Lighthouse, download the latest binary and replace the existing one:
systemctl stop lighthouse
cd /tmp
wget https://github.com/sigp/lighthouse/releases/download/vX.X.X/lighthouse-vX.X.X-x86_64-unknown-linux-gnu.tar.gz
tar -xzf lighthouse-vX.X.X-x86_64-unknown-linux-gnu.tar.gz
mv lighthouse /usr/local/bin/
chmod +x /usr/local/bin/lighthouse
systemctl start lighthouse
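Since only the version string changes between releases, the download URL can be derived from it. The sketch below follows the URL pattern used in the commands above; `lighthouse_url` is a hypothetical helper, and it assumes the release publishes an x86_64 Linux tarball under the same naming scheme, which you should confirm on the releases page.

```shell
#!/bin/sh
# Build the release tarball URL for a given Lighthouse version tag.
# Usage: lighthouse_url v8.0.0
lighthouse_url() {
    version="$1"
    base="https://github.com/sigp/lighthouse/releases/download"
    echo "${base}/${version}/lighthouse-${version}-x86_64-unknown-linux-gnu.tar.gz"
}

# Example: download a specific version into /tmp.
# wget -P /tmp "$(lighthouse_url v8.0.0)"
```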
#How Difficult Is It to Maintain an Ethereum Archive Node?
Maintaining an archive node requires ongoing attention but isn't overwhelming once the initial sync completes. You'll need to keep both clients updated with new releases, monitor disk space as the blockchain grows (expect roughly 50-100 GB of growth per month), and occasionally check for performance degradation. Most maintenance tasks are straightforward and can be scheduled monthly, though you should monitor disk usage more frequently to avoid running out of space mid-operation.
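The growth estimate above translates directly into a planning number: free space divided by monthly growth gives the months left before the disk fills. A minimal sketch of that arithmetic follows; `months_of_runway` is a hypothetical helper, and the inputs are whatever free space and growth rate you observe on your own node.

```shell
#!/bin/sh
# Whole months until the disk is full at a steady growth rate.
# Usage: months_of_runway <free_gb> <growth_gb_per_month>
months_of_runway() {
    free_gb="$1"; growth_gb="$2"
    if [ "$growth_gb" -le 0 ]; then
        echo "growth must be positive" >&2
        return 1
    fi
    echo $(( free_gb / growth_gb ))
}
```

With 2000 GB free, `months_of_runway 2000 100` prints 20 and `months_of_runway 2000 50` prints 40, which brackets the range quoted above.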
#Conclusion
Setting up an Ethereum archive node is a significant technical undertaking, but the payoff is complete autonomy over your blockchain data. Once synced, you'll have unrestricted access to query any historical state without depending on third-party providers or hitting rate limits.



