How to Set Up an Ethereum Archive Node [Step-by-Step]

Published on Dec 22, 2025 Updated on Dec 23, 2025

This guide walks you through setting up a Geth archive node, covering everything from Ethereum hardware and network prerequisites to installation, configuration, and ongoing maintenance.

#How to run an Ethereum archive node?

Running an Ethereum archive node means keeping a full, historical record of the blockchain. Whether you're a developer building decentralized applications, a researcher analyzing historical data, or an organization needing full access to the Ethereum network, an archive node gives you the complete picture. Unlike full nodes, which prune older state and keep only recent state, an archive node retains every historical state alongside every block and transaction since the network's genesis in 2015.

The catch? Archive nodes are resource-intensive. You're looking at multiple terabytes of storage, significant bandwidth requirements, and hardware that’s capable of continuous read and write operations. Setting one up incorrectly means wasting days on synchronization only to hit bottlenecks or run out of disk space halfway through.

#Understanding Archive Nodes

Ethereum nodes come in different flavors, each serving specific purposes. Before diving into setup, it’s worth understanding what makes archive nodes different and whether you actually need one. At a high level, Ethereum has three main categories of nodes: full, light, and archive.

#1. What is an Ethereum light node?

A light node stores only block headers. It uses these headers to verify transactions and check balances by connecting to full nodes for accurate data. It is lightweight, fast to sync, and perfect for wallets or applications that just need to verify transactions without holding the entire chain.

#2. What is an Ethereum full node?

A full node goes deeper. It stores the current state of the blockchain, validates new blocks, and can serve light clients. However, as the network grows, a full node periodically prunes older state data to save space. That means it can’t recreate exactly what the blockchain looked like at any point in the past.

#3. What is an Ethereum archive node?

That’s where archive nodes come in. They keep everything: every block, every historical state, and every smart contract storage slot since the first block was mined. This makes them massive in size but also incredibly powerful. With an archive node, you can query any past account balance, contract variable, or storage state without relying on third-party services.

As of early 2025, a full Ethereum archive node (Geth) requires roughly 14-16 TB of storage and continues growing. The initial synchronization process can take weeks depending on your hardware and network connection.

#Ethereum Archive Node Requirements

Running an archive node requires serious infrastructure. You need fast NVMe storage that doesn't throttle under weeks of constant writes, network links that can pull terabytes without choking, and a setup that stays online without shared tenants hogging resources.

Bare Metal vs Virtual Servers

Most cloud providers offer virtual machines that share physical resources with other customers. This works fine for web applications, but archive nodes max out disk I/O for weeks during synchronization. On virtualized infrastructure, you're competing with other tenants for the same underlying drives, creating unpredictable performance. Bare metal servers give you dedicated hardware with no noisy neighbors and full control over the storage stack.

The performance difference becomes especially apparent during the initial sync. Virtualized environments often experience inconsistent I/O speeds as the hypervisor allocates resources dynamically between tenants. This can turn a three-week sync into a two-month ordeal. With bare metal, you get consistent performance throughout the entire process, and you can optimize the storage configuration specifically for blockchain workloads without virtualization overhead eating into your available resources.

Network and Storage

Look for providers with generous bandwidth allocations on dedicated servers to avoid overage charges during the initial multi-terabyte sync. Network infrastructure should be multi-homed with multiple carriers for reliable peering with Ethereum nodes globally. Server configurations should include multiple enterprise-grade NVMe drives with high endurance ratings.

Pricing

Predictable pricing without bandwidth throttling is essential when syncing terabytes of blockchain data. Hidden fees or throttled connections can significantly impact both setup time and ongoing operational costs.

#Prerequisites

Before you start installing software, make sure you have the right foundation. Missing any of these requirements will either block your progress or force you to start over.

#Ethereum Hardware Requirements

Your Ethereum server needs to handle sustained loads for weeks during initial sync, then maintain that capacity indefinitely. Here’s what you should have:

CPU: At least 8 cores. More cores help during sync, but after that, single-thread performance matters more for block processing.

RAM: 64 GB minimum. The execution client alone can use 16-20 GB during sync. Leave headroom for the operating system and consensus client.

Storage: At least 16 TB of NVMe SSD space. As of early 2025, the archive data sits around 14-16 TB and keeps growing, so plan for at least 2 TB of free space beyond the current database size.

Network: 500 Mbps minimum sustained bandwidth. Expect to download several terabytes during initial sync and maintain connections to 50-100 peers afterward.
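The thresholds above can be checked on a freshly provisioned machine before you invest weeks in a sync. The following is a minimal sketch, not an official tool: check_min, cpu_cores, ram_gib, and disk_total_gib are hypothetical helper names, and the lowered RAM and disk thresholds in the example comments are assumptions that account for kernel-reserved memory and filesystem overhead.

```shell
#!/bin/sh
# Preflight sketch: compare a measured value against a required minimum.
# check_min is a hypothetical helper, not part of Geth or Lighthouse.
check_min() {
    # $1 = label, $2 = measured value, $3 = required minimum (integers)
    if [ "$2" -ge "$3" ]; then
        echo "OK   $1: $2 (minimum $3)"
    else
        echo "FAIL $1: $2 (minimum $3)"
        return 1
    fi
}

# Measured values on a Linux host, using standard tools only.
cpu_cores() { nproc; }
# MemTotal is reported in kB and is slightly below the installed RAM.
ram_gib() { awk '/^MemTotal:/ { print int($2 / 1024 / 1024) }' /proc/meminfo; }
# Total size, in GiB, of the filesystem that holds path $1.
disk_total_gib() { df -Pk "$1" | awk 'NR == 2 { print int($2 / 1024 / 1024) }'; }

# Example run (thresholds are assumptions: a 64 GB machine reports
# roughly 62 GiB, and 16 TB of drives formats to under 15000 GiB):
#   check_min "CPU cores"  "$(cpu_cores)" 8
#   check_min "RAM (GiB)"  "$(ram_gib)" 60
#   check_min "Disk (GiB)" "$(disk_total_gib /mnt/ethereum)" 14000
```

Each check prints one line and returns a non-zero status on failure, so the calls can be chained in a provisioning script.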

Client Software Choices

Ethereum now requires two pieces of software running together: an execution client and a consensus client. This dual-client architecture has been required since The Merge in September 2022.

Popular execution clients include:

  • Geth (Go Ethereum)
  • Nethermind
  • Besu
  • Erigon

Popular consensus clients include:

  • Lighthouse
  • Prysm
  • Teku
  • Nimbus

For this guide, we'll use Geth for execution and Lighthouse for consensus. Both have proven reliability, extensive documentation, and strong community support. The setup process for other clients follows similar patterns.

#How to Set Up Ethereum Archive Node: Step-by-Step

This section walks through the complete Ethereum archive node setup process, from server provisioning to running a synchronized archive node. We’ll install Geth as the execution client and Lighthouse as the consensus client, configure them properly, and set up monitoring to track progress.

#Step 1: Provision Your Server

Log into your server dashboard and select a bare metal server that meets the archive node requirements. Here’s what to look for:

Recommended Server Configuration:

Look for servers with these minimum specifications:

  • CPU: AMD Ryzen or AMD EPYC with at least 8 cores
  • RAM: 64 GB or more
  • Storage: 2x 8TB NVMe drives (or larger) in a configuration that gives you at least 16TB usable space
  • Network: 1 Gbps uplink with unmetered bandwidth

Storage Configuration:

When provisioning, you'll have options for how to configure your drives:

  • RAID 0: Combines drives for maximum capacity and speed, but no redundancy. A single drive failure means data loss.
  • RAID 1: Mirrors drives for redundancy, but you lose half your capacity.
  • No RAID: Use drives independently without any RAID configuration.
  • Custom: Allows you to specify a custom RAID level in the configuration comments.

For an archive node, RAID 5 is recommended because it survives a single drive failure while giving up only one drive's worth of capacity. Note that RAID 5 needs at least three drives, so reaching 16TB of usable space takes three 8TB drives rather than two. To set this up, select Custom during provisioning and specify "RAID 5" in the custom RAID level comment field. You can always resync from the network after a drive failure, but the weeks-long process makes RAID 5's fault tolerance worthwhile.
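To compare the options, it helps to work out the usable capacity each RAID level leaves you with. The sketch below assumes equal-sized drives and uses the standard formulas (RAID 0 sums the drives, RAID 1 yields one drive's capacity, RAID 5 yields the total minus one drive and needs at least three drives). raid_usable_tb is a hypothetical helper name.

```shell
#!/bin/sh
# raid_usable_tb is a hypothetical helper: usable capacity in TB for a set
# of equal-sized drives at a given RAID level.
raid_usable_tb() {
    # $1 = RAID level (0, 1, or 5), $2 = number of drives, $3 = TB per drive
    level=$1; drives=$2; size=$3
    case "$level" in
        0) echo $((drives * size)) ;;
        1) [ "$drives" -ge 2 ] || { echo "RAID 1 needs at least 2 drives" >&2; return 1; }
           echo "$size" ;;
        5) [ "$drives" -ge 3 ] || { echo "RAID 5 needs at least 3 drives" >&2; return 1; }
           echo $(( (drives - 1) * size )) ;;
        *) echo "unsupported RAID level: $level" >&2; return 1 ;;
    esac
}

# Example: three 8 TB drives in RAID 5
#   raid_usable_tb 5 3 8
```

With 8TB drives, this shows that two drives only reach 16TB in RAID 0, while RAID 5 needs a third drive to reach the same usable space.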

Operating System:

Select Ubuntu Server 22.04 LTS or 24.04 LTS as your operating system during provisioning. Choose the minimal installation without any pre-installed control panels or extra services.

Note: Most bare metal servers come with basic storage configurations (often 1TB NVMe), but Cherry Servers allows you to customize storage capacity during the provisioning process. Look for options to upgrade or add additional NVMe drives to meet the 16TB requirement. Don’t settle for the default storage if it’s insufficient. An archive node won’t fit on anything less than what’s specified above.

Post-Provisioning: After provisioning completes (usually within a few hours), you'll receive an email with:

  • Server IP address
  • Root password
  • Any additional network information

Connect to your server via SSH:

ssh root@your-server-ip

Enter the root password when prompted. Once logged in, immediately update the system:

apt update && apt upgrade -y

If you have multiple drives, check your storage configuration:

lsblk
df -h

This shows your block devices and mounted filesystems. Note where your NVMe drives are mounted. You'll need this path when configuring the Geth and Lighthouse data directories.

#Step 2: Create a Dedicated User

Running blockchain clients as root is a security risk. Create a dedicated user for your Ethereum node:

useradd -m -s /bin/bash ethereum

This creates a user named ethereum with a home directory and bash shell. We’ll run both clients under this user account.

#Step 3: Install Dependencies

Install the required packages:

apt install -y software-properties-common wget curl git build-essential

These packages provide tools needed for downloading, building, and managing the Ethereum clients.

#Step 4: Install Geth (Execution Client)

Since we’re on Ubuntu, we’ll use the official Ethereum PPA repository to install Geth. This ensures you get properly maintained packages with easy updates.

Add the Ethereum PPA repository:

sudo add-apt-repository -y ppa:ethereum/ethereum

Update the package list and install Geth:

sudo apt-get update
sudo apt-get install ethereum

This installs the stable version of Geth along with additional tools like clef, devp2p, abigen, bootnode, evm, and rlpdump.

Verify the installation:

geth version

You should see output showing the installed Geth version, along with Go version and architecture details.

#Step 5: Configure Geth Data Directory

Create a directory for Geth’s blockchain data. Based on the storage configuration you checked in step 1, choose an appropriate location on your NVMe drives:

mkdir -p /mnt/ethereum/geth
chown -R ethereum:ethereum /mnt/ethereum

If your NVMe drives are mounted elsewhere (check with lsblk or df -h), adjust the path accordingly. The important thing is placing the data directory on your high-performance NVMe storage, not on the OS drive.

#Step 6: Create Geth Service File

Create a systemd service file to manage Geth as a background service.

Why is this important? Without a systemd service, you’d have to manually start Geth every time, and if the server reboots or Geth crashes, it won’t restart automatically. Running it directly in a terminal session also means it stops when you disconnect from SSH.

Create the file:

nano /etc/systemd/system/geth.service

Add the following configuration:

[Unit]
Description=Geth Execution Client (archive)
After=network.target
Wants=network.target
[Service]
User=ethereum
Group=ethereum
Type=simple
Restart=always
RestartSec=5
ExecStart=/usr/bin/geth \
  --datadir /mnt/ethereum/geth \
  --gcmode archive \
  --syncmode full \
  --http \
  --http.api eth,net,web3,txpool \
  --http.addr 127.0.0.1 \
  --http.port 8545 \
  --authrpc.addr 127.0.0.1 \
  --authrpc.port 8551 \
  --authrpc.vhosts localhost \
  --authrpc.jwtsecret /mnt/ethereum/jwt.hex \
  --maxpeers 50
[Install]
WantedBy=multi-user.target

Key flags explained:

  • --gcmode archive: Keeps all historical state data instead of pruning
  • --syncmode full: Executes every block from genesis so that every historical state is generated and stored. Snap sync is faster, but combined with archive mode it only retains state from the block where the sync finished onward, which leaves you without the earlier history
  • --http: Enables the HTTP-RPC server for queries
  • --http.api: Specifies which API namespaces to expose
  • --authrpc.*: Configuration for authenticated communication with the consensus client
  • --maxpeers: Limits peer connections to manage bandwidth

Save and exit (Ctrl+X, then Y, then Enter).
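Before starting the service, it can be worth confirming which values the unit file will actually pass to Geth, since a typo in a flag costs days of sync time. The sketch below is one way to read a flag's value back out of the file. unit_flag_value is a hypothetical helper, and it only handles the "--flag value" form used in this guide (not boolean flags such as --http, and not "--flag=value").

```shell
#!/bin/sh
# unit_flag_value is a hypothetical helper: print the value that follows a
# given flag in a unit file's ExecStart block ("--flag value" form only).
unit_flag_value() {
    # $1 = path to the unit file, $2 = flag name, e.g. --gcmode
    awk -v flag="$2" '
        { for (i = 1; i < NF; i++) if ($i == flag) { print $(i + 1); exit } }
    ' "$1"
}

# Example:
#   unit_flag_value /etc/systemd/system/geth.service --gcmode
#   unit_flag_value /etc/systemd/system/geth.service --syncmode
```

For a syntax check of the unit file itself, systemd-analyze verify /etc/systemd/system/geth.service reports malformed directives.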

#Step 7: Generate JWT Secret

Both Geth and Lighthouse need a shared JWT (JSON Web Token) secret for secure communication. This token authenticates the connection between the execution and consensus layers.

Generate the JWT secret:

openssl rand -hex 32 | tr -d "\n" > /mnt/ethereum/jwt.hex

Set the correct permissions:

chown ethereum:ethereum /mnt/ethereum/jwt.hex
chmod 600 /mnt/ethereum/jwt.hex

The chmod 600 ensures only the ethereum user can read the file, which is important for security since this token authorizes communication between your clients.
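A malformed or world-readable secret is a common reason the two clients fail to connect, so a quick check after generating the file can save debugging later. This is a minimal sketch: jwt_secret_ok is a hypothetical helper that accepts only a bare 64-character hex string with mode 600, which matches what the commands above produce.

```shell
#!/bin/sh
# jwt_secret_ok is a hypothetical helper: succeed only if the file holds
# exactly 64 hex characters (32 bytes) and has mode 600.
jwt_secret_ok() {
    # $1 = path to the JWT secret file
    [ -f "$1" ] || { echo "missing: $1" >&2; return 1; }
    secret=$(tr -d '\n' < "$1")
    case "$secret" in
        *[!0-9a-fA-F]*) echo "non-hex characters in $1" >&2; return 1 ;;
    esac
    [ "${#secret}" -eq 64 ] || { echo "expected 64 hex chars, got ${#secret}" >&2; return 1; }
    # stat -c is the GNU coreutils form, which is what Ubuntu ships.
    mode=$(stat -c '%a' "$1")
    [ "$mode" = "600" ] || { echo "mode is $mode, expected 600" >&2; return 1; }
}

# Example:
#   jwt_secret_ok /mnt/ethereum/jwt.hex && echo "JWT secret looks good"
```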

#Step 8: Install Lighthouse (Consensus Client)

Lighthouse doesn't have a PPA repository like Geth, so we'll download the pre-compiled binary from the official GitHub releases.

First, navigate to a temporary directory:

cd /tmp

Download the latest Lighthouse release. As of writing, the latest version is v8.0.0, but you should check the Lighthouse releases page for the most recent stable version:

wget https://github.com/sigp/lighthouse/releases/download/v8.0.0/lighthouse-v8.0.0-x86_64-unknown-linux-gnu.tar.gz

Extract the archive:

tar -xzf lighthouse-v8.0.0-x86_64-unknown-linux-gnu.tar.gz

Move the binary to a system directory and make it executable:

mv lighthouse /usr/local/bin/
chmod +x /usr/local/bin/lighthouse

Verify the installation:

lighthouse --version

You should see the Lighthouse version information displayed.

#Step 9: Configure Lighthouse Data Directory

Create a directory for Lighthouse's beacon chain data:

mkdir -p /mnt/ethereum/lighthouse
chown -R ethereum:ethereum /mnt/ethereum/lighthouse

As with Geth, ensure this directory is on your NVMe storage for optimal performance.

#Step 10: Create Lighthouse Service File

Create a systemd service file for Lighthouse:

nano /etc/systemd/system/lighthouse.service

Add the following configuration:

[Unit]
Description=Lighthouse Consensus Client
After=network.target geth.service
Wants=network.target
Requires=geth.service
[Service]
User=ethereum
Group=ethereum
Type=simple
Restart=always
RestartSec=5
ExecStart=/usr/local/bin/lighthouse bn \
  --network mainnet \
  --datadir /mnt/ethereum/lighthouse \
  --http \
  --execution-endpoint http://127.0.0.1:8551 \
  --execution-jwt /mnt/ethereum/jwt.hex \
  --checkpoint-sync-url https://mainnet.checkpoint.sigp.io
[Install]
WantedBy=multi-user.target

Key flags explained:

  • bn: Runs Lighthouse in beacon node mode
  • --network mainnet: Connects to Ethereum mainnet
  • --execution-endpoint: Points to Geth's authenticated RPC port
  • --execution-jwt: Uses the JWT secret we created for authentication
  • --checkpoint-sync-url: Enables checkpoint sync, allowing Lighthouse to sync from a recent finalized checkpoint instead of from genesis. This dramatically reduces sync time from weeks to hours.

Save and exit (Ctrl+X, then Y, then Enter).

#Step 11: Start the Services

Now that both clients are configured, reload systemd to recognize the new service files:

systemctl daemon-reload

Enable both services to start automatically on boot:

systemctl enable geth lighthouse

Start Geth first, then Lighthouse:

systemctl start geth
systemctl start lighthouse

Check that both services are running:

systemctl status geth
systemctl status lighthouse

You should see active (running) in green for both services. If either shows an error, check the logs to diagnose the issue (we'll cover monitoring in the next step).

#Step 12: Monitor Sync Progress

Once both services are running, you'll want to monitor their sync progress. The initial sync takes time, and it's helpful to confirm everything is working correctly.

View Live Logs

To see real-time logs from Geth:

journalctl -fu geth

To see real-time logs from Lighthouse:

journalctl -fu lighthouse

Press Ctrl+C to exit the log view.

Check Geth Sync Status

Attach to the Geth console:

geth attach /mnt/ethereum/geth/geth.ipc

Inside the console, check sync status:

eth.syncing

If Geth is still syncing, this returns an object showing:

  • currentBlock: The block number Geth has synced to
  • highestBlock: The latest block on the network

If fully synced, it returns false. To exit the Geth console, type exit.
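The two block numbers are easier to read as a percentage. The sketch below shows the arithmetic. hex_to_dec and sync_percent are hypothetical helpers. The hex conversion is included because the same data fetched over JSON-RPC (the eth_syncing method on port 8545) arrives as hex strings, whereas the console prints decimals.

```shell
#!/bin/sh
# hex_to_dec and sync_percent are hypothetical helpers.
# hex_to_dec converts a value such as 0x10 to decimal.
hex_to_dec() { printf '%d\n' "$1"; }

sync_percent() {
    # $1 = currentBlock, $2 = highestBlock (decimal integers)
    awk -v c="$1" -v h="$2" 'BEGIN {
        if (h <= 0) { print "0.00"; exit }
        printf "%.2f\n", (c / h) * 100
    }'
}

# Example, fetching the raw response over the local HTTP-RPC endpoint:
#   curl -s -X POST -H 'Content-Type: application/json' \
#     --data '{"jsonrpc":"2.0","method":"eth_syncing","params":[],"id":1}' \
#     http://127.0.0.1:8545
# Then, with the two values taken from the response:
#   sync_percent "$(hex_to_dec 0x4c4b40)" "$(hex_to_dec 0x1312d00)"
```

Keep in mind that block percentage understates the remaining time, because later blocks take longer to process than early ones.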

Check Lighthouse Sync Status

Lighthouse provides an HTTP API for checking sync status:

curl http://localhost:5052/eth/v1/node/syncing

This returns JSON showing whether the beacon node is syncing and its progress.
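If you want to use that response in a script, the two fields that matter most are is_syncing and sync_distance, both defined by the standard Beacon API. The sketch below extracts them with grep and sed so it works without extra packages. beacon_synced and sync_distance are hypothetical helper names, and the sample JSON in the comments is illustrative rather than captured output.

```shell
#!/bin/sh
# beacon_synced is a hypothetical helper: succeed when the JSON body from
# /eth/v1/node/syncing reports "is_syncing": false.
beacon_synced() {
    # $1 = JSON response body
    printf '%s\n' "$1" | grep -Eq '"is_syncing"[[:space:]]*:[[:space:]]*false'
}

# sync_distance is a hypothetical helper: print how many slots the node is
# behind the chain head, as reported in the "sync_distance" field.
sync_distance() {
    # $1 = JSON response body
    printf '%s\n' "$1" |
        sed -n 's/.*"sync_distance"[[:space:]]*:[[:space:]]*"\([0-9]*\)".*/\1/p'
}

# Example:
#   body=$(curl -s http://localhost:5052/eth/v1/node/syncing)
#   beacon_synced "$body" && echo "synced" || echo "behind by $(sync_distance "$body") slots"
# Illustrative response shape:
#   {"data":{"is_syncing":true,"head_slot":"100","sync_distance":"42"}}
```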

What to Expect During Sync

  • Lighthouse will typically finish syncing first, usually within a few hours thanks to checkpoint sync. You'll see logs indicating slot processing and finalized checkpoints.
  • Geth will take significantly longer, potentially 2-4 weeks or more. Early blocks sync quickly, but as you reach blocks from 2020 onwards (DeFi summer era), processing slows down due to increased transaction complexity.
  • During sync, expect high disk I/O, fluctuating CPU usage, and constant network activity. This is normal behavior.

Resource Monitoring

Keep an eye on system resources:

htop

And disk usage:

df -h

Make sure you're not running out of disk space during the sync process.
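Since the sync runs for weeks, checking df by hand gets old quickly. A small threshold check like the sketch below can be run from cron instead. disk_used_percent and disk_alert are hypothetical helpers, and the 90% threshold in the example is an assumption you should tune to your own headroom.

```shell
#!/bin/sh
# disk_used_percent prints the usage percentage (without the % sign) of
# the filesystem that holds path $1, as reported by df.
disk_used_percent() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# disk_alert is a hypothetical helper: warn and return non-zero when usage
# reaches the threshold.
disk_alert() {
    # $1 = used percent, $2 = threshold percent
    if [ "$1" -ge "$2" ]; then
        echo "WARNING: disk usage at $1% (threshold $2%)"
        return 1
    fi
    echo "disk usage at $1%"
}

# Example (90 is an assumed threshold):
#   disk_alert "$(disk_used_percent /mnt/ethereum)" 90
```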

#Step 13: Configure Firewall

Your node needs to communicate with other Ethereum nodes across the network. By default, most servers have firewall rules that block incoming connections. We need to open the necessary ports for peer-to-peer communication.

Install UFW (if not already installed)

Ubuntu typically comes with UFW (Uncomplicated Firewall). Check if it's installed:

ufw status

If UFW isn't installed:

apt install ufw

Configure Firewall Rules

Allow SSH (so you don't lock yourself out):

ufw allow 22/tcp

Allow Geth's P2P port (default is 30303 for both TCP and UDP):

ufw allow 30303/tcp
ufw allow 30303/udp

Allow Lighthouse's P2P port (default is 9000 for both TCP and UDP):

ufw allow 9000/tcp
ufw allow 9000/udp

Enable the firewall:

ufw enable

Verify the rules:

ufw status

You should see the ports listed as allowed.
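Rather than scanning the table by eye, you can check the output against the list of rules from this step. The following is a rough sketch: missing_rules is a hypothetical helper that reads ufw status output on standard input and prints any expected port that has no ALLOW rule. It assumes the default port numbers used in this guide.

```shell
#!/bin/sh
# missing_rules is a hypothetical helper: read `ufw status` output on stdin
# and print each expected port/protocol that has no ALLOW rule.
missing_rules() {
    status=$(cat)
    for rule in 22/tcp 30303/tcp 30303/udp 9000/tcp 9000/udp; do
        printf '%s\n' "$status" | grep -Eq "^$rule[[:space:]]+ALLOW" || echo "$rule"
    done
}

# Example:
#   ufw status | missing_rules
```

Empty output means every rule from this step is in place.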

Note on RPC Ports

The HTTP-RPC port (8545) and authenticated RPC port (8551) are bound to 127.0.0.1 (localhost only) in our configuration, so they're not accessible from outside the server. This is intentional for security. If you need to access these APIs remotely, set up an SSH tunnel or reverse proxy instead of exposing them directly to the internet.

#Optimization and Maintenance Best Practices

Running an archive node isn't a set-it-and-forget-it operation. Regular maintenance and performance optimization ensure your node stays reliable over time.

#Performance Optimization

Disk I/O Monitoring

Archive nodes are heavily disk-bound. Monitor I/O performance regularly:

iostat -x 5

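This prints extended disk statistics every 5 seconds. The iostat command is part of the sysstat package, so install it with apt install sysstat if it is missing. If you consistently see utilization near 100%, your storage might be the bottleneck.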

Database Compaction

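Over time, Geth's database can become fragmented. If your node has been running for months, consider running offline compaction to reclaim space and improve performance. Use geth db compact for this. Do not run geth snapshot prune-state on an archive node: that command deletes historical state, which is exactly the data an archive node exists to keep.

systemctl stop geth
sudo -u ethereum geth db compact --datadir /mnt/ethereum/geth
systemctl start geth

Running the command as the ethereum user keeps file ownership in the data directory intact. On a multi-terabyte database, compaction can take many hours, but it can noticeably improve query performance.

Connection Management

If you're experiencing slow sync or poor peer connectivity, adjust the max peers setting. Edit your Geth service file and increase --maxpeers: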

nano /etc/systemd/system/geth.service

Change --maxpeers 50 to --maxpeers 100, then reload and restart:

systemctl daemon-reload
systemctl restart geth

More peers generally means better sync performance, but it also increases bandwidth and memory usage.

#Maintenance Tasks

Regular Updates

Keep both clients updated. New releases include bug fixes, performance improvements, and security patches.

For Geth:

systemctl stop geth
apt update
apt upgrade geth
systemctl start geth

For Lighthouse, download the latest binary and replace the existing one:

systemctl stop lighthouse
cd /tmp
wget https://github.com/sigp/lighthouse/releases/download/vX.X.X/lighthouse-vX.X.X-x86_64-unknown-linux-gnu.tar.gz
tar -xzf lighthouse-vX.X.X-x86_64-unknown-linux-gnu.tar.gz
mv lighthouse /usr/local/bin/
chmod +x /usr/local/bin/lighthouse
systemctl start lighthouse
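To avoid hand-editing the vX.X.X placeholder twice on every update, the download URL can be built from a single version tag. This sketch follows the URL pattern used earlier in this guide for the x86_64 Linux build. lighthouse_url is a hypothetical helper, and you should still confirm the tag on the Lighthouse releases page.

```shell
#!/bin/sh
# lighthouse_url is a hypothetical helper: fill a release tag into the
# download URL pattern used in this guide (x86_64 Linux build).
lighthouse_url() {
    # $1 = release tag, e.g. v8.0.0
    case "$1" in
        v[0-9]*.[0-9]*.[0-9]*) ;;
        *) echo "expected a tag like v8.0.0, got: $1" >&2; return 1 ;;
    esac
    echo "https://github.com/sigp/lighthouse/releases/download/$1/lighthouse-$1-x86_64-unknown-linux-gnu.tar.gz"
}

# Example:
#   wget "$(lighthouse_url v8.0.0)"
```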

#How Difficult Is It to Maintain an Ethereum Archive Node?

Maintaining an archive node requires ongoing attention but isn't overwhelming once the initial sync completes. You'll need to keep both clients updated with new releases, monitor disk space as the blockchain grows (expect roughly 50-100 GB of growth per month), and occasionally check for performance degradation. Most maintenance tasks are straightforward and can be scheduled monthly, though you should monitor disk usage more frequently to avoid running out of space mid-operation.
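The growth figure translates directly into a planning horizon for your storage. As a back-of-the-envelope sketch, the hypothetical helper below divides free space by monthly growth. The capacity and usage numbers in the example are the illustrative figures from this guide, not measurements.

```shell
#!/bin/sh
# months_of_headroom is a hypothetical helper: whole months until the disk
# fills, given capacity, current usage, and monthly growth (all in GB).
months_of_headroom() {
    capacity=$1; used=$2; growth=$3
    [ "$growth" -gt 0 ] || { echo "growth must be positive" >&2; return 1; }
    free=$((capacity - used))
    [ "$free" -gt 0 ] || { echo 0; return 0; }
    echo $((free / growth))
}

# Example: 16 TB of capacity, 14 TB used, growing 100 GB per month
#   months_of_headroom 16000 14000 100
```

At 100 GB per month, 2 TB of free space lasts 20 months. At 50 GB per month it lasts 40.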

#Conclusion

Setting up an Ethereum archive node is a significant technical undertaking, but the payoff is complete autonomy over your blockchain data. Once synced, you'll have unrestricted access to query any historical state without depending on third-party providers or hitting rate limits.
