What is an Archive Node [A Comprehensive Guide]
Most blockchain queries are about the present: the latest account balance, the current state of a smart contract, or the newest block. But some workloads depend on being able to ask precise questions about the past, like what an account balance was at a specific block, or what a contract’s storage looked like before an upgrade. That’s the problem archive nodes are designed to solve.
In this guide, I’ll explain what an archive node is and why some blockchain applications depend on it. You’ll learn how an archive node works, how it differs from full nodes and RPC nodes, and what it takes to run one.
#What is an archive node?
An archive node is a node configured to build and retain an archive of historical states rather than pruning them. In the Ethereum ecosystem, an archive node is an Ethereum client configured to store all historical states, which makes it useful for historical queries but typically more demanding to operate than a standard full node.
#What “state” means (and why it matters)
- Transactions are the actions that occur on-chain (transfers, swaps, contract calls).
- State is the resulting snapshot at a point in time (balances, contract storage, account data).
A typical node can answer questions about the latest state efficiently. An archive node can also answer state queries for older blocks directly, without first reconstructing state that a pruned node has already discarded.
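As a rough illustration, a "state at block N" question maps onto the standard Ethereum JSON-RPC method `eth_getBalance`, which takes an address and a block parameter. The sketch below only builds the request body and does not contact a node. The helper name `balance_request`, the zero address, and the block number are placeholders chosen for the example.

```python
import json
from typing import Dict, Union


def balance_request(address: str, block: Union[int, str] = "latest", request_id: int = 1) -> Dict:
    """Build a JSON-RPC 2.0 body for eth_getBalance.

    `block` is either a tag such as "latest" or an integer block number,
    which the JSON-RPC API expects encoded as a hex string.
    """
    block_param = hex(block) if isinstance(block, int) else block
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "eth_getBalance",
        "params": [address, block_param],
    }


# Placeholder address (the zero address) and an arbitrary old block number.
ADDRESS = "0x0000000000000000000000000000000000000000"

current = balance_request(ADDRESS)                 # latest state: most nodes can answer
historical = balance_request(ADDRESS, 1_000_000)   # old state: typically needs retained history

print(json.dumps(historical))
```

The two requests differ only in the block parameter, which is why the same application code can work against a pruned node for current data and still fail for older blocks.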
#Archive node vs full node vs RPC node vs validator node
These terms often get blended together in real deployments, so it helps to separate them by purpose: data retention, serving requests, or participating in consensus.
#Full node
A full node verifies blocks and maintains the chain, but many full-node configurations prune older state to reduce storage and operational overhead. That’s why “full node” doesn’t automatically mean “full historical state available forever.”
#Archive (archival) node
An archive node has full node capabilities and retains historical states instead of pruning them. As QuickNode summarizes it, archive nodes inherit full node capabilities and build an archive of historical states, which is especially useful for deep historical queries.
#RPC node
An RPC node is a node exposed through RPC endpoints so applications can read chain data and submit transactions (e.g., get balances, send transactions, read contract state). It’s best understood as a service role/interface: an RPC node may be backed by a full node or an archive node depending on what the endpoint needs to support.
#Validator node
A validator node participates in consensus (validating/proposing blocks depending on the chain). Its primary responsibility is maintaining network security and integrity, not serving heavy historical analytics—though it may be run alongside other nodes for data access needs.
#How does an archive node work?
An archive node follows the same core workflow as other nodes: it downloads blocks, verifies them, and executes transactions to keep its view of the chain correct. The difference is what it chooses to keep on disk.
#Pruning vs keeping historical state
Many node setups remove older state data to reduce disk growth. That choice keeps storage and ongoing maintenance more manageable, but it limits some historical queries.
An archive node keeps historical state instead. In the Ethereum ecosystem, archive nodes store not only the latest state, but also the historical state produced by every earlier block. Without archive mode, getting historical state can require significant computation, because the client may need to re-execute past transactions to reconstruct that older state.
#What an archive node stores
An easy way to think about it is that both full nodes and archive nodes have chain history, but archive nodes keep more state history available for queries.
- Blocks and transactions, the record of what happened.
- State, the current results of executing those transactions.
- Historical state, the past versions of state at earlier blocks, which is what enables accurate “state at block N” queries.
#Why archive nodes answer historical queries faster
When you ask a question like “what was the state at this older block,” an archive node can usually read it directly from its stored historical state. A non-archive setup may need to reconstruct older state, which is slower and more resource-intensive.
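The difference between reading stored state and reconstructing it can be sketched with a toy model. This is not how any real client stores data: the classes `ArchiveNode` and `PrunedNode`, the account names, and the transfers are all invented for illustration, and the "pruned" node here is simplified to one that replays blocks from genesis on every historical query.

```python
from typing import Dict, List, Tuple

Transfer = Tuple[str, str, int]   # (sender, receiver, amount)
State = Dict[str, int]            # account -> balance


def apply_block(state: State, block: List[Transfer]) -> State:
    """Return a new state after applying every transfer in the block."""
    new_state = dict(state)
    for sender, receiver, amount in block:
        new_state[sender] = new_state.get(sender, 0) - amount
        new_state[receiver] = new_state.get(receiver, 0) + amount
    return new_state


class ArchiveNode:
    """Keeps a state snapshot for every height, so lookups are direct reads."""

    def __init__(self, genesis: State, blocks: List[List[Transfer]]):
        self.states = [dict(genesis)]
        for block in blocks:
            self.states.append(apply_block(self.states[-1], block))

    def balance_at(self, account: str, height: int) -> int:
        return self.states[height].get(account, 0)


class PrunedNode:
    """Keeps the blocks but no per-height snapshots; old state is replayed."""

    def __init__(self, genesis: State, blocks: List[List[Transfer]]):
        self.genesis = dict(genesis)
        self.blocks = blocks
        self.replayed_blocks = 0  # counts re-execution work done so far

    def balance_at(self, account: str, height: int) -> int:
        state = dict(self.genesis)
        for block in self.blocks[:height]:
            state = apply_block(state, block)
            self.replayed_blocks += 1
        return state.get(account, 0)


genesis = {"alice": 100, "bob": 0}
blocks = [[("alice", "bob", 10)], [("alice", "bob", 5)], [("bob", "alice", 3)]]

archive = ArchiveNode(genesis, blocks)
pruned = PrunedNode(genesis, blocks)

archive_answer = archive.balance_at("bob", 2)  # direct read of a stored snapshot
pruned_answer = pruned.balance_at("bob", 2)    # replays blocks 1 and 2 first
print(archive_answer, pruned_answer, pruned.replayed_blocks)
```

Both nodes return the same answer, but the toy pruned node has to redo work proportional to the chain height for each query, while the toy archive node pays that cost once in storage.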
#How this relates to RPC nodes and validator nodes
Archive mode describes data retention. RPC and validator describe how a node is used.
An RPC node is a node that exposes functionality through RPC endpoints so apps can query data and submit transactions. If you want those endpoints to support deep historical queries, the RPC node needs to be backed by an archive node rather than a pruned setup.
A validator node participates in consensus by checking transactions and helping maintain network integrity. That job is separate from keeping full historical state, which is why “validator” and “archive” are different choices.
#How it looks in Ethereum and Solana
On Ethereum, archive nodes are commonly discussed in terms of historical state retention for precise “state at block” queries.
On Solana, “archive” discussions often focus on retaining complete ledger history at a very large scale, and Solana also describes “Archivers” as a way to handle long-term storage of ledger data.
#Why archive nodes matter
An archive node is mainly about one thing: reliable access to historical chain data at the state level. That matters whenever you need answers that are tied to a specific point in the chain’s history, not just the latest block.
#Accurate “state at a past block” queries
A lot of real workloads depend on queries like “what was the balance at block X?” or “what did this contract storage look like before an upgrade?” Archive nodes make those queries practical because they retain historical states rather than relying on reconstructed data.
#Powering explorers, analytics, and backfills
If you are building anything that must show deep history consistently, archive data becomes a foundation. Examples include block explorers, on-chain analytics platforms, and indexing pipelines that need to backfill large time ranges and re-run historical computations. Alchemy explicitly calls out explorers, analytics tools, and wallets as examples of services that rely on archive data to serve older state to users.
#Auditing, investigations, and compliance workflows
When you need a verifiable timeline, you cannot hand-wave historical state. Audits and investigations often require proving what the chain looked like at a given block height, then reproducing the same result later. Archive nodes support that kind of repeatable historical inspection.
#Better guarantees for RPC services that support historical methods
Many teams discover they “need an archive node” only after certain RPC calls fail for older blocks or return incomplete results. An RPC service can be backed by a pruned node or an archive node. If you want the RPC layer to support deep historical access, the backend needs archive data. For context on the RPC role itself, see Cherry’s guide on RPC nodes.
#How this shows up in Ethereum vs Solana
Ethereum
Ethereum’s archive node concept is strongly tied to historical state retention. Archive nodes store all historical states, which makes them suited for precise historical state queries that are common in analytics, debugging, and long-range data products.
Solana
On Solana, storing full ledger history locally on a typical RPC node is often treated as impractical at scale. The ecosystem commonly relies on specialized long-term storage setups, sometimes described as warehouse nodes, to keep transaction history available for services. Separately, Solana describes “Archivers” as a protocol-level approach to distributing ledger storage for very large data volumes.
#How do you run an archive node?
Running an archive node comes down to three decisions: which chain you’re targeting, which client you’ll run, and whether you’re prepared for the storage and sync overhead that comes with keeping historical state.
On Ethereum, an archive node is defined as an Ethereum client configured to build an archive of all historical states, which is useful for certain workloads but can be more demanding to operate than a standard full node.
#What is an archive node client?
A client is the software implementation of a blockchain protocol that lets your machine join the network, sync blocks, verify data, and serve requests. In archive mode, the same client is configured to retain historical state instead of pruning it.
#Popular Ethereum archive node clients
Most teams pick from a small set of widely used Ethereum clients, then enable archive-mode settings per the client’s documentation. Alchemy lists common choices such as Geth, Erigon, Nethermind, and Besu.
#What hardware do you need?
Archive nodes retain far more data than pruned nodes, so storage is the first constraint. Alchemy notes that archive nodes require significantly more space than full nodes because they keep historical state, and it includes multi-terabyte figures for Ethereum archive storage.
Your exact requirements depend on the chain, the client, and the growth rate of the dataset, but the planning rule stays consistent: prioritize high sustained disk I/O and enough disk headroom for growth.
#High-level setup flow (Ethereum)
You don’t need to memorize flags to understand the process. The steps are simple, even though the sync can be time-consuming.
- Choose a client and follow its official archive-mode instructions. For Geth, the archive guide calls out full sync plus archive settings as the path to retaining historical states.
- Start a full sync from genesis and let the node finish indexing before treating historical queries as reliable. Geth explicitly notes that historical states become accessible only after indexing completes.
- Validate the setup with a historical query, not only “latest” queries. Ethereum documentation describes archive nodes as the option needed for reliable “state at old block” queries.
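The validation step can be sketched as a small check that sends the same `eth_getBalance` call for "latest" and for an old block, then reports which of the two returned a result. This is a minimal sketch under assumptions: the function names are invented, the endpoint `http://localhost:8545` is only a common local default and may not match your setup, and the error text in the stand-in node is made up. Because error wording differs between clients, the check looks only at whether the reply carries a `result` or an `error` field.

```python
import json
import urllib.request
from typing import Callable, Dict


def http_sender(url: str) -> Callable[[Dict], Dict]:
    """Return a function that POSTs a JSON-RPC body to `url` and decodes the reply."""
    def send(body: Dict) -> Dict:
        request = urllib.request.Request(
            url,
            data=json.dumps(body).encode("utf-8"),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(request, timeout=30) as response:
            return json.loads(response.read().decode("utf-8"))
    return send


def check_historical_state(send: Callable[[Dict], Dict], address: str, old_block: int) -> Dict[str, bool]:
    """Query a balance at "latest" and at `old_block`; report which calls returned a result."""
    report = {}
    for label, block in (("latest", "latest"), ("historical", hex(old_block))):
        reply = send({
            "jsonrpc": "2.0",
            "id": 1,
            "method": "eth_getBalance",
            "params": [address, block],
        })
        report[label] = "result" in reply and "error" not in reply
    return report


def fake_pruned_node(body: Dict) -> Dict:
    """Stand-in for a pruned node: answers "latest" but errors on old blocks."""
    if body["params"][1] == "latest":
        return {"jsonrpc": "2.0", "id": body["id"], "result": "0x0"}
    return {
        "jsonrpc": "2.0",
        "id": body["id"],
        "error": {"code": -32000, "message": "historical state unavailable (example text)"},
    }


# Runs offline against the stand-in. For a real node, pass
# http_sender("http://localhost:8545") instead (URL is an assumption).
report = check_historical_state(fake_pruned_node, "0x" + "00" * 20, old_block=1)
print(report)
```

Passing the transport in as a function keeps the check testable without a running node. A node that reports `latest` as available but `historical` as unavailable is behaving like a pruned setup for that block.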
#How this differs on Solana
On Solana, “archive” often points to long-term ledger history at a very large scale. Solana’s own write-up on Archivers frames them as a distributed ledger storage approach built for petabytes of blockchain data, reducing the long-term storage burden on validators.
In practice, teams that need deep historical access commonly combine RPC with indexing and storage designed for historical lookup, because raw ledger traversal is inefficient for many production query patterns.
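One way to picture that combination is a router that decides whether a block lookup can go to the RPC node or must go to a separate historical store. The sketch below assumes the Solana JSON-RPC method `getFirstAvailableBlock`, which reports the lowest slot a node still holds in its ledger. The function names, the backend labels, and the slot numbers are hypothetical, and the request is only built here, never sent.

```python
from typing import Dict


def first_available_block_request(request_id: int = 1) -> Dict:
    """Build the JSON-RPC body asking a node for its lowest retained slot."""
    return {"jsonrpc": "2.0", "id": request_id, "method": "getFirstAvailableBlock"}


def route_lookup(first_available_slot: int, wanted_slot: int) -> str:
    """Pick a backend label for a block lookup.

    Slots older than the node's first available slot have been purged locally,
    so they are sent to the (hypothetical) historical store. A slot inside the
    retained range can still be empty if it was skipped, so "rpc" means
    "worth asking the node", not "guaranteed to exist".
    """
    if wanted_slot >= first_available_slot:
        return "rpc"
    return "historical-store"


# Hypothetical value a node might report for its first available slot.
FIRST_AVAILABLE = 200_000_000

recent = route_lookup(FIRST_AVAILABLE, 250_000_000)
old = route_lookup(FIRST_AVAILABLE, 150_000_000)
print(recent, old)
```

In a real service the first available slot would be refreshed periodically, since nodes purge older ledger data as they run.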
#What hardware do you need to run an archive node?
Archive nodes are disk-heavy and I/O-heavy. The main reason is simple: archive mode keeps far more historical data available for queries, so the database is larger and the node spends more time writing and reading data during initial sync and day-to-day operation. Ethereum documentation also emphasizes that storage is the primary bottleneck for node operators and that SSDs are strongly preferred for keeping up reliably.
#Practical baseline for an Ethereum archive node
Exact numbers depend on the client you choose, but this is a sensible starting target for a production-style archive setup:
| Component | Starting point to plan around |
|---|---|
| CPU | 8–12 modern cores |
| RAM | 64 GB |
| Storage | NVMe SSD; capacity varies a lot by client |
| Network | Stable broadband; higher bandwidth helps initial sync consistency |
#Storage is the deciding factor
Archive storage is not one fixed number because clients store and index history differently.
Client documentation for Geth states that a full archive node that keeps all state back to genesis requires more than 12 TB, and it highlights disk as the main bottleneck.
Other recent requirement breakdowns show how wide the gap can be in practice, with estimates like roughly 18–20 TB for Geth archive mode versus roughly 3–3.5 TB for Erigon, depending on configuration and current chain growth.
#What matters more than raw capacity
If the storage is slow, the node will feel slow, even if you have enough TB. SSDs (often NVMe) are the safest default because syncing and staying current involve constant database updates. Planning for a single, reliable volume with enough headroom also helps performance stay stable as the dataset grows.
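Headroom planning is simple arithmetic, sketched below. The 12 TB starting size echoes the Geth figure quoted earlier. The 16 TB volume, the 100 GB per month growth rate, and the 20% free-space reserve are assumptions for the example, not measured values, so substitute numbers from your own client and chain.

```python
def months_until_full(capacity_gb: int, used_gb: int, growth_gb_per_month: int,
                      reserve_percent: int = 20) -> int:
    """Whole months before usage reaches the usable capacity.

    Usable capacity is the volume size minus a free-space reserve, kept so the
    database is not run right up to a full disk. Integer gigabytes are used to
    avoid floating-point rounding surprises.
    """
    if growth_gb_per_month <= 0:
        raise ValueError("growth_gb_per_month must be positive")
    usable_gb = capacity_gb * (100 - reserve_percent) // 100
    remaining_gb = usable_gb - used_gb
    if remaining_gb <= 0:
        return 0
    return remaining_gb // growth_gb_per_month


# 16 TB volume, 12 TB already used, assumed growth of 100 GB per month.
runway = months_until_full(capacity_gb=16_000, used_gb=12_000, growth_gb_per_month=100)
print(f"Months of headroom: {runway}")
```

With these assumed inputs the usable capacity is 12,800 GB, leaving 800 GB of room, or 8 months at the assumed growth rate. A result that short suggests sizing the volume larger from the start.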
#Conclusion
Archive nodes exist for one clear reason: reliable access to historical state. A standard full node is often optimized for staying current and may prune older state, while an archive node retains historical states so you can answer precise questions about the chain at a specific block height.
In practice, archive nodes become important when you are building systems that depend on history being consistently queryable, such as explorers, analytics, audits, investigations, and large backfills.
If those historical queries need to be available to applications, they are typically exposed through an RPC layer, and the RPC backend needs to be capable of serving the history you want to support. Validators are a separate role focused on consensus and network integrity, not historical data retention.
Hardware planning is mainly a storage and I/O decision. Archive mode increases disk requirements significantly, and SSD (often NVMe) is the common baseline for performance and stability. Storage size varies widely by client and configuration, ranging from a few terabytes in some setups to well over 12 TB in others, with some recent estimates placing certain archive configurations in the high-teens terabytes.