SIMD-0215: accounts lattice hash
simd: '0215' title: Homomorphic Hashing of Account State authors: - Brooks Prumo - Emanuele Cesena - Josh Siegel - Sam Kim category: Standard type: Core status: Accepted created: 2024-12-20 feature: LTHasHQX6661DaDD4S6A2TFi6QBuiwXKv66fB1obfHq development: - Agave: Implemented (v2.2) - Firedancer: Implemented (v0.2)
Summary
This proposal adds a new hash, Accounts Lattice Hash, that uses homomorphic hashing to maintain a hash of the total account state. The Accounts Lattice Hash is both fast to update, and secure, enabling (1) every block to contain the hash of all accounts, instead of only the accounts changed in that block, and (2) removing the Epoch Accounts Hash.
Motivation
The main goal is to scale Solana to billions accounts, and compute a "hash of all accounts" in practical time and space.
Currently there are two main kinds of accounts hashes used in Solana:
- The Epoch Accounts Hash, which is a merkle-based hash of the total account state.
- The Accounts Delta Hash, which is also a merkle-based hash of accounts written in a single block.
Both of these hashes have limitations. They calculate the root of a Merkle tree of accounts, sorted by public key, which makes scaling challenging due to the amount of data required. This is also why there are two distinct hashes: the Epoch Accounts Hash hashes the every account, but is infrequent; and the Accounts Delta Hash is every block, but only contains the subset of accounts written in that block.
Ideally every block would contain a hash of the total account state.
Homomorphic hashing functions enable computing the hash of the total account
state by incrementally accumulating the hash of each account. This removes
the need to sort and maintain large intermediate state. Said another way, the
total account state hash for block B
can now be computed from block B - 1
's
total account state hash plus only the accounts that changed in block B
.
New Terminology
Accounts Lattice Hash is the new "hash of all accounts", built using a lattice-based homomorphic hashing function.
Homomorphic hashing "is the concept to solve the following problem: Given the hash of an input, along with a small update to the input, how can you compute the hash of the new input with its update applied, without having to recompute the entire hash from scratch?" [1].
Lattice hash is the specific homomorphic hashing function chosen to compute both an individual account hash, and the hash of all accounts.
Detailed Design
Defining LtHash
LtHash is the 2048-byte result of hashing an input with the lattice hashing function.
A hash function has interface: init
, append
, fini
. A homomorphic hash
function has two additional methods: add
, sub
. Finally, it is convenient
to have a 32-byte result, so we introduce the method: out
.
For LtHash:
init()
: is blake3.init
append(data)
: is blake3.append(data)
fini()
: is blake3_xof.fini(2048), i.e. return a 2048-byte output using the
property that Blake3 is an eXtendable Output Function (XOF).
add(a, b)
: interpret the hash values a, b
as arrays of 1024 u16s.
Wrapping-add each pair of elements in the arrays.
sub(a, b)
: interpret the hash values a, b
as arrays of 1024 u16s.
Wrapping-sub each pair of elements in the arrays.
out(a)
: blake3.fini( blake3.append( blake3.init(), a ) ), i.e. the 32-byte
blake3 of the 2048-byte data.
Calculating LtHash of one account
The LtHash of a single account is defined as:
lthash(account) :=
if account.lamports == 0:
return 00..00
else:
lthash.init()
lthash.append( account.lamports )
lthash.append( account.data )
lthash.append( account.is_executable )
lthash.append( account.owner )
lthash.append( account.pubkey )
return lthash.fini()
Calculating LtHash of all accounts, aka Accounts Lattice Hash
For a set of accounts A
:
LTHASH(A) = \sum_{a \in A} lthash(a)
Specifically, the Accounts Lattice Hash is the sum of the (single) account lthash for all accounts.
Incrementally updating Accounts Lattice Hash per block
For a given block, there are a set of accounts modified in that block. Let
account
indicate the state of the account before modification, and
account'
indicate the state of the account after modification.
To compute the accounts lattice hash for a block, LTHASH'
, and given the
prior block's accounts lattice hash, LTHASH
:
LTHASH' := LTHASH
for each account modified in B:
LTHASH'.sub( lthash( account ) )
LTHASH'.add( lthash( account' ) )
return LTHASH'
Notes:
- The order in which account lthashes are add/sub is irrelevant.
- If an account is modified by multiple transaction, it is possible to either compute its lthashes before/after each individual transaction or the entire block. The final result is the same.
- Every account modified in a block must be reflected in that block's Accounts Lattice Hash. This applies to both on-chain and off-chain modifications.
Changes to the Bank Hash
The bank hash for each block will now include the Accounts Lattice Hash. The Accounts Lattice Hash will be incrementally updated and mixed in instead of the Epoch Accounts Hash (and in every block, not just once per epoch). It is safe to replace the Epoch Accounts Hash with the Accounts Lattice Hash since they are both hashes of the total account state.
Note that this SIMD does not make any changes to the Accounts Delta Hash.
Feature Activation
At feature activation, all nodes must have the Accounts Lattice Hash. If they haven't already been computing/updating it, they must now do so.
It is recommended to provide a mechanism for nodes to opt-in to computing the initial Accounts Lattice Hash and begin updating the value each block before feature activation. This will enable the cluster to pre-compute the Accounts Lattice Hash prior to feature activation and avoid a cluster-wide pause while all nodes calculate the Accounts Lattice Hash.
Snapshots
Since computing the initial Accounts Lattice Hash is expensive, we want to avoid this computation when possible. The Accounts Lattice Hash will be written to the snapshot, and will be read out at boot time (if present).
It is recommended for nodes to verify the Accounts Lattice Hash read from the snapshot.
Note that this SIMD does not make any changes to the Snapshot Hash.
Epoch Accounts Hash
As stated above, the Accounts Lattice Hash will replace the Epoch Accounts Hash. Thus, the Epoch Accounts Hash will effectively be removed from consensus at feature activation. The snapshot format will not change to remove the Epoch Accounts Hash field.
Alternatives Considered
The following alternatives were considered:
- Merkle Trees
- Pros: well known, can support inclusion proofs
- Cons: won't scale
- Verkle Trees
- Pros: scale better, support inclusion proofs
- Cons: not fast enough for Solana
- Incremental Hash based on XOR
- Insecure: known sub-exponential attacks exist [3, 4]
- Incremental Hash based on Elliptic Curve, e.g. Ristretto or GLS254
- Pros: secure incremental hash
- Cons: not as efficient as LtHash (10-100x slower)
Impact
Only validators will be impacted.
Security Considerations
LtHash instantiated with BLAKE3 and a 2048-byte output provides the desired 128-bit security [1, Appendix A].
Drawbacks
The Accounts Lattice Hash does NOT support inclusion/exclusion proofs.
Backwards Compatibility (Optional)
Incompatible. This changes the bank hash, thus changing consensus.
Bibliography
- Lewi, Kim, Maykov, Weis, Securing Update Propagation with Homomorphic Hashing, 2019, ia.cr/2019/227
- Bellare, Goldreich, Goldwasser, Incremental Cryptography: The Case of Hashing and Signing, 1994, cseweb.ucsd.edu
- Bellare, Micciancio, A new paradigm for collision-free hashing: Incrementality at reduced cost, 1996, cseweb.ucsd.edu
- Wagner, A Generalized Birthday Problem, 2002, crypto2002
- O'Connor, Ausmasson, Neves, Wilcox-O'Hearn, BLAKE3, 2021, PDF