zkTrie
zkTrie Overview
This document provides a comprehensive explanation of zkTrie, which is a sparse binary Merkle Patricia Trie designed for efficiently storing key-value pairs. It details the tree structure, construction methods, node hashing processes, and the operations involved in insertion and deletion.
For further exploration, you can access our zkTrie repository.
Tree Structure
zkTrie is structured as a sparse binary Merkle Patricia Trie. Before delving into its specifics, it's helpful to understand the foundational concepts of Merkle Trees and Patricia Tries:
Merkle Tree: In a Merkle Tree, each leaf node contains a hash representing a data block, while each non-leaf node holds the hash of its child nodes.
Patricia Trie: A Patricia Trie is a type of radix tree or compressed trie used for efficient storage of key-value pairs. It optimizes the storage by allowing nodes with the same prefix to share a common path.
In zkTrie, there are three types of nodes:
Branch Node: In a binary tree structure, a branch node has two children.
Leaf Node: A leaf node contains the data for a key-value pair.
Empty Node: This special type of node indicates that the sub-trie sharing the same prefix is empty.
To compute node hashes, zkTrie utilizes the Poseidon hash function, which is more efficient and compatible for proof generation in zero-knowledge (zk) circuits.
zkTrie Structure
Tree Construction
To create a key-value pair in zkTrie, we first derive a secure key for the corresponding leaf node by hashing the original key (which may include an account address and storage key) using the Poseidon hash function. This approach ensures uniform distribution across the key space.
The construction of the path for a new leaf node involves traversing the secure key from the Least Significant Bit (LSB) to the Most Significant Bit (MSB). At each bit position, if the bit is 0, we traverse left; if it's 1, we traverse right.
We limit the depth of zkTrie to a maximum of 248 levels. This limitation arises because the finite field used by the Poseidon hash does not cover the entire range of (2^{256}), which could lead to ambiguities in the bit representation and soundness issues within the zk circuit. By restricting the depth to 248 bits, we ensure that the key space can effectively occupy the range of (2^{248}) without ambiguity.
An optimization technique is employed to reduce tree depth by collapsing subtrees that contain only one leaf node into a single leaf node. For instance, if a tree has nodes with keys 0100, 0010, and 1010, and only one node has a key with the suffix 00, the leaf node for key 0100 will only traverse the suffix without fully expanding, thus avoiding unnecessary depth.
Tree Operations
Insertion
When inserting a new leaf node into zkTrie, there are two primary scenarios:
Empty Node Case: If the traversal path leads to an empty node, we replace this empty node with the new leaf node and backtrack to update the Merkle hashes of the branch nodes up to the root.
Existing Leaf Node Case: If the traversal reaches an existing leaf node, we must push down the current leaf node until the next differing bit in the keys is found. Each step involves inserting an empty sibling node into the branch. Once we reach the level where the bits differ, we position the two leaf nodes according to their bit values and backtrack to update the Merkle hashes.
Deletion
The deletion process for a leaf node is analogous to insertion and involves two scenarios:
Sibling is a Branch Node: If the sibling of the leaf node to be deleted is a branch node, we can replace the targeted leaf node with an empty node and update the node hash for its ancestors up to the root.
Sibling is a Leaf Node: If the sibling is also a leaf node, we first replace the targeted leaf with an empty node. We then contract the sibling node upwards until it has a non-empty sibling. For instance, if deleting node b results in its sibling becoming empty, we must move the sibling node upwards until it reaches a leaf node, thus completing the deletion process.
It's important to note that a valid zkTrie cannot have a leaf node with an empty sibling. If such a case arises, the subtree should be pruned, and the leaf node should be elevated.
Node Hashing
In this section, we detail how the secure key for leaves and the node Merkle hash are calculated. The Poseidon hash function, with an arity of 2, is utilized for both hashing processes.
Empty Node
The hash of an empty node is simply 0.
Branch Node
The hash for a branch node is computed as follows:
[ \text{branchNodeHash} = h_{\text{BranchNodeType}}(\text{leftChildHash}, \text{rightChildHash}) ]
Leaf Node
The hash of a leaf node is calculated as:
[ \text{leafNodeHash} = h_{\text{LeafNodeType}}(\text{nodeKey}, \text{valueHash}) ]
This computation involves the following:
nodeKey: Derived from the original key.
valueHash: Computed by hashing the value stored in the leaf node.
The leaf nodes can be categorized into two types: Ethereum accounts and storage key-value pairs. Below is the hashing method for each leaf node type.
Ethereum Account Leaf Node
An Ethereum Account Leaf Node comprises an Ethereum address and a state account data structure. The secure key is derived from the Ethereum address as follows:
[ \text{var address} \text{ byte[20]} \quad // 20 bytes in big-endian ] [ \text{valHi} := \text{address}[0:16] ] [ \text{valLo} := \text{address}[16:20] \times 2^{96} \quad // padding with 12 bytes of 0 ] [ \text{nodeKey} := h_{512}(\text{valHi}, \text{valLo}) ]
The state account structure in Wischain includes the following fields (Fr denotes a finite field represented as a 254-bit value):
Nonce: u64
Balance: u256 (treated as Fr)
StorageRoot: Fr
KeccakCodeHash: u256
PoseidonCodeHash: Fr
CodeSize: u64
Before calculating the value hash, the state account is marshaled into a list of u256 values as follows (assuming big-endian encoding):
[0:32]
[0:16]: Reserved with all 0s
[16:24]: CodeSize (uint64 in big-endian)
[24:32]: Nonce (uint64 in big-endian)
[32:64]: Balance
[64:96]: StorageRoot
[96:128]: KeccakCodeHash
[128:160]: PoseidonCodehash (total 160 bytes)
The marshal function also produces a flag value indicating whether a u256 value cannot be treated as a field element (Fr). The flag for state accounts is set to 8:
u256
nonce
codesize
0
balance
root
Flag
0
0
0
1
0
The value hash is derived in two steps:
Convert non-field element values to field elements.
Combine these field elements in a binary tree structure until the tree root is calculated as the value hash.
For example, to convert the Keccak code hash into a field element:
[ \text{compressedKeccakCodeHash} := h_{512}(\text{keccakCodeHash}[0:16], \text{keccakCodeHash}[16:32]) ]
The value hash is then computed as follows:
[ \text{domain} := 256 \times 5 \quad // 5 elements for the valueHash ] [ \text{valueHash} := h_{\text{domain}}\left( h_{\text{domain}}\left( h_{\text{domain}}(\text{nonce} || \text{codesize} || 0, \text{balance}), h_{\text{domain}}(\text{storageRoot}, \text{compressedKeccakCodeHash}), \right), \text{poseidonCodeHash} \right) ]
Storage Leaf Node
A Storage Leaf Node encodes a key-value pair, where both the key and value are u256 values. The secure key for this leaf node is derived from the storage key:
[ \text{var storageKey byte[32]} \quad // 32 bytes in big-endian ] [ \text{valHi} := \text{storageKey}[0:16] ] [ \text{valLo} := \text{storageKey}[16:32] ] [ \text{nodeKey} := h_{512}(\text{valHi}, \text{valLo}) ]
The storage value is also a u256 value, and the flag for this value is set to 1:
u256
storageValue
Flag
1
The value hash for the storage leaf node is
computed as follows:
[ \text{valueHash} = h_{512}(\text{storageValue}[0:16], \text{storageValue}[16:32]) ]
Conclusion
The zkTrie structure and its associated operations provide an efficient means of managing key-value pairs while maintaining the security and integrity necessary for zk circuits. By leveraging advanced hashing techniques such as Poseidon hash and optimizing for storage and retrieval, zkTrie is well-equipped for use in Wischain and similar blockchain environments.
Last updated