From f4c56cbd53b8352a4ec4633102af08019be959a3 Mon Sep 17 00:00:00 2001 From: Howard Hinnant Date: Mon, 2 Mar 2020 17:24:35 -0500 Subject: [PATCH] Update SHAMap Documentation --- src/ripple/shamap/README.md | 408 +++++++++++++++++++++++++----------- 1 file changed, 284 insertions(+), 124 deletions(-) diff --git a/src/ripple/shamap/README.md b/src/ripple/shamap/README.md index e28d3bbfc3..a04a63c5f5 100644 --- a/src/ripple/shamap/README.md +++ b/src/ripple/shamap/README.md @@ -1,189 +1,349 @@ # SHAMap Introduction # -July 2014 +March 2020 -The SHAMap is a Merkle tree (http://en.wikipedia.org/wiki/Merkle_tree). -The SHAMap is also a radix tree of radix 16 +The `SHAMap` is a Merkle tree (http://en.wikipedia.org/wiki/Merkle_tree). +The `SHAMap` is also a radix trie of radix 16 (http://en.wikipedia.org/wiki/Radix_tree). -*We need some kind of sensible summary of the SHAMap here.* +The Merkle trie data structure is important because subtrees and even the entire +tree can be compared with other trees in O(1) time by simply comparing the hashes. +This makes it very efficient to determine if two `SHAMap`s contain the same set of +transactions or account state modifications. -A given SHAMap always stores only one of three kinds of data: +The radix trie property is helpful in that a key (hash) of a transaction +or account state can be used to navigate the trie. + +A `SHAMap` is a trie with two node types: + +1. SHAMapInnerNode +2. SHAMapTreeNode + +Both of these nodes directly inherit from SHAMapAbstractNode which holds data +common to both of the node types. + +All non-leaf nodes have type SHAMapInnerNode. + +All leaf nodes have type SHAMapTreeNode. + +The root node is always a SHAMapInnerNode. + +A given `SHAMap` always stores only one of three kinds of data: * Transactions with metadata * Transactions without metadata, or * Account states. -So all of the leaf nodes of a particular SHAMap will always have a uniform -type. The inner nodes carry no data other than the hash of the nodes -beneath them. +So all of the leaf nodes of a particular `SHAMap` will always have a uniform type. +The inner nodes carry no data other than the hash of the nodes beneath them. +All nodes are owned by shared_ptrs resident in either other nodes, or in case of +the root node, a shared_ptr in the `SHAMap` itself. The use of shared_ptrs +permits more than one `SHAMap` at a time to share ownership of a node. This +occurs (for example), when a copy of a `SHAMap` is made. -## SHAMap Types ## +Copies are made with the `snapShot` function as opposed to the `SHAMap` copy +constructor. See the section on `SHAMap` creation for more details about +`snapShot`. -There are two different ways of building and using a SHAMap: +Sequence numbers are used to further customize the node ownership strategy. See +the section on sequence numbers for details on sequence numbers. - 1. A mutable SHAMap and - 2. An immutable SHAMap +![node diagram](https://user-images.githubusercontent.com/46455409/77350005-1ef12c80-6cf9-11ea-9c8d-56410f442859.png) -The distinction here is not of the classic C++ immutable-means-unchanging -sense. An immutable SHAMap contains *nodes* that are immutable. Also, -once a node has been located in an immutable SHAMap, that node is -guaranteed to persist in that SHAMap for the lifetime of the SHAMap. +## Mutability ## -So, somewhat counter-intuitively, an immutable SHAMap may grow as new nodes -are introduced. But an immutable SHAMap will never get smaller (until it -entirely evaporates when it is destroyed). Nodes, once introduced to the -immutable SHAMap, also never change their location in memory. So nodes in -an immutable SHAMap can be handled using raw pointers (if you're careful). +There are two different ways of building and using a `SHAMap`: -One consequence of this design is that an immutable SHAMap can never be -"trimmed". There is no way to identify unnecessary nodes in an immutable -SHAMap that could be removed. Once a node has been brought into the -in-memory SHAMap, that node stays in memory for the life of the SHAMap. + 1. A mutable `SHAMap` and + 2. An immutable `SHAMap` -Most SHAMaps are immutable, in the sense that they don't modify or remove -their contained nodes. +The distinction here is not of the classic C++ immutable-means-unchanging sense. + An immutable `SHAMap` contains *nodes* that are immutable. Also, once a node has +been located in an immutable `SHAMap`, that node is guaranteed to persist in that +`SHAMap` for the lifetime of the `SHAMap`. -An example where a mutable SHAMap is required is when we want to apply -transactions to the last closed ledger. To do so we'd make a mutable -snapshot of the state tree and then start applying transactions to it. -Because the snapshot is mutable, changes to nodes in the snapshot will not -affect nodes in other SHAMAps. +So, somewhat counter-intuitively, an immutable `SHAMap` may grow as new nodes are +introduced. But an immutable `SHAMap` will never get smaller (until it entirely +evaporates when it is destroyed). Nodes, once introduced to the immutable +`SHAMap`, also never change their location in memory. So nodes in an immutable +`SHAMap` can be handled using raw pointers (if you're careful). -An example using a immutable ledger would be when there's an open ledger -and some piece of code wishes to query the state of the ledger. In this -case we don't wish to change the state of the SHAMap, so we'd use an -immutable snapshot. +One consequence of this design is that an immutable `SHAMap` can never be +"trimmed". There is no way to identify unnecessary nodes in an immutable `SHAMap` +that could be removed. Once a node has been brought into the in-memory `SHAMap`, +that node stays in memory for the life of the `SHAMap`. +Most `SHAMap`s are immutable, in the sense that they don't modify or remove their +contained nodes. + +An example where a mutable `SHAMap` is required is when we want to apply +transactions to the last closed ledger. To do so we'd make a mutable snapshot +of the state trie and then start applying transactions to it. Because the +snapshot is mutable, changes to nodes in the snapshot will not affect nodes in +other `SHAMap`s. + +An example using a immutable ledger would be when there's an open ledger and +some piece of code wishes to query the state of the ledger. In this case we +don't wish to change the state of the `SHAMap`, so we'd use an immutable snapshot. + +## Sequence numbers ## + +Both `SHAMap`s and their nodes carry a sequence number. This is simply an +unsigned number that indicates ownership or membership, or a non-membership. + +`SHAMap`s sequence numbers normally start out as 1. However when a snap-shot of +a `SHAMap` is made, the copy's sequence number is 1 greater than the original. + +The nodes of a `SHAMap` have their own copy of a sequence number. If the `SHAMap` +is mutable, meaning it can change, then all of its nodes must have the +same sequence number as the `SHAMap` itself. This enforces an invariant that none +of the nodes are shared with other `SHAMap`s. + +When a `SHAMap` needs to have a private copy of a node, not shared by any other +`SHAMap`, it first clones it and then sets the new copy to have a sequence number +equal to the `SHAMap` sequence number. The `unshareNode` is a private utility +which automates the task of first checking if the node is already sharable, and +if so, cloning it and giving it the proper sequence number. An example case +where a private copy is needed is when an inner node needs to have a child +pointer altered. Any modification to a node will require a non-shared node. + +When a `SHAMap` decides that it is safe to share a node of its own, it sets the +node's sequence number to 0 (a `SHAMap` never has a sequence number of 0). This +is done for every node in the trie when `SHAMap::walkSubTree` is executed. + +Note that other objects in rippled also have sequence numbers (e.g. ledgers). +The `SHAMap` and node sequence numbers should not be confused with these other +sequence numbers (no relation). ## SHAMap Creation ## -A SHAMap is usually not created from vacuum. Once an initial SHAMap is -constructed, later SHAMaps are usually created by calling -snapShot(bool isMutable) on the original SHAMap(). The returned SHAMap -has the expected characteristics (mutable or immutable) based on the passed -in flag. +A `SHAMap` is usually not created from vacuum. Once an initial `SHAMap` is +constructed, later `SHAMap`s are usually created by calling snapShot(bool +isMutable) on the original `SHAMap`. The returned `SHAMap` has the expected +characteristics (mutable or immutable) based on the passed in flag. -It is cheaper to make an immutable snapshot of a SHAMap than to make a mutable -snapshot. If the SHAMap snapshot is mutable then any of the nodes that might -be modified must be copied before they are placed in the mutable map. +It is cheaper to make an immutable snapshot of a `SHAMap` than to make a mutable +snapshot. If the `SHAMap` snapshot is mutable then sharable nodes must be +copied before they are placed in the mutable map. +A new `SHAMap` is created with each new ledger round. Transactions not executed +in the previous ledger populate the `SHAMap` for the new ledger. -## SHAMap Thread Safety ## +## Storing SHAMap data in the database ## -*This description is obsolete and needs to be rewritten.* +When consensus is reached, the ledger is closed. As part of this process, the +`SHAMap` is stored to the database by calling `SHAMap::flushDirty`. -SHAMaps can be thread safe, depending on how they are used. The SHAMap -uses a SyncUnorderedMap for its storage. The SyncUnorderedMap has three -thread-safe methods: +Both `unshare()` and `flushDirty` walk the `SHAMap` by calling +`SHAMap::walkSubTree`. As `unshare()` walks the trie, nodes are not written to +the database, and as `flushDirty` walks the trie nodes are written to the +database. `walkSubTree` visits every node in the trie. This process must ensure +that each node is only owned by this trie, and so "unshares" as it walks each +node (from the root down). This is done in the `preFlushNode` function by +ensuring that the node has a sequence number equal to that of the `SHAMap`. If +the node doesn't, it is cloned. - * size(), - * canonicalize(), and - * retrieve() +For each inner node encountered (starting with the root node), each of the +children are inspected (from 1 to 16). For each child, if it has a non-zero +sequence number (unshareable), the child is first copied. Then if the child is +an inner node, we recurse down to that node's children. Otherwise we've found a +leaf node and that node is written to the database. A count of each leaf node +that is visited is kept. The hash of the data in the leaf node is computed at +this time, and the child is reassigned back into the parent inner node just in +case the COW operation created a new pointer to this leaf node. -As long as the SHAMap uses only those three interfaces to its storage -(the mTNByID variable [which stands for Tree Node by ID]) the SHAMap is -thread safe. +After processing each node, the node is then marked as sharable again by setting +its sequence number to 0. +After all of an inner node's children are processed, then its hash is updated +and the inner node is written to the database. Then this inner node is assigned +back into it's parent node, again in case the COW operation created a new +pointer to it. ## Walking a SHAMap ## -*We need a good description of why someone would walk a SHAMap and* -*how it works in the code* +The private function `SHAMap::walkTowardsKey` is a good example of *how* to walk +a `SHAMap`, and the various functions that call `walkTowardsKey` are good examples +of *why* one would want to walk a `SHAMap` (e.g. `SHAMap::findKey`). +`walkTowardsKey` always starts at the root of the `SHAMap` and traverses down +through the inner nodes, looking for a leaf node along a path in the trie +designated by a `uint256`. +As one walks the trie, one can *optionally* keep a stack of nodes that one has +passed through. This isn't necessary for walking the trie, but many clients +will use the stack after finding the desired node. For example if one is +deleting a node from the trie, the stack is handy for repairing invariants in +the trie after the deletion. + +To assist in walking the trie, `SHAMap::walkTowardsKey` uses a `SHAMapNodeID` +that identifies a node by its path from the root and its depth in the trie. The +path is just a "list" of numbers, each in the range [0 .. 15], depicting which +child was chosen at each node starting from the root. Each choice is represented +by 4 bits, and then packed in sequence into a `uint256` (such that the longest +path possible has 256 / 4 = 64 steps). The high 4 bits of the first byte +identify which child of the root is chosen, the lower 4 bits of the first byte +identify the child of that node, and so on. The `SHAMapNodeID` identifying the +root node has an ID of 0 and a depth of 0. See `SHAMapNodeID::selectBranch` for +details of how a `SHAMapNodeID` selects a "branch" (child) by indexing into its +path with its depth. + +While the current node is an inner node, traversing down the trie from the root +continues, unless the path indicates a child that does not exist. And in this +case, `nullptr` is returned to indicate no leaf node along the given path +exists. Otherwise a leaf node is found and a (non-owning) pointer to it is +returned. At each step, if a stack is requested, a +`pair, SHAMapNodeID>` is pushed onto the stack. + +When a child node is found by `selectBranch`, the traversal to that node +consists of two steps: + +1. Update the `shared_ptr` to the current node. +2. Update the `SHAMapNodeID`. + +The first step consists of several attempts to find the node in various places: + +1. In the trie itself. +2. In the node cache. +3. In the database. + +If the node is not found in the trie, then it is installed into the trie as part +of the traversal process. ## Late-arriving Nodes ## -As we noted earlier, SHAMaps (even immutable ones) may grow. If a SHAMap -is searching for a node and runs into an empty spot in the tree, then the -SHAMap looks to see if the node exists but has not yet been made part of -the map. This operation is performed in the `SHAMap::fetchNodeExternalNT()` -method. The *NT* is this case stands for 'No Throw'. +As we noted earlier, `SHAMap`s (even immutable ones) may grow. If a `SHAMap` is +searching for a node and runs into an empty spot in the trie, then the `SHAMap` +looks to see if the node exists but has not yet been made part of the map. This +operation is performed in the `SHAMap::fetchNodeNT()` method. The *NT* +is this case stands for 'No Throw'. -The `fetchNodeExternalNT()` method goes through three phases: +The `fetchNodeNT()` method goes through three phases: 1. By calling `getCache()` we attempt to locate the missing node in the - TreeNodeCache. The TreeNodeCache is a cache of immutable - SHAMapTreeNodes that are shared across all SHAMaps. + TreeNodeCache. The TreeNodeCache is a cache of immutable SHAMapTreeNodes + that are shared across all `SHAMap`s. - Any SHAMapTreeNode that is immutable has a sequence number of zero. - When a mutable SHAMap is created then its SHAMapTreeNodes are given - non-zero sequence numbers. So the `assert (ret->getSeq() == 0)` - simply confirms that the TreeNodeCache indeed gave us an immutable node. + Any SHAMapTreeNode that is immutable has a sequence number of zero + (sharable). When a mutable `SHAMap` is created then its SHAMapTreeNodes are + given non-zero sequence numbers (unsharable). But all nodes in the + TreeNodeCache are immutable, so if one is found here, its sequence number + will be 0. 2. If the node is not in the TreeNodeCache, we attempt to locate the node - in the historic data stored by the data base. The call to - to `fetch(hash)` does that work for us. - - 3. Finally, if ledgerSeq_ is non-zero and we did't locate the node in the - historic data, then we call a MissingNodeHandler. - - The non-zero ledgerSeq_ indicates that the SHAMap is a complete map that - belongs to a historic ledger with the given (non-zero) sequence number. - So, if all expected data is always present, the MissingNodeHandler should - never be executed. - - And, since we now know that this SHAMap does not fully represent - the data from that ledger, we set the SHAMap's sequence number to zero. - -If phase 1 returned a node, then we already know that the node is immutable. -However, if either phase 2 executes successfully, then we need to turn the -returned node into an immutable node. That's handled by the call to -`make_shared` inside the try block. That code is inside -a try block because the `fetchNodeExternalNT` method promises not to throw. -In case the constructor called by `make_shared` throws we don't want to -break our promise. + in the historic data stored by the data base. The call to to + `fetchNodeFromDB(hash)` does that work for us. + 3. Finally if a filter exists, we check if it can supply the node. This is + typically the LedgerMaster which tracks the current ledger and ledgers + in the process of closing. ## Canonicalize ## -The calls to `canonicalize()` make sure that if the resulting node is already -in the SHAMap, then we return the node that's already present -- we never -replace a pre-existing node. By using `canonicalize()` we manage a thread -race condition where two different threads might both recognize the lack of a -SHAMapTreeNode at the same time. If they both attempt to insert the node -then `canonicalize` makes sure that the first node in wins and the slower -thread receives back a pointer to the node inserted by the faster thread. +`canonicalize()` is called every time a node is introduced into the `SHAMap`. -There's a problem with the current SHAMap design that `canonicalize()` -accommodates. Two different trees can have the exact same node (the same -hash value) with two different IDs. If the TreeNodeCache returns a node -with the same hash but a different ID, then we assume that the ID of the -passed-in node is 'better' than the older ID in the TreeNodeCache. So we -construct a new SHAMapTreeNode by copying the one we found in the -TreeNodeCache, but we give the new node the new ID. Then we replace the -SHAMapTreeNode in the TreeNodeCache with this newly constructed node. +A call to `canonicalize()` stores the node in the `TreeNodeCache` if it does not +already exist in the `TreeNodeCache`. -The TreeNodeCache is not subject to the rule that any node must be -resident forever. So it's okay to replace the old node with the new node. +The calls to `canonicalize()` make sure that if the resulting node is already in +the `SHAMap`, node `TreeNodeCache` or database, then we don't create duplicates +by favoring the copy already in the `TreeNodeCache`. -The `SHAMap::getCache()` method exhibits the same behavior. +By using `canonicalize()` we manage a thread race condition where two different +threads might both recognize the lack of a SHAMapTreeNode at the same time +(during a fetch). If they both attempt to insert the node into the `SHAMap`, then +`canonicalize` makes sure that the first node in wins and the slower thread +receives back a pointer to the node inserted by the faster thread. Recall +that these two `SHAMap`s will share the same `TreeNodeCache`. +## TreeNodeCache ## + +The `TreeNodeCache` is a `std::unordered_map` keyed on the hash of the +`SHAMap` node. The stored type consists of `shared_ptr`, +`weak_ptr`, and a time point indicating the most recent +access of this node in the cache. The time point is based on +`std::chrono::steady_clock`. + +The container uses a cryptographically secure hash that is randomly seeded. + +The `TreeNodeCache` also carries with it various data used for statistics +and logging, and a target age for the contained nodes. When the target age +for a node is exceeded, and there are no more references to the node, the +node is removed from the `TreeNodeCache`. + +## FullBelowCache ## + +This cache remembers which trie keys have all of their children resident in a +`SHAMap`. This optimizes the process of acquiring a complete trie. This is used +when creating the missing nodes list. Missing nodes are those nodes that a +`SHAMap` refers to but that are not stored in the local database. + +As a depth-first walk of a `SHAMap` is performed, if an inner node answers true to +`isFullBelow()` then it is known that none of this node's children are missing +nodes, and thus that subtree does not need to be walked. These nodes are stored +in the FullBelowCache. Subsequent walks check the FullBelowCache first when +encountering a node, and ignore that subtree if found. + +## SHAMapAbstractNode ## + +This is a base class for the two concrete node types. It holds the following +common data: + +1. A node type, one of: + a. error + b. inner + c. transaction with no metadata + d. transaction with metadata + e. account state +2. A hash +3. A sequence number + + +## SHAMapInnerNode ## + +SHAMapInnerNode publicly inherits directly from SHAMapAbstractNode. It holds +the following data: + +1. Up to 16 child nodes, each held with a shared_ptr. +2. A hash for each child. +3. A 16-bit bitset with a 1 bit set for each child that exists. +4. Flag to aid online delete and consistency with data on disk. + +## SHAMapTreeNode ## + +SHAMapTreeNode publicly inherits directly from SHAMapAbstractNode. It holds the +following data: + +1. A shared_ptr to a const SHAMapItem. + +## SHAMapItem ## + +This holds the following data: + +1. uint256. The hash of the data. +2. vector. The data (transactions, account info). ## SHAMap Improvements ## -Here's a simple one: the SHAMapTreeNode::mAccessSeq member is currently not -used and could be removed. +Here's a simple one: the SHAMapTreeNode::mAccessSeq member is currently not used +and could be removed. -Here's a more important change. The tree structure is currently embedded -in the SHAMapTreeNodes themselves. It doesn't have to be that way, and -that should be fixed. +Here's a more important change. The trie structure is currently embedded in the +SHAMapTreeNodes themselves. It doesn't have to be that way, and that should be +fixed. -When we navigate the tree (say, like `SHAMap::walkTo()`) we currently -ask each node for information that we could determine locally. We know -the depth because we know how many nodes we have traversed. We know the -ID that we need because that's how we're steering. So we don't need to -store the ID in the node. The next refactor should remove all calls to -`SHAMapTreeNode::GetID()`. +When we navigate the trie (say, like `SHAMap::walkTo()`) we currently ask each +node for information that we could determine locally. We know the depth because +we know how many nodes we have traversed. We know the ID that we need because +that's how we're steering. So we don't need to store the ID in the node. The +next refactor should remove all calls to `SHAMapTreeNode::GetID()`. Then we can remove the NodeID member from SHAMapTreeNode. -Then we can change the SHAMap::mTNBtID member to be mTNByHash. +Then we can change the `SHAMap::mTNBtID` member to be `mTNByHash`. An additional possible refactor would be to have a base type, SHAMapTreeNode, -and derive from that InnerNode and LeafNode types. That would remove -some storage (the array of 16 hashes) from the LeafNodes. That refactor -would also have the effect of simplifying methods like `isLeaf()` and -`hasItem()`. +and derive from that InnerNode and LeafNode types. That would remove some +storage (the array of 16 hashes) from the LeafNodes. That refactor would also +have the effect of simplifying methods like `isLeaf()` and `hasItem()`.