From f4c56cbd53b8352a4ec4633102af08019be959a3 Mon Sep 17 00:00:00 2001
From: Howard Hinnant <howard.hinnant@gmail.com>
Date: Mon, 2 Mar 2020 17:24:35 -0500
Subject: [PATCH] Update SHAMap Documentation

---
 src/ripple/shamap/README.md | 408 +++++++++++++++++++++++++-----------
 1 file changed, 284 insertions(+), 124 deletions(-)

diff --git a/src/ripple/shamap/README.md b/src/ripple/shamap/README.md
index e28d3bbfc3..a04a63c5f5 100644
--- a/src/ripple/shamap/README.md
+++ b/src/ripple/shamap/README.md
@@ -1,189 +1,349 @@
 # SHAMap Introduction #
 
-July 2014
+March 2020
 
-The SHAMap is a Merkle tree (http://en.wikipedia.org/wiki/Merkle_tree).
-The SHAMap is also a radix tree of radix 16
+The `SHAMap` is a Merkle tree (http://en.wikipedia.org/wiki/Merkle_tree).
+The `SHAMap` is also a radix trie of radix 16
 (http://en.wikipedia.org/wiki/Radix_tree).
 
-*We need some kind of sensible summary of the SHAMap here.*
+The Merkle trie data structure is important because subtrees and even the entire
+tree can be compared with other trees in O(1) time by simply comparing the hashes.
+This makes it very efficient to determine if two `SHAMap`s contain the same set of
+transactions or account state modifications.
 
-A given SHAMap always stores only one of three kinds of data:
+The radix trie property is helpful in that a key (hash) of a transaction
+or account state can be used to navigate the trie.
+
+A `SHAMap` is a trie with two node types:
+
+1.  SHAMapInnerNode
+2.  SHAMapTreeNode
+
+Both of these nodes directly inherit from SHAMapAbstractNode which holds data
+common to both of the node types.
+
+All non-leaf nodes have type SHAMapInnerNode.
+
+All leaf nodes have type SHAMapTreeNode.
+
+The root node is always a SHAMapInnerNode.
+
+A given `SHAMap` always stores only one of three kinds of data:
 
  * Transactions with metadata
  * Transactions without metadata, or
  * Account states.
 
-So all of the leaf nodes of a particular SHAMap will always have a uniform
-type.  The inner nodes carry no data other than the hash of the nodes
-beneath them.
+So all of the leaf nodes of a particular `SHAMap` will always have a uniform type.
+The inner nodes carry no data other than the hash of the nodes beneath them.
 
+All nodes are owned by shared_ptrs resident in either other nodes or, in the case
+of the root node, a shared_ptr in the `SHAMap` itself.  The use of shared_ptrs
+permits more than one `SHAMap` at a time to share ownership of a node.  This
+occurs, for example, when a copy of a `SHAMap` is made.
 
-## SHAMap Types ##
+Copies are made with the `snapShot` function as opposed to the `SHAMap` copy
+constructor.  See the section on `SHAMap` creation for more details about
+`snapShot`.
 
-There are two different ways of building and using a SHAMap:
+Sequence numbers are used to further customize the node ownership strategy. See
+the section on sequence numbers for details.
 
- 1. A mutable SHAMap and
- 2. An immutable SHAMap
+![node diagram](https://user-images.githubusercontent.com/46455409/77350005-1ef12c80-6cf9-11ea-9c8d-56410f442859.png)
 
-The distinction here is not of the classic C++ immutable-means-unchanging
-sense.  An immutable SHAMap contains *nodes* that are immutable.  Also,
-once a node has been located in an immutable SHAMap, that node is
-guaranteed to persist in that SHAMap for the lifetime of the SHAMap.
+## Mutability ##
 
-So, somewhat counter-intuitively, an immutable SHAMap may grow as new nodes
-are introduced.  But an immutable SHAMap will never get smaller (until it
-entirely evaporates when it is destroyed).  Nodes, once introduced to the
-immutable SHAMap, also never change their location in memory.  So nodes in
-an immutable SHAMap can be handled using raw pointers (if you're careful).
+There are two different ways of building and using a `SHAMap`:
 
-One consequence of this design is that an immutable SHAMap can never be
-"trimmed".  There is no way to identify unnecessary nodes in an immutable
-SHAMap that could be removed.  Once a node has been brought into the
-in-memory SHAMap, that node stays in memory for the life of the SHAMap.
+ 1. A mutable `SHAMap` and
+ 2. An immutable `SHAMap`
 
-Most SHAMaps are immutable, in the sense that they don't modify or remove
-their contained nodes.
+The distinction here is not the classic C++ immutable-means-unchanging sense.
+An immutable `SHAMap` contains *nodes* that are immutable.  Also, once a node has
+been located in an immutable `SHAMap`, that node is guaranteed to persist in that
+`SHAMap` for the lifetime of the `SHAMap`.
 
-An example where a mutable SHAMap is required is when we want to apply
-transactions to the last closed ledger.  To do so we'd make a mutable
-snapshot of the state tree and then start applying transactions to it.
-Because the snapshot is mutable, changes to nodes in the snapshot will not
-affect nodes in other SHAMAps.
+So, somewhat counter-intuitively, an immutable `SHAMap` may grow as new nodes are
+introduced.  But an immutable `SHAMap` will never get smaller (until it entirely
+evaporates when it is destroyed).  Nodes, once introduced to the immutable
+`SHAMap`, also never change their location in memory.  So nodes in an immutable
+`SHAMap` can be handled using raw pointers (if you're careful).
 
-An example using a immutable ledger would be when there's an open ledger
-and some piece of code wishes to query the state of the ledger.  In this
-case we don't wish to change the state of the SHAMap, so we'd use an
-immutable snapshot.
+One consequence of this design is that an immutable `SHAMap` can never be
+"trimmed".  There is no way to identify unnecessary nodes in an immutable `SHAMap`
+that could be removed.  Once a node has been brought into the in-memory `SHAMap`,
+that node stays in memory for the life of the `SHAMap`.
 
+Most `SHAMap`s are immutable, in the sense that they don't modify or remove their
+contained nodes.
+
+An example where a mutable `SHAMap` is required is when we want to apply
+transactions to the last closed ledger.  To do so we'd make a mutable snapshot
+of the state trie and then start applying transactions to it. Because the
+snapshot is mutable, changes to nodes in the snapshot will not affect nodes in
+other `SHAMap`s.
+
+An example using an immutable ledger would be when there's an open ledger and
+some piece of code wishes to query the state of the ledger.  In this case we
+don't wish to change the state of the `SHAMap`, so we'd use an immutable snapshot.
+
+## Sequence numbers ##
+
+Both `SHAMap`s and their nodes carry a sequence number.  This is simply an
+unsigned number that indicates ownership or membership, or non-membership.
+
+`SHAMap` sequence numbers normally start out as 1.  However, when a snapshot of
+a `SHAMap` is made, the copy's sequence number is 1 greater than the original.
+
+The nodes of a `SHAMap` have their own copy of a sequence number.  If the `SHAMap`
+is mutable, meaning it can change, then all of its nodes must have the
+same sequence number as the `SHAMap` itself.  This enforces an invariant that none
+of the nodes are shared with other `SHAMap`s.
+
+When a `SHAMap` needs to have a private copy of a node, not shared by any other
+`SHAMap`, it first clones it and then sets the new copy to have a sequence number
+equal to the `SHAMap` sequence number.  `unshareNode` is a private utility
+function that automates the task of first checking whether the node is sharable,
+and if so, cloning it and giving it the proper sequence number.  An example case
+where a private copy is needed is when an inner node needs to have a child
+pointer altered.  Any modification to a node will require a non-shared node.
+
+When a `SHAMap` decides that it is safe to share a node of its own, it sets the
+node's sequence number to 0 (a `SHAMap` never has a sequence number of 0). This
+is done for every node in the trie when `SHAMap::walkSubTree` is executed.
+
+Note that other objects in rippled also have sequence numbers (e.g. ledgers).
+The `SHAMap` and node sequence numbers should not be confused with these other
+sequence numbers (no relation).
 
 ## SHAMap Creation ##
 
-A SHAMap is usually not created from vacuum.  Once an initial SHAMap is
-constructed, later SHAMaps are usually created by calling
-snapShot(bool isMutable) on the original SHAMap().  The returned SHAMap
-has the expected characteristics (mutable or immutable) based on the passed
-in flag.
+A `SHAMap` is usually not created from scratch.  Once an initial `SHAMap` is
+constructed, later `SHAMap`s are usually created by calling
+`snapShot(bool isMutable)` on the original `SHAMap`.  The returned `SHAMap` has the
+expected characteristics (mutable or immutable) based on the passed-in flag.
 
-It is cheaper to make an immutable snapshot of a SHAMap than to make a mutable
-snapshot.  If the SHAMap snapshot is mutable then any of the nodes that might
-be modified must be copied before they are placed in the mutable map.
+It is cheaper to make an immutable snapshot of a `SHAMap` than to make a mutable
+snapshot.  If the `SHAMap` snapshot is mutable then sharable nodes must be
+copied before they are placed in the mutable map.
 
+A new `SHAMap` is created with each new ledger round.  Transactions not executed
+in the previous ledger populate the `SHAMap` for the new ledger.
 
-## SHAMap Thread Safety ##
+## Storing SHAMap data in the database ##
 
-*This description is obsolete and needs to be rewritten.*
+When consensus is reached, the ledger is closed.  As part of this process, the
+`SHAMap` is stored to the database by calling `SHAMap::flushDirty`.
 
-SHAMaps can be thread safe, depending on how they are used.  The SHAMap
-uses a SyncUnorderedMap for its storage.  The SyncUnorderedMap has three
-thread-safe methods:
+Both `unshare()` and `flushDirty` walk the `SHAMap` by calling
+`SHAMap::walkSubTree`.  As `unshare()` walks the trie, nodes are not written to
+the database; as `flushDirty` walks the trie, nodes are written to the
+database.  `walkSubTree` visits every node in the trie.  This process must ensure
+that each node is only owned by this trie, and so "unshares" as it walks each
+node (from the root down).  This is done in the `preFlushNode` function by
+ensuring that the node has a sequence number equal to that of the `SHAMap`.  If
+the node doesn't, it is cloned.
 
- * size(),
- * canonicalize(), and
- * retrieve()
+For each inner node encountered (starting with the root node), each of the
+children is inspected (from 1 to 16).  For each child, if it has a non-zero
+sequence number (unshareable), the child is first copied.  Then if the child is
+an inner node, we recurse down to that node's children.  Otherwise we've found a
+leaf node and that node is written to the database.  A count of each leaf node
+that is visited is kept.  The hash of the data in the leaf node is computed at
+this time, and the child is reassigned back into the parent inner node just in
+case the COW operation created a new pointer to this leaf node.
 
-As long as the SHAMap uses only those three interfaces to its storage
-(the mTNByID variable [which stands for Tree Node by ID]) the SHAMap is
-thread safe.
+After processing each node, the node is then marked as sharable again by setting
+its sequence number to 0.
 
+After all of an inner node's children are processed, then its hash is updated
+and the inner node is written to the database.  Then this inner node is assigned
+back into its parent node, again in case the COW operation created a new
+pointer to it.
 
 ## Walking a SHAMap ##
 
-*We need a good description of why someone would walk a SHAMap and*
-*how it works in the code*
+The private function `SHAMap::walkTowardsKey` is a good example of *how* to walk
+a `SHAMap`, and the various functions that call `walkTowardsKey` are good examples
+of *why* one would want to walk a `SHAMap` (e.g. `SHAMap::findKey`).
+`walkTowardsKey` always starts at the root of the `SHAMap` and traverses down
+through the inner nodes, looking for a leaf node along a path in the trie
+designated by a `uint256`.
 
+As one walks the trie, one can *optionally* keep a stack of nodes that one has
+passed through.  This isn't necessary for walking the trie, but many clients
+will use the stack after finding the desired node.  For example if one is
+deleting a node from the trie, the stack is handy for repairing invariants in
+the trie after the deletion.
+
+To assist in walking the trie, `SHAMap::walkTowardsKey` uses a `SHAMapNodeID`
+that identifies a node by its path from the root and its depth in the trie. The
+path is just a "list" of numbers, each in the range [0 .. 15], depicting which
+child was chosen at each node starting from the root. Each choice is represented
+by 4 bits, and then packed in sequence into a `uint256` (such that the longest
+path possible has 256 / 4 = 64 steps). The high 4 bits of the first byte
+identify which child of the root is chosen, the lower 4 bits of the first byte
+identify the child of that node, and so on. The `SHAMapNodeID` identifying the
+root node has an ID of 0 and a depth of 0. See `SHAMapNodeID::selectBranch` for
+details of how a `SHAMapNodeID` selects a "branch" (child) by indexing into its
+path with its depth.
+
+While the current node is an inner node, traversing down the trie from the root
+continues, unless the path indicates a child that does not exist.  In that
+case, `nullptr` is returned to indicate no leaf node along the given path
+exists.  Otherwise a leaf node is found and a (non-owning) pointer to it is
+returned.  At each step, if a stack is requested, a
+`pair<shared_ptr<SHAMapAbstractNode>, SHAMapNodeID>` is pushed onto the stack.
+
+When a child node is found by `selectBranch`, the traversal to that node
+consists of two steps:
+
+1.  Update the `shared_ptr` to the current node.
+2.  Update the `SHAMapNodeID`.
+
+The first step consists of several attempts to find the node in various places:
+
+1.  In the trie itself.
+2.  In the node cache.
+3.  In the database.
+
+If the node is not found in the trie, then it is installed into the trie as part
+of the traversal process.
 
 ## Late-arriving Nodes ##
 
-As we noted earlier, SHAMaps (even immutable ones) may grow.  If a SHAMap
-is searching for a node and runs into an empty spot in the tree, then the
-SHAMap looks to see if the node exists but has not yet been made part of
-the map.  This operation is performed in the `SHAMap::fetchNodeExternalNT()`
-method.  The *NT* is this case stands for 'No Throw'.
+As we noted earlier, `SHAMap`s (even immutable ones) may grow.  If a `SHAMap` is
+searching for a node and runs into an empty spot in the trie, then the `SHAMap`
+looks to see if the node exists but has not yet been made part of the map.  This
+operation is performed in the `SHAMap::fetchNodeNT()` method.  The *NT*
+in this case stands for 'No Throw'.
 
-The `fetchNodeExternalNT()` method goes through three phases:
+The `fetchNodeNT()` method goes through three phases:
 
  1. By calling `getCache()` we attempt to locate the missing node in the
-    TreeNodeCache.  The TreeNodeCache is a cache of immutable
-    SHAMapTreeNodes that are shared across all SHAMaps.
+    TreeNodeCache.  The TreeNodeCache is a cache of immutable SHAMapTreeNodes
+    that are shared across all `SHAMap`s.
 
-    Any SHAMapTreeNode that is immutable has a sequence number of zero.
-    When a mutable SHAMap is created then its SHAMapTreeNodes are given
-    non-zero sequence numbers.  So the `assert (ret->getSeq() == 0)`
-    simply confirms that the TreeNodeCache indeed gave us an immutable node.
+    Any SHAMapTreeNode that is immutable has a sequence number of zero
+    (sharable). When a mutable `SHAMap` is created then its SHAMapTreeNodes are
+    given non-zero sequence numbers (unsharable).  But all nodes in the
+    TreeNodeCache are immutable, so if one is found here, its sequence number
+    will be 0.
 
  2. If the node is not in the TreeNodeCache, we attempt to locate the node
-    in the historic data stored by the data base.  The call to
-    to `fetch(hash)` does that work for us.
-
- 3. Finally, if ledgerSeq_ is non-zero and we did't locate the node in the
-    historic data, then we call a MissingNodeHandler.
-
-    The non-zero ledgerSeq_ indicates that the SHAMap is a complete map that
-    belongs to a historic ledger with the given (non-zero) sequence number.
-    So, if all expected data is always present, the MissingNodeHandler should
-    never be executed.
-
-    And, since we now know that this SHAMap does not fully represent
-    the data from that ledger, we set the SHAMap's sequence number to zero.
-
-If phase 1 returned a node, then we already know that the node is immutable.
-However, if either phase 2 executes successfully, then we need to turn the
-returned node into an immutable node.  That's handled by the call to
-`make_shared<SHAMapTreeNode>` inside the try block.  That code is inside
-a try block because the `fetchNodeExternalNT` method promises not to throw.
-In case the constructor called by `make_shared` throws we don't want to
-break our promise.
+    in the historic data stored by the database.  The call to
+    `fetchNodeFromDB(hash)` does that work for us.
 
+ 3. Finally, if a filter exists, we check if it can supply the node.  This is
+    typically the LedgerMaster, which tracks the current ledger and ledgers
+    in the process of closing.
 
 ## Canonicalize ##
 
-The calls to `canonicalize()` make sure that if the resulting node is already
-in the SHAMap, then we return the node that's already present -- we never
-replace a pre-existing node.  By using `canonicalize()` we manage a thread
-race condition where two different threads might both recognize the lack of a
-SHAMapTreeNode at the same time.  If they both attempt to insert the node
-then `canonicalize` makes sure that the first node in wins and the slower
-thread receives back a pointer to the node inserted by the faster thread.
+`canonicalize()` is called every time a node is introduced into the `SHAMap`.
 
-There's a problem with the current SHAMap design that `canonicalize()`
-accommodates.  Two different trees can have the exact same node (the same
-hash value) with two different IDs.  If the TreeNodeCache returns a node
-with the same hash but a different ID, then we assume that the ID of the
-passed-in node is 'better' than the older ID in the TreeNodeCache.  So we
-construct a new SHAMapTreeNode by copying the one we found in the
-TreeNodeCache, but we give the new node the new ID.  Then we replace the
-SHAMapTreeNode in the TreeNodeCache with this newly constructed node.
+A call to `canonicalize()` stores the node in the `TreeNodeCache` if it does not
+already exist in the `TreeNodeCache`.
 
-The TreeNodeCache is not subject to the rule that any node must be
-resident forever.  So it's okay to replace the old node with the new node.
+The calls to `canonicalize()` make sure that if the resulting node is already in
+the `SHAMap`, the `TreeNodeCache`, or the database, then we don't create
+duplicates: the copy already in the `TreeNodeCache` is favored.
 
-The `SHAMap::getCache()` method exhibits the same behavior.
+By using `canonicalize()` we manage a thread race condition where two different
+threads might both recognize the lack of a SHAMapTreeNode at the same time
+(during a fetch).  If they both attempt to insert the node into the `SHAMap`, then
+`canonicalize` makes sure that the first node in wins and the slower thread
+receives back a pointer to the node inserted by the faster thread.  Recall
+that these two `SHAMap`s will share the same `TreeNodeCache`.
 
+## TreeNodeCache ##
+
+The `TreeNodeCache` is a `std::unordered_map` keyed on the hash of the
+`SHAMap` node.  The stored type consists of `shared_ptr<SHAMapAbstractNode>`,
+`weak_ptr<SHAMapAbstractNode>`, and a time point indicating the most recent
+access of this node in the cache.  The time point is based on
+`std::chrono::steady_clock`.
+
+The container uses a cryptographically secure hash that is randomly seeded.
+
+The `TreeNodeCache` also carries with it various data used for statistics
+and logging, and a target age for the contained nodes.  When the target age
+for a node is exceeded, and there are no more references to the node, the
+node is removed from the `TreeNodeCache`.
+
+## FullBelowCache ##
+
+This cache remembers which trie keys have all of their children resident in a
+`SHAMap`.  This optimizes the process of acquiring a complete trie.  This is used
+when creating the missing nodes list.  Missing nodes are those nodes that a
+`SHAMap` refers to but that are not stored in the local database.
+
+As a depth-first walk of a `SHAMap` is performed, if an inner node answers true to
+`isFullBelow()` then it is known that none of this node's children are missing
+nodes, and thus that subtree does not need to be walked.  These nodes are stored
+in the FullBelowCache.  Subsequent walks check the FullBelowCache first when
+encountering a node, and ignore that subtree if found.
+
+## SHAMapAbstractNode ##
+
+This is a base class for the two concrete node types.  It holds the following
+common data:
+
+1.  A node type, one of:
+    a.  error
+    b.  inner
+    c.  transaction with no metadata
+    d.  transaction with metadata
+    e.  account state
+2.  A hash
+3.  A sequence number
+
+
+## SHAMapInnerNode ##
+
+SHAMapInnerNode publicly inherits directly from SHAMapAbstractNode.  It holds
+the following data:
+
+1.  Up to 16 child nodes, each held with a shared_ptr.
+2.  A hash for each child.
+3.  A 16-bit bitset with a 1 bit set for each child that exists.
+4.  Flag to aid online delete and consistency with data on disk.
+
+## SHAMapTreeNode ##
+
+SHAMapTreeNode publicly inherits directly from SHAMapAbstractNode.  It holds the
+following data:
+
+1.  A shared_ptr to a const SHAMapItem.
+
+## SHAMapItem ##
+
+This holds the following data:
+
+1.  uint256.  The hash of the data.
+2.  vector<unsigned char>.  The data (transactions, account info).
 
 ## SHAMap Improvements ##
 
-Here's a simple one: the SHAMapTreeNode::mAccessSeq member is currently not
-used and could be removed.
+Here's a simple one: the SHAMapTreeNode::mAccessSeq member is currently not used
+and could be removed.
 
-Here's a more important change.  The tree structure is currently embedded
-in the SHAMapTreeNodes themselves.  It doesn't have to be that way, and
-that should be fixed.
+Here's a more important change.  The trie structure is currently embedded in the
+SHAMapTreeNodes themselves.  It doesn't have to be that way, and that should be
+fixed.
 
-When we navigate the tree (say, like `SHAMap::walkTo()`) we currently
-ask each node for information that we could determine locally.  We know
-the depth because we know how many nodes we have traversed.  We know the
-ID that we need because that's how we're steering.  So we don't need to
-store the ID in the node.  The next refactor should remove all calls to
-`SHAMapTreeNode::GetID()`.
+When we navigate the trie (say, like `SHAMap::walkTo()`) we currently ask each
+node for information that we could determine locally.  We know the depth because
+we know how many nodes we have traversed.  We know the ID that we need because
+that's how we're steering.  So we don't need to store the ID in the node.  The
+next refactor should remove all calls to `SHAMapTreeNode::GetID()`.
 
 Then we can remove the NodeID member from SHAMapTreeNode.
 
-Then we can change the SHAMap::mTNBtID  member to be mTNByHash.
+Then we can change the `SHAMap::mTNByID` member to be `mTNByHash`.
 
 An additional possible refactor would be to have a base type, SHAMapTreeNode,
-and derive from that InnerNode and LeafNode types.  That would remove
-some storage (the array of 16 hashes) from the LeafNodes.  That refactor
-would also have the effect of simplifying methods like `isLeaf()` and
-`hasItem()`.
+and derive from that InnerNode and LeafNode types.  That would remove some
+storage (the array of 16 hashes) from the LeafNodes.  That refactor would also
+have the effect of simplifying methods like `isLeaf()` and `hasItem()`.