SHAMap Documentation

2025-12-06 17:27:52 +00:00 · 2014-07-03 15:54:12 -07:00
parent 322af30d6a
commit baaa45f8c7
4 changed files with 223 additions and 18 deletions
--- a/doc/rippled-example.cfg
+++ b/doc/rippled-example.cfg
@@ -27,12 +27,12 @@
 # Purpose
 #
 #   This file documents and provides examples of all rippled server process
-#   configuration options. When the rippled server instance is lanched, it looks
-#   for a file with the following name:
+#   configuration options. When the rippled server instance is launched, it
+#   looks for a file with the following name:
 #
 #     rippled.cfg
 #
-#   For more information on where the rippled serer instance searches for
+#   For more information on where the rippled server instance searches for
 #   the file please visit the Ripple wiki. Specifically, the section explaining
 #   the --conf command line option:
 #
@@ -242,7 +242,7 @@
 #
 #   The amount of time to wait in seconds, before sending a websocket 'ping'
 #   message. Ping messages are used to determine if the remote end of the
-#   connection is no longer availabile.
+#   connection is no longer available.
 #   
 #
 #
@@ -299,7 +299,7 @@
 #
 #   This group of settings configures security and access attributes of the
 #   RPC server section of the rippled process, used to service both local
-#   an optional remote clients.
+#   and optional remote clients.
 #
 #
 #
@@ -314,7 +314,7 @@
 #
 # [rpc_admin_allow]
 #
-#   Specify an list of IP addresses allowed to have admin access. One per line.
+#   Specify a list of IP addresses allowed to have admin access. One per line.
 #   If you want to test the output of non-admin commands add this section and
 #   just put an ip address not under your control.
 #   Defaults to 127.0.0.1.
@@ -334,7 +334,7 @@
 #
 # [rpc_admin_password]
 #
-#   As a server, require this as the admin pasword to be specified.  Also,
+#   As a server, require this as the admin password to be specified.  Also,
 #   require rpc_admin_user and rpc_admin_password to be checked for RPC admin
 #   functions.  The request must specify these as the admin_user and
 #   admin_password in the request object.
@@ -358,7 +358,7 @@
 #
 # [rpc_user]
 #
-#   As a server, require a this user to specified and require rpc_password to
+#   As a server, require this user to be specified and require rpc_password to
 #   be checked for RPC access via the rpc_ip and rpc_port. The user and password
 #   must be specified via HTTP's basic authentication method.
 #   As a client, supply this to the server via HTTP's basic authentication
@@ -368,7 +368,7 @@
 #
 # [rpc_password]
 #
-#   As a server, require a this password to specified and require rpc_user to
+#   As a server, require this password to be specified and require rpc_user to
 #   be checked for RPC access via the rpc_ip and rpc_port. The user and password
 #   must be specified via HTTP's basic authentication method.
 #   As a client, supply this to the server via HTTP's basic authentication
@@ -393,8 +393,9 @@
 #   0: Server certificates are not provided for RPC clients using SSL [default]
 #   1: Client RPC connections wil be provided with SSL certificates.
 #
-#   Note that if rpc_secure is enabled, it will also be necessasry to configure the
-#   certificate file settings located in rpc_ssl_cert, rpc_ssl_chain, and rpc_ssl_key
+#   Note that if rpc_secure is enabled, it will also be necessary to configure
+#   the certificate file settings located in rpc_ssl_cert, rpc_ssl_chain, and
+#   rpc_ssl_key
 #
 #
 #
@@ -402,8 +403,9 @@
 #
 #   <pathname>
 #
-#   A file system path leading to the SSL certificate file to use for secure RPC.
-#   The file is in PEM format. The file is not needed if the chain includes it.
+#   A file system path leading to the SSL certificate file to use for secure
+#   RPC.  The file is in PEM format. The file is not needed if the chain
+#   includes it.
 #
 #
 #
@@ -444,7 +446,7 @@
 #
 #     [sms_url]?from=[sms_from]&to=[sms_to]&api_key=[sms_key]&api_secret=[sms_secret]&text=['text']
 #
-#   Where [...] are the corresponding valus from the configuration file, and
+#   Where [...] are the corresponding values from the configuration file, and
 #   ['test'] is the value of the JSON field with name 'text'.
 #
 # [sms_url]
@@ -538,8 +540,8 @@
 #   For domains, rippled will probe for https web servers at the specified
 #   domain in the following order: ripple.DOMAIN, www.DOMAIN, DOMAIN
 #
-#   For public key entries, a comment may optionally be spcified after adding a
-#   space to the pulic key.
+#   For public key entries, a comment may optionally be specified after adding
+#   a space to the public key.
 #
 #   Examples:
 #    ripple.com
@@ -587,14 +589,14 @@
 #
 # [path_search_fast]
 # [path_search_max]
-#   When seaching for paths, the minimum and maximum search aggressiveness.
+#   When searching for paths, the minimum and maximum search aggressiveness.
 #
 #   The default for 'path_search_fast' is 2. The default for 'path_search_max' is 10.
 #
 # [path_search_old]
 #
 #   For clients that use the legacy path finding interfaces, the search
-#   agressiveness to use. The default is 7.
+#   aggressiveness to use. The default is 7.
 #
 #
 #
--- a/src/ripple/module/app/ledger/README.md
+++ b/src/ripple/module/app/ledger/README.md
@@ -94,6 +94,20 @@ of the consensus process.
 A signed statement of which transactions it believes should be included in
 the next consensus ledger.

+## Ledger Header ##
+
+The "ledger header" is the chunk of data that hashes to the
+ledger's hash. It contains the sequence number, parent hash,
+hash of the previous ledger, hash of the root node of the
+state tree, and so on.
+
+## Ledger Base ##
+
+The term "ledger base" refers to a particular type of query
+and response used in the ledger fetch process that includes
+the ledger header but may also contain other information
+such as the root node of the state tree.
+
 ---

 # References
--- a/src/ripple/module/app/shamap/README.md
+++ b/src/ripple/module/app/shamap/README.md
@@ -0,0 +1,187 @@
+# SHAMap Introduction #
+
+July 2014
+
+The SHAMap is a Merkle tree (http://en.wikipedia.org/wiki/Merkle_tree).
+The SHAMap is also a radix tree of radix 16
+(http://en.wikipedia.org/wiki/Radix_tree).
+
+*We need some kind of sensible summary of the SHAMap here.*
+
+A given SHAMap always stores only one of three kinds of data:
+
+ * Transactions with metadata
+ * Transactions without metadata, or
+ * Account states.
+
+So all of the leaf nodes of a particular SHAMap will always have a uniform
+type.  The inner nodes carry no data other than the hash of the nodes
+beneath them.
+
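As a rough illustration of what "radix 16" means for navigation: each level of the tree consumes one 4-bit nibble of a node's key, so a 256-bit key allows at most 64 levels. The sketch below is not taken from the SHAMap sources. The `Key256` alias and the `selectBranch` helper are hypothetical names, and the high-nibble-first ordering is an assumption made for the example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical 256-bit key type used only for this illustration.
using Key256 = std::array<std::uint8_t, 32>;

// Return the branch (0..15) to follow at the given depth of a radix-16 tree.
// Assumed ordering: depth 0 uses the high nibble of byte 0, depth 1 the low
// nibble of byte 0, depth 2 the high nibble of byte 1, and so on.
inline unsigned selectBranch(Key256 const& key, std::size_t depth)
{
    assert(depth < 64);  // 32 bytes hold 64 nibbles
    std::uint8_t const byte = key[depth / 2];
    return (depth % 2 == 0) ? (byte >> 4) : (byte & 0x0F);
}
```

With a key whose first byte is `0xAB`, this helper selects branch 10 at depth 0 and branch 11 at depth 1.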
+
+## SHAMap Types ##
+
+There are two different ways of building and using a SHAMap:
+
+ 1. A mutable SHAMap and
+ 2. An immutable SHAMap
+
+The distinction here is not the classic C++ immutable-means-unchanging
+sense.  An immutable SHAMap contains *nodes* that are immutable.  Also,
+once a node has been located in an immutable SHAMap, that node is
+guaranteed to persist in that SHAMap for the lifetime of the SHAMap.
+
+So, somewhat counter-intuitively, an immutable SHAMap may grow as new nodes
+are introduced.  But an immutable SHAMap will never get smaller (until it
+entirely evaporates when it is destroyed).  Nodes, once introduced to the
+immutable SHAMap, also never change their location in memory.  So nodes in
+an immutable SHAMap can be handled using raw pointers (if you're careful).
+
+One consequence of this design is that an immutable SHAMap can never be
+"trimmed".  There is no way to identify unnecessary nodes in an immutable
+SHAMap that could be removed.  Once a node has been brought into the
+in-memory SHAMap, that node stays in memory for the life of the SHAMap.
+
+Most SHAMaps are immutable, in the sense that they don't modify or remove
+their contained nodes.
+
+An example where a mutable SHAMap is required is when we want to apply
+transactions to the last closed ledger.  To do so we'd make a mutable
+snapshot of the state tree and then start applying transactions to it.
+Because the snapshot is mutable, changes to nodes in the snapshot will not
+affect nodes in other SHAMaps.
+
+An example using an immutable ledger would be when there's an open ledger
+and some piece of code wishes to query the state of the ledger.  In this
+case we don't wish to change the state of the SHAMap, so we'd use an
+immutable snapshot.
+
+
+## SHAMap Creation ##
+
+A SHAMap is usually not created from scratch.  Once an initial SHAMap is
+constructed, later SHAMaps are usually created by calling
+`snapShot(bool isMutable)` on the original SHAMap.  The returned SHAMap
+has the expected characteristics (mutable or immutable) based on the
+passed-in flag.
+
+It is cheaper to make an immutable snapshot of a SHAMap than to make a mutable
+snapshot.  If the SHAMap snapshot is mutable then any of the nodes that might
+be modified must be copied before they are placed in the mutable map.
+
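The copy-before-modify rule can be sketched in a few lines. This is a simplified illustration, not the SHAMap implementation. The `Node` struct and the `makeWritable` helper are hypothetical, and the sketch assumes that a node's sequence number identifies the mutable map that owns it, with zero meaning "shared and immutable".

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>

// Hypothetical, simplified node: seq == 0 marks a shared, immutable node.
struct Node
{
    std::uint32_t seq = 0;
    std::string data;
};

// Return a node that a mutable map with sequence number mapSeq may modify.
// A node not already owned by that map is copied first, so other maps that
// share the original never observe the change.
inline std::shared_ptr<Node>
makeWritable(std::shared_ptr<Node> const& node, std::uint32_t mapSeq)
{
    assert(mapSeq != 0);  // only mutable maps (non-zero seq) may write
    if (node->seq == mapSeq)
        return node;  // already owned by this map, no copy needed
    auto copy = std::make_shared<Node>(*node);
    copy->seq = mapSeq;
    return copy;
}
```

Under these assumptions an immutable snapshot can share every node pointer, while a mutable snapshot pays for a copy of each node it may touch.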
+
+## SHAMap Thread Safety ##
+
+SHAMaps can be thread safe, depending on how they are used.  The SHAMap
+uses a SyncUnorderedMap for its storage.  The SyncUnorderedMap has three
+thread-safe methods:
+
+ * size(),
+ * canonicalize(), and
+ * retrieve()
+
+As long as the SHAMap uses only those three interfaces to its storage
+(the mTNByID variable [which stands for Tree Node by ID]) the SHAMap is
+thread safe.
+
+
+## Walking a SHAMap ##
+
+*We need a good description of why someone would walk a SHAMap and*
+*how it works in the code*
+
+
+## Late-arriving Nodes ##
+
+As we noted earlier, SHAMaps (even immutable ones) may grow.  If a SHAMap
+is searching for a node and runs into an empty spot in the tree, then the
+SHAMap looks to see if the node exists but has not yet been made part of
+the map.  This operation is performed in the `SHAMap::fetchNodeExternalNT()`
+method.  The *NT* in this case stands for 'No Throw'.
+
+The `fetchNodeExternalNT()` method goes through three phases:
+
+ 1. By calling `getCache()` we attempt to locate the missing node in the
+    TreeNodeCache.  The TreeNodeCache is a cache of immutable
+    SHAMapTreeNodes that are shared across all SHAMaps.
+
+    Any SHAMapTreeNode that is immutable has a sequence number of zero.
+    When a mutable SHAMap is created then its SHAMapTreeNodes are given
+    non-zero sequence numbers.  So the `assert (ret->getSeq() == 0)`
+    simply confirms that the TreeNodeCache indeed gave us an immutable node.
+
+ 2. If the node is not in the TreeNodeCache, we attempt to locate the node
+    in the historic data stored by the database.  The call to
+    `getApp().getNodeStore().fetch(hash)` does that work for us.
+
+ 3. Finally, if mLedgerSeq is non-zero and we didn't locate the node in the
+    historic data, then we call a MissingNodeHandler.
+
+    The non-zero mLedgerSeq indicates that the SHAMap is a complete map that
+    belongs to a historic ledger with the given (non-zero) sequence number.
+    So, if all expected data is always present, the MissingNodeHandler should
+    never be executed.
+
+    And, since we now know that this SHAMap does not fully represent
+    the data from that ledger, we set the SHAMap's sequence number to zero.
+
+If phase 1 returned a node, then we already know that the node is immutable.
+However, if phase 2 executes successfully, then we need to turn the
+returned node into an immutable node.  That's handled by the call to
+`make_shared<SHAMapTreeNode>` inside the try block.  That code is inside
+a try block because the `fetchNodeExternalNT` method promises not to throw.
+In case the constructor called by `make_shared` throws we don't want to
+break our promise.
+
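The three phases above can be sketched as a single non-throwing lookup. This is a minimal sketch under stated assumptions, not the code in `SHAMap.cpp`. The `Lookup` struct, its members, and `fetchNoThrow` are hypothetical stand-ins for the TreeNodeCache, the node store, and the MissingNodeHandler, and hashes are represented as plain strings.

```cpp
#include <cstdint>
#include <functional>
#include <memory>
#include <string>
#include <unordered_map>

struct Node
{
    std::string blob;
    std::uint32_t seq = 0;  // zero marks an immutable node
};
using NodePtr = std::shared_ptr<Node>;

// Hypothetical stand-ins for the collaborators named in the text.
struct Lookup
{
    std::unordered_map<std::string, NodePtr> cache;  // phase 1
    // Phase 2: returns true and fills `out` when the store has the hash.
    std::function<bool(std::string const&, std::string&)> fetchFromStore;
    std::function<void(std::uint32_t)> onMissing;  // phase 3
    std::uint32_t ledgerSeq = 0;
};

inline NodePtr fetchNoThrow(Lookup& ctx, std::string const& hash) noexcept
{
    try
    {
        // Phase 1: the shared cache only holds immutable nodes.
        auto const it = ctx.cache.find(hash);
        if (it != ctx.cache.end())
            return it->second;

        // Phase 2: historic data; wrap the raw blob in an immutable node.
        std::string blob;
        if (ctx.fetchFromStore && ctx.fetchFromStore(hash, blob))
        {
            auto node = std::make_shared<Node>();
            node->blob = blob;
            return node;
        }

        // Phase 3: a map that claimed to be complete is missing data.
        if (ctx.ledgerSeq != 0)
        {
            if (ctx.onMissing)
                ctx.onMissing(ctx.ledgerSeq);
            ctx.ledgerSeq = 0;  // the map no longer fully represents it
        }
    }
    catch (...)
    {
        // Keep the no-throw promise; the caller sees an empty pointer.
    }
    return nullptr;
}
```

The whole body sits inside the `try` block so that an exception from any collaborator is turned into an empty result.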
+
+## Canonicalize ##
+
+The calls to `canonicalize()` make sure that if the resulting node is already
+in the SHAMap, then we return the node that's already present -- we never
+replace a pre-existing node.  By using `canonicalize()` we manage a thread
+race condition where two different threads might both recognize the lack of a
+SHAMapTreeNode at the same time.  If they both attempt to insert the node
+then `canonicalize` makes sure that the first node in wins and the slower
+thread receives back a pointer to the node inserted by the faster thread.
+
+There's a problem with the current SHAMap design that `canonicalize()`
+accommodates.  Two different trees can have the exact same node (the same
+hash value) with two different IDs.  If the TreeNodeCache returns a node
+with the same hash but a different ID, then we assume that the ID of the
+passed-in node is 'better' than the older ID in the TreeNodeCache.  So we
+construct a new SHAMapTreeNode by copying the one we found in the
+TreeNodeCache, but we give the new node the new ID.  Then we replace the
+SHAMapTreeNode in the TreeNodeCache with this newly constructed node.
+
+The TreeNodeCache is not subject to the rule that any node must be
+resident forever.  So it's okay to replace the old node with the new node.
+
+The `SHAMap::getCache()` method exhibits the same behavior.
+
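The first-in-wins behavior described for `canonicalize()` can be illustrated with a small mutex-protected map. This is a sketch only, not the SyncUnorderedMap source. The `CanonicalMap` class is hypothetical, and the boolean return value (true when the caller's node was inserted) is an assumption made for the example.

```cpp
#include <cstddef>
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>

// Hypothetical thread-safe map demonstrating first-in-wins insertion.
template <class Key, class Value>
class CanonicalMap
{
public:
    // If key is absent, store value and return true.  If key is already
    // present, overwrite value with the stored pointer and return false,
    // so the slower thread ends up holding the faster thread's node.
    bool canonicalize(Key const& key, std::shared_ptr<Value>& value)
    {
        std::lock_guard<std::mutex> lock(mutex_);
        auto const result = map_.emplace(key, value);
        if (!result.second)
            value = result.first->second;
        return result.second;
    }

    std::size_t size() const
    {
        std::lock_guard<std::mutex> lock(mutex_);
        return map_.size();
    }

private:
    mutable std::mutex mutex_;
    std::unordered_map<Key, std::shared_ptr<Value>> map_;
};
```

Because the check and the insertion happen under one lock, two racing threads cannot both install a node for the same key.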
+
+## SHAMap Improvements ##
+
+Here's a simple one: the SHAMapTreeNode::mAccessSeq member is currently not
+used and could be removed.
+
+Here's a more important change.  The tree structure is currently embedded
+in the SHAMapTreeNodes themselves.  It doesn't have to be that way, and
+that should be fixed.
+
+When we navigate the tree (say, like `SHAMap::walkTo()`) we currently
+ask each node for information that we could determine locally.  We know
+the depth because we know how many nodes we have traversed.  We know the
+ID that we need because that's how we're steering.  So we don't need to
+store the ID in the node.  The next refactor should remove all calls to
+`SHAMapTreeNode::GetID()`.
+
+Then we can remove the NodeID member from SHAMapTreeNode.
+
+Then we can change the SHAMap::mTNByID member to be mTNByHash.
+
+An additional possible refactor would be to have a base type, SHAMapTreeNode,
+and derive from that InnerNode and LeafNode types.  That would remove
+some storage (the array of 16 hashes) from the LeafNodes.  That refactor
+would also have the effect of simplifying methods like `isLeaf()` and
+`hasItem()`.
+
--- a/src/ripple/module/app/shamap/SHAMap.cpp
+++ b/src/ripple/module/app/shamap/SHAMap.cpp
@@ -1004,6 +1004,8 @@ SHAMapTreeNode::pointer SHAMap::fetchNodeExternalNT (const SHAMapNodeID& id, uin
 {
    SHAMapTreeNode::pointer ret;

+    // This if allows us to use the SHAMap in unit tests.  So we don't attempt
+    // to fetch external nodes if we're not running in the application.
    if (!getApp().running ())
        return ret;