SHAMap Documentation

2025-12-06 17:27:52 +00:00 · 2014-07-03 15:54:12 -07:00
parent 322af30d6a
commit baaa45f8c7
4 changed files with 223 additions and 18 deletions
--- a/doc/rippled-example.cfg
+++ b/doc/rippled-example.cfg
@@ -27,12 +27,12 @@
 # Purpose
 #
 #   This file documents and provides examples of all rippled server process
-#   configuration options. When the rippled server instance is lanched, it looks
+#   configuration options. When the rippled server instance is launched, it
-#   for a file with the following name:
+#   looks for a file with the following name:
 #
 #     rippled.cfg
 #
-#   For more information on where the rippled serer instance searches for
+#   For more information on where the rippled server instance searches for
 #   the file please visit the Ripple wiki. Specifically, the section explaining
 #   the --conf command line option:
 #
@@ -242,7 +242,7 @@
 #
 #   The amount of time to wait in seconds, before sending a websocket 'ping'
 #   message. Ping messages are used to determine if the remote end of the
-#   connection is no longer availabile.
+#   connection is no longer available.
 #   
 #
 #
@@ -299,7 +299,7 @@
 #
 #   This group of settings configures security and access attributes of the
 #   RPC server section of the rippled process, used to service both local
-#   an optional remote clients.
+#   and optional remote clients.
 #
 #
 #
@@ -314,7 +314,7 @@
 #
 # [rpc_admin_allow]
 #
-#   Specify an list of IP addresses allowed to have admin access. One per line.
+#   Specify a list of IP addresses allowed to have admin access. One per line.
 #   If you want to test the output of non-admin commands add this section and
 #   just put an ip address not under your control.
 #   Defaults to 127.0.0.1.
@@ -334,7 +334,7 @@
 #
 # [rpc_admin_password]
 #
-#   As a server, require this as the admin pasword to be specified.  Also,
+#   As a server, require this as the admin password to be specified.  Also,
 #   require rpc_admin_user and rpc_admin_password to be checked for RPC admin
 #   functions.  The request must specify these as the admin_user and
 #   admin_password in the request object.
@@ -358,7 +358,7 @@
 #
 # [rpc_user]
 #
-#   As a server, require a this user to specified and require rpc_password to
+#   As a server, require this user to be specified and require rpc_password to
 #   be checked for RPC access via the rpc_ip and rpc_port. The user and password
 #   must be specified via HTTP's basic authentication method.
 #   As a client, supply this to the server via HTTP's basic authentication
@@ -368,7 +368,7 @@
 #
 # [rpc_password]
 #
-#   As a server, require a this password to specified and require rpc_user to
+#   As a server, require this password to be specified and require rpc_user to
 #   be checked for RPC access via the rpc_ip and rpc_port. The user and password
 #   must be specified via HTTP's basic authentication method.
 #   As a client, supply this to the server via HTTP's basic authentication
@@ -393,8 +393,9 @@
 #   0: Server certificates are not provided for RPC clients using SSL [default]
 #   1: Client RPC connections wil be provided with SSL certificates.
 #
-#   Note that if rpc_secure is enabled, it will also be necessasry to configure the
+#   Note that if rpc_secure is enabled, it will also be necessary to configure
-#   certificate file settings located in rpc_ssl_cert, rpc_ssl_chain, and rpc_ssl_key
+#   the certificate file settings located in rpc_ssl_cert, rpc_ssl_chain, and
 #   rpc_ssl_key
 #
 #
 #
@@ -402,8 +403,9 @@
 #
 #   <pathname>
 #
-#   A file system path leading to the SSL certificate file to use for secure RPC.
+#   A file system path leading to the SSL certificate file to use for secure
-#   The file is in PEM format. The file is not needed if the chain includes it.
+#   RPC.  The file is in PEM format. The file is not needed if the chain
 #   includes it.
 #
 #
 #
@@ -444,7 +446,7 @@
 #
 #     [sms_url]?from=[sms_from]&to=[sms_to]&api_key=[sms_key]&api_secret=[sms_secret]&text=['text']
 #
-#   Where [...] are the corresponding valus from the configuration file, and
+#   Where [...] are the corresponding values from the configuration file, and
 #   ['test'] is the value of the JSON field with name 'text'.
 #
 # [sms_url]
@@ -538,8 +540,8 @@
 #   For domains, rippled will probe for https web servers at the specified
 #   domain in the following order: ripple.DOMAIN, www.DOMAIN, DOMAIN
 #
-#   For public key entries, a comment may optionally be spcified after adding a
+#   For public key entries, a comment may optionally be specified after adding
-#   space to the pulic key.
+#   a space to the public key.
 #
 #   Examples:
 #    ripple.com
@@ -587,14 +589,14 @@
 #
 # [path_search_fast]
 # [path_search_max]
-#   When seaching for paths, the minimum and maximum search aggressiveness.
+#   When searching for paths, the minimum and maximum search aggressiveness.
 #
 #   The default for 'path_search_fast' is 2. The default for 'path_search_max' is 10.
 #
 # [path_search_old]
 #
 #   For clients that use the legacy path finding interfaces, the search
-#   agressiveness to use. The default is 7.
+#   aggressiveness to use. The default is 7.
 #
 #
 #
--- a/src/ripple/module/app/ledger/README.md
+++ b/src/ripple/module/app/ledger/README.md
@@ -94,6 +94,20 @@ of the consensus process.
 A signed statement of which transactions it believes should be included in
 the next consensus ledger.
 ## Ledger Header ##
 The "ledger header" is the chunk of data that hashes to the
 ledger's hash. It contains the sequence number, parent hash,
 hash of the previous ledger, hash of the root node of the
 state tree, and so on.
 ## Ledger Base ##
 The term "ledger base" refers to a particular type of query
 and response used in the ledger fetch process that includes
 the ledger header but may also contain other information
 such as the root node of the state tree.
 ---
 # References
--- a/src/ripple/module/app/shamap/README.md
+++ b/src/ripple/module/app/shamap/README.md
@@ -0,0 +1,187 @@
 # SHAMap Introduction #
 July 2014
 The SHAMap is a Merkle tree (http://en.wikipedia.org/wiki/Merkle_tree).
 The SHAMap is also a radix tree of radix 16
 (http://en.wikipedia.org/wiki/Radix_tree).
 *We need some kind of sensible summary of the SHAMap here.*
 A given SHAMap always stores only one of three kinds of data:
 * Transactions with metadata
 * Transactions without metadata, or
 * Account states.
 So all of the leaf nodes of a particular SHAMap will always have a uniform
 type.  The inner nodes carry no data other than the hash of the nodes
 beneath them.
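
As a rough illustration of what "radix 16" means for navigation, the sketch below picks one of 16 branches at each depth by reading successive 4-bit nibbles of a 256-bit key. The names `Key` and `selectBranch` are hypothetical, and the choice of reading the high nibble of each byte first is an assumption; neither is taken from the SHAMap source.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical 256-bit key type used only for this illustration.
using Key = std::array<std::uint8_t, 32>;

// Returns the branch (0..15) to follow at the given depth.
// Assumption: each byte contributes two levels, high nibble first.
inline int selectBranch(const Key& key, std::size_t depth)
{
    assert(depth < 2 * key.size());
    const std::uint8_t byte = key[depth / 2];
    return (depth % 2 == 0) ? (byte >> 4) : (byte & 0x0F);
}
```

With a key whose first two bytes are `0xAB 0xCD`, this sketch would visit branches 10, 11, 12 and 13 at depths 0 through 3.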
 ## SHAMap Types ##
 There are two different ways of building and using a SHAMap:
 1. A mutable SHAMap and
 2. An immutable SHAMap
 The distinction here is not of the classic C++ immutable-means-unchanging
 sense.  An immutable SHAMap contains *nodes* that are immutable.  Also,
 once a node has been located in an immutable SHAMap, that node is
 guaranteed to persist in that SHAMap for the lifetime of the SHAMap.
 So, somewhat counter-intuitively, an immutable SHAMap may grow as new nodes
 are introduced.  But an immutable SHAMap will never get smaller (until it
 entirely evaporates when it is destroyed).  Nodes, once introduced to the
 immutable SHAMap, also never change their location in memory.  So nodes in
 an immutable SHAMap can be handled using raw pointers (if you're careful).
 One consequence of this design is that an immutable SHAMap can never be
 "trimmed".  There is no way to identify unnecessary nodes in an immutable
 SHAMap that could be removed.  Once a node has been brought into the
 in-memory SHAMap, that node stays in memory for the life of the SHAMap.
 Most SHAMaps are immutable, in the sense that they don't modify or remove
 their contained nodes.
 An example where a mutable SHAMap is required is when we want to apply
 transactions to the last closed ledger.  To do so we'd make a mutable
 snapshot of the state tree and then start applying transactions to it.
 Because the snapshot is mutable, changes to nodes in the snapshot will not
 affect nodes in other SHAMaps.
 An example using an immutable ledger would be when there's an open ledger
 and some piece of code wishes to query the state of the ledger.  In this
 case we don't wish to change the state of the SHAMap, so we'd use an
 immutable snapshot.
 ## SHAMap Creation ##
 A SHAMap is usually not created from scratch.  Once an initial SHAMap is
 constructed, later SHAMaps are usually created by calling
 `snapShot(bool isMutable)` on the original SHAMap.  The returned SHAMap
 has the expected characteristics (mutable or immutable) based on the
 passed-in flag.
 It is cheaper to make an immutable snapshot of a SHAMap than to make a mutable
 snapshot.  If the SHAMap snapshot is mutable then any of the nodes that might
 be modified must be copied before they are placed in the mutable map.
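
The cost difference can be pictured with a toy, single-node "map". This is only a sketch under stated assumptions: `ToyMap`, `Node`, and the extra `newSeq` parameter are hypothetical, and the real `snapShot()` works on a whole tree rather than one node. An immutable snapshot shares the existing node, while a mutable snapshot copies it first and tags the copy with a non-zero sequence number.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical node: seq == 0 means immutable and shareable,
// non-zero means owned by exactly one mutable map.
struct Node
{
    std::string data;
    unsigned seq;
};

class ToyMap
{
public:
    ToyMap(std::shared_ptr<Node> root, unsigned seq)
        : root_(std::move(root)), seq_(seq) {}

    // Immutable snapshot: share the node (cheap).
    // Mutable snapshot: copy the node before placing it in the new map.
    ToyMap snapshot(bool isMutable, unsigned newSeq) const
    {
        if (!isMutable)
            return ToyMap(root_, 0);
        return ToyMap(std::make_shared<Node>(Node{root_->data, newSeq}), newSeq);
    }

    // Only a mutable map may modify its (private) node.
    void set(const std::string& value)
    {
        assert(seq_ != 0 && root_->seq == seq_);
        root_->data = value;
    }

    const std::string& get() const { return root_->data; }

private:
    std::shared_ptr<Node> root_;
    unsigned seq_;
};
```

In this sketch, writing to the mutable snapshot leaves the original untouched because the write lands on the private copy.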
 ## SHAMap Thread Safety ##
 SHAMaps can be thread safe, depending on how they are used.  The SHAMap
 uses a SyncUnorderedMap for its storage.  The SyncUnorderedMap has three
 thread-safe methods:
 * size(),
 * canonicalize(), and
 * retrieve()
 As long as the SHAMap uses only those three interfaces to its storage
 (the mTNByID variable [which stands for Tree Node by ID]) the SHAMap is
 thread safe.
 ## Walking a SHAMap ##
 *We need a good description of why someone would walk a SHAMap and*
 *how it works in the code*
 ## Late-arriving Nodes ##
 As we noted earlier, SHAMaps (even immutable ones) may grow.  If a SHAMap
 is searching for a node and runs into an empty spot in the tree, then the
 SHAMap looks to see if the node exists but has not yet been made part of
 the map.  This operation is performed in the `SHAMap::fetchNodeExternalNT()`
 method.  The *NT* in this case stands for 'No Throw'.
 The `fetchNodeExternalNT()` method goes through three phases:
 1. By calling `getCache()` we attempt to locate the missing node in the
    TreeNodeCache.  The TreeNodeCache is a cache of immutable
    SHAMapTreeNodes that are shared across all SHAMaps.
    Any SHAMapTreeNode that is immutable has a sequence number of zero.
    When a mutable SHAMap is created then its SHAMapTreeNodes are given
    non-zero sequence numbers.  So the `assert (ret->getSeq() == 0)`
    simply confirms that the TreeNodeCache indeed gave us an immutable node.
 2. If the node is not in the TreeNodeCache, we attempt to locate the node
    in the historic data stored by the database.  The call to
    `getApp().getNodeStore().fetch(hash)` does that work for us.
 3. Finally, if mLedgerSeq is non-zero and we didn't locate the node in the
    historic data, then we call a MissingNodeHandler.
    The non-zero mLedgerSeq indicates that the SHAMap is a complete map that
    belongs to a historic ledger with the given (non-zero) sequence number.
    So, if all expected data is always present, the MissingNodeHandler should
    never be executed.
    And, since we now know that this SHAMap does not fully represent
    the data from that ledger, we set the SHAMap's sequence number to zero.
 If phase 1 returned a node, then we already know that the node is immutable.
 However, if phase 2 executes successfully, then we need to turn the
 returned node into an immutable node.  That's handled by the call to
 `make_shared<SHAMapTreeNode>` inside the try block.  That code is inside
 a try block because the `fetchNodeExternalNT` method promises not to throw.
 In case the constructor called by `make_shared` throws we don't want to
 break our promise.
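
The three phases above can be sketched in miniature as follows. Everything here is a stand-in: `Fetcher`, `TreeNode`, the two `std::unordered_map` members and the `std::function` handler are hypothetical substitutes for the TreeNodeCache, the node store and the MissingNodeHandler, and string keys replace real hashes. The sketch also omits the `canonicalize()` step.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <unordered_map>

struct TreeNode
{
    std::string blob;
    unsigned seq;  // 0 means immutable
};
using NodePtr = std::shared_ptr<TreeNode>;

struct Fetcher
{
    std::unordered_map<std::string, NodePtr> cache;      // stand-in for TreeNodeCache
    std::unordered_map<std::string, std::string> store;  // stand-in for the node store
    std::function<void(unsigned)> missingNodeHandler;    // stand-in handler
    unsigned ledgerSeq = 0;

    // Never throws: any exception is swallowed and reported as "not found".
    NodePtr fetchNoThrow(const std::string& hash) noexcept
    {
        try
        {
            // Phase 1: shared cache of immutable nodes.
            auto cached = cache.find(hash);
            if (cached != cache.end())
            {
                assert(cached->second->seq == 0);
                return cached->second;
            }

            // Phase 2: historic data; build an immutable node from the blob.
            auto stored = store.find(hash);
            if (stored != store.end())
                return std::make_shared<TreeNode>(TreeNode{stored->second, 0});

            // Phase 3: a map that claimed to be complete is not.
            if (ledgerSeq != 0)
            {
                if (missingNodeHandler)
                    missingNodeHandler(ledgerSeq);
                ledgerSeq = 0;
            }
        }
        catch (...)
        {
        }
        return nullptr;
    }
};
```

Wrapping the whole body in one try block is a simplification; the text above describes a narrower try block around the `make_shared` call only.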
 ## Canonicalize ##
 The calls to `canonicalize()` make sure that if the resulting node is already
 in the SHAMap, then we return the node that's already present -- we never
 replace a pre-existing node.  By using `canonicalize()` we manage a thread
 race condition where two different threads might both recognize the lack of a
 SHAMapTreeNode at the same time.  If they both attempt to insert the node
 then `canonicalize()` makes sure that the first node inserted wins and the slower
 thread receives back a pointer to the node inserted by the faster thread.
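
A minimal sketch of the "first one in wins" behavior, assuming a mutex-protected hash map. `ToySyncMap` and its `canonicalize()` signature are hypothetical and are not the actual SyncUnorderedMap interface.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>

template <class Key, class Value>
class ToySyncMap
{
public:
    // If key is absent, stores value and returns true.
    // If key is present, overwrites the caller's pointer with the stored
    // one and returns false, so the pre-existing entry is never replaced.
    bool canonicalize(const Key& key, std::shared_ptr<Value>& value)
    {
        std::lock_guard<std::mutex> lock(mutex_);
        auto result = map_.emplace(key, value);
        if (!result.second)
            value = result.first->second;
        return result.second;
    }

    std::size_t size() const
    {
        std::lock_guard<std::mutex> lock(mutex_);
        return map_.size();
    }

private:
    mutable std::mutex mutex_;
    std::unordered_map<Key, std::shared_ptr<Value>> map_;
};
```

Because the lookup and the insertion happen under one lock, two racing threads cannot both believe they inserted the node; the slower one leaves with the faster one's pointer.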
 There's a problem with the current SHAMap design that `canonicalize()`
 accommodates.  Two different trees can have the exact same node (the same
 hash value) with two different IDs.  If the TreeNodeCache returns a node
 with the same hash but a different ID, then we assume that the ID of the
 passed-in node is 'better' than the older ID in the TreeNodeCache.  So we
 construct a new SHAMapTreeNode by copying the one we found in the
 TreeNodeCache, but we give the new node the new ID.  Then we replace the
 SHAMapTreeNode in the TreeNodeCache with this newly constructed node.
 The TreeNodeCache is not subject to the rule that any node must be
 resident forever.  So it's okay to replace the old node with the new node.
 The `SHAMap::getCache()` method exhibits the same behavior.
 ## SHAMap Improvements ##
 Here's a simple one: the SHAMapTreeNode::mAccessSeq member is currently not
 used and could be removed.
 Here's a more important change.  The tree structure is currently embedded
 in the SHAMapTreeNodes themselves.  It doesn't have to be that way, and
 that should be fixed.
 When we navigate the tree (say, like `SHAMap::walkTo()`) we currently
 ask each node for information that we could determine locally.  We know
 the depth because we know how many nodes we have traversed.  We know the
 ID that we need because that's how we're steering.  So we don't need to
 store the ID in the node.  The next refactor should remove all calls to
 `SHAMapTreeNode::GetID()`.
 Then we can remove the NodeID member from SHAMapTreeNode.
 Then we can change the SHAMap::mTNByID member to be mTNByHash.
 An additional possible refactor would be to have a base type, SHAMapTreeNode,
 and derive from that InnerNode and LeafNode types.  That would remove
 some storage (the array of 16 hashes) from the LeafNodes.  That refactor
 would also have the effect of simplifying methods like `isLeaf()` and
 `hasItem()`.
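
One possible shape for that last refactor is sketched below. The type names `TreeNodeBase`, `InnerNode` and `LeafNode`, the `Hash` alias and the `std::string` item are all hypothetical; the point is only that the array of 16 hashes lives in the inner node type, and that `isLeaf()` reduces to a one-line override.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>

using Hash = std::array<std::uint8_t, 32>;

// Hypothetical common base for both node kinds.
struct TreeNodeBase
{
    virtual ~TreeNodeBase() = default;
    virtual bool isLeaf() const = 0;
};

// Only inner nodes pay for the 16 child hashes.
struct InnerNode : TreeNodeBase
{
    std::array<Hash, 16> childHashes{};
    bool isLeaf() const override { return false; }
};

// Leaves carry an item and no child-hash array.
struct LeafNode : TreeNodeBase
{
    explicit LeafNode(std::string i) : item(std::move(i)) {}
    std::string item;
    bool isLeaf() const override { return true; }
};
```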
--- a/src/ripple/module/app/shamap/SHAMap.cpp
+++ b/src/ripple/module/app/shamap/SHAMap.cpp
@@ -1004,6 +1004,8 @@ SHAMapTreeNode::pointer SHAMap::fetchNodeExternalNT (const SHAMapNodeID& id, uin
 {
    SHAMapTreeNode::pointer ret;
    // This if allows us to use the SHAMap in unit tests.  So we don't attempt
    // to fetch external nodes if we're not running in the application.
    if (!getApp().running ())
        return ret;