SHAMap Documentation

Scott Schurr
2014-07-03 15:54:12 -07:00
committed by Nik Bougalis
parent 322af30d6a
commit baaa45f8c7
4 changed files with 223 additions and 18 deletions


@@ -27,12 +27,12 @@
# Purpose
#
# This file documents and provides examples of all rippled server process
# configuration options. When the rippled server instance is lanched, it looks
# for a file with the following name:
# configuration options. When the rippled server instance is launched, it
# looks for a file with the following name:
#
# rippled.cfg
#
# For more information on where the rippled serer instance searches for
# For more information on where the rippled server instance searches for
# the file please visit the Ripple wiki. Specifically, the section explaining
# the --conf command line option:
#
@@ -242,7 +242,7 @@
#
# The amount of time to wait in seconds, before sending a websocket 'ping'
# message. Ping messages are used to determine if the remote end of the
# connection is no longer availabile.
# connection is no longer available.
#
#
#
@@ -299,7 +299,7 @@
#
# This group of settings configures security and access attributes of the
# RPC server section of the rippled process, used to service both local
# an optional remote clients.
# and optional remote clients.
#
#
#
@@ -314,7 +314,7 @@
#
# [rpc_admin_allow]
#
# Specify an list of IP addresses allowed to have admin access. One per line.
# Specify a list of IP addresses allowed to have admin access. One per line.
# If you want to test the output of non-admin commands add this section and
# just put an ip address not under your control.
# Defaults to 127.0.0.1.
@@ -334,7 +334,7 @@
#
# [rpc_admin_password]
#
# As a server, require this as the admin pasword to be specified. Also,
# As a server, require this as the admin password to be specified. Also,
# require rpc_admin_user and rpc_admin_password to be checked for RPC admin
# functions. The request must specify these as the admin_user and
# admin_password in the request object.
@@ -358,7 +358,7 @@
#
# [rpc_user]
#
# As a server, require a this user to specified and require rpc_password to
# As a server, require this user to be specified and require rpc_password to
# be checked for RPC access via the rpc_ip and rpc_port. The user and password
# must be specified via HTTP's basic authentication method.
# As a client, supply this to the server via HTTP's basic authentication
@@ -368,7 +368,7 @@
#
# [rpc_password]
#
# As a server, require a this password to specified and require rpc_user to
# As a server, require this password to be specified and require rpc_user to
# be checked for RPC access via the rpc_ip and rpc_port. The user and password
# must be specified via HTTP's basic authentication method.
# As a client, supply this to the server via HTTP's basic authentication
@@ -393,8 +393,9 @@
# 0: Server certificates are not provided for RPC clients using SSL [default]
# 1: Client RPC connections will be provided with SSL certificates.
#
# Note that if rpc_secure is enabled, it will also be necessasry to configure the
# certificate file settings located in rpc_ssl_cert, rpc_ssl_chain, and rpc_ssl_key
# Note that if rpc_secure is enabled, it will also be necessary to configure
# the certificate file settings located in rpc_ssl_cert, rpc_ssl_chain, and
# rpc_ssl_key
#
#
#
@@ -402,8 +403,9 @@
#
# <pathname>
#
# A file system path leading to the SSL certificate file to use for secure RPC.
# The file is in PEM format. The file is not needed if the chain includes it.
# A file system path leading to the SSL certificate file to use for secure
# RPC. The file is in PEM format. The file is not needed if the chain
# includes it.
#
#
#
@@ -444,7 +446,7 @@
#
# [sms_url]?from=[sms_from]&to=[sms_to]&api_key=[sms_key]&api_secret=[sms_secret]&text=['text']
#
# Where [...] are the corresponding valus from the configuration file, and
# Where [...] are the corresponding values from the configuration file, and
# ['text'] is the value of the JSON field with name 'text'.
#
# [sms_url]
@@ -538,8 +540,8 @@
# For domains, rippled will probe for https web servers at the specified
# domain in the following order: ripple.DOMAIN, www.DOMAIN, DOMAIN
#
# For public key entries, a comment may optionally be spcified after adding a
# space to the pulic key.
# For public key entries, a comment may optionally be specified after adding
# a space to the public key.
#
# Examples:
# ripple.com
@@ -587,14 +589,14 @@
#
# [path_search_fast]
# [path_search_max]
# When seaching for paths, the minimum and maximum search aggressiveness.
# When searching for paths, the minimum and maximum search aggressiveness.
#
# The default for 'path_search_fast' is 2. The default for 'path_search_max' is 10.
#
# [path_search_old]
#
# For clients that use the legacy path finding interfaces, the search
# agressiveness to use. The default is 7.
# aggressiveness to use. The default is 7.
#
#
#


@@ -94,6 +94,20 @@ of the consensus process.
A signed statement of which transactions it believes should be included in
the next consensus ledger.
## Ledger Header ##
The "ledger header" is the chunk of data that hashes to the
ledger's hash. It contains the sequence number, parent hash,
hash of the previous ledger, hash of the root node of the
state tree, and so on.
## Ledger Base ##
The term "ledger base" refers to a particular type of query
and response used in the ledger fetch process that includes
the ledger header but may also contain other information
such as the root node of the state tree.
---
# References


@@ -0,0 +1,187 @@
# SHAMap Introduction #
July 2014
The SHAMap is a Merkle tree (http://en.wikipedia.org/wiki/Merkle_tree).
The SHAMap is also a radix tree of radix 16
(http://en.wikipedia.org/wiki/Radix_tree).
*We need some kind of sensible summary of the SHAMap here.*
A given SHAMap always stores only one of three kinds of data:
* Transactions with metadata
* Transactions without metadata, or
* Account states.
So all of the leaf nodes of a particular SHAMap will always have a uniform
type. The inner nodes carry no data other than the hash of the nodes
beneath them.
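As a rough illustration of what "radix 16" means for navigation, the sketch below picks one of 16 branches at each depth by reading one nibble (4 bits) of the key. The `Key` alias, the `selectBranch` name, and the high-nibble-first ordering are assumptions made for this example; they are not taken from the SHAMap source.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical 256-bit key type, used only for this illustration.
using Key = std::array<std::uint8_t, 32>;

// Return the branch (0..15) to follow at the given depth.
// Assumption: depth 0 reads the high nibble of byte 0, depth 1 the low
// nibble of byte 0, depth 2 the high nibble of byte 1, and so on.
inline unsigned selectBranch(Key const& key, std::size_t depth)
{
    assert(depth < key.size() * 2);  // 64 nibbles in a 256-bit key
    std::uint8_t const byte = key[depth / 2];
    return (depth % 2 == 0) ? (byte >> 4) : (byte & 0x0F);
}
```

With 4 bits consumed per level, a 256-bit key is fully resolved after at most 64 levels, which bounds the depth of such a tree.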
## SHAMap Types ##
There are two different ways of building and using a SHAMap:
1. A mutable SHAMap and
2. An immutable SHAMap
The distinction here is not of the classic C++ immutable-means-unchanging
sense. An immutable SHAMap contains *nodes* that are immutable. Also,
once a node has been located in an immutable SHAMap, that node is
guaranteed to persist in that SHAMap for the lifetime of the SHAMap.
So, somewhat counter-intuitively, an immutable SHAMap may grow as new nodes
are introduced. But an immutable SHAMap will never get smaller (until it
entirely evaporates when it is destroyed). Nodes, once introduced to the
immutable SHAMap, also never change their location in memory. So nodes in
an immutable SHAMap can be handled using raw pointers (if you're careful).
One consequence of this design is that an immutable SHAMap can never be
"trimmed". There is no way to identify unnecessary nodes in an immutable
SHAMap that could be removed. Once a node has been brought into the
in-memory SHAMap, that node stays in memory for the life of the SHAMap.
Most SHAMaps are immutable, in the sense that they don't modify or remove
their contained nodes.
An example where a mutable SHAMap is required is when we want to apply
transactions to the last closed ledger. To do so we'd make a mutable
snapshot of the state tree and then start applying transactions to it.
Because the snapshot is mutable, changes to nodes in the snapshot will not
affect nodes in other SHAMaps.
An example using an immutable SHAMap would be when there's an open ledger
and some piece of code wishes to query the state of the ledger. In this
case we don't wish to change the state of the SHAMap, so we'd use an
immutable snapshot.
## SHAMap Creation ##
A SHAMap is usually not created from scratch. Once an initial SHAMap is
constructed, later SHAMaps are usually created by calling
`snapShot(bool isMutable)` on the original SHAMap. The returned SHAMap
has the expected characteristics (mutable or immutable) based on the passed
in flag.
It is cheaper to make an immutable snapshot of a SHAMap than to make a mutable
snapshot. If the SHAMap snapshot is mutable then any of the nodes that might
be modified must be copied before they are placed in the mutable map.
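The cost difference can be sketched with a toy, single-node "map". This is not the real SHAMap: `MapSketch`, `Node`, and the extra `newSeq` parameter are hypothetical, and the copy here is deferred until the first write, which is one possible way to satisfy the rule that a node is copied before a mutable map changes it. The sequence-number convention (zero for immutable, non-zero for a mutable map) follows the description later in this document.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical node. seq == 0 marks a shared, immutable node; a non-zero
// seq marks a node owned by the mutable map with that sequence number.
struct Node
{
    std::string payload;
    unsigned seq;
};

class MapSketch
{
public:
    MapSketch(std::shared_ptr<Node> root, unsigned seq)
        : root_(std::move(root)), seq_(seq) {}

    // Both kinds of snapshot start out sharing the same root node.
    MapSketch snapShot(bool isMutable, unsigned newSeq) const
    {
        return MapSketch(root_, isMutable ? newSeq : 0);
    }

    // Only a mutable map may write. A node it does not own is copied
    // first, so other maps sharing that node never observe the change.
    void setPayload(std::string value)
    {
        assert(seq_ != 0);
        if (root_->seq != seq_)
        {
            root_ = std::make_shared<Node>(*root_);
            root_->seq = seq_;
        }
        root_->payload = std::move(value);
    }

    std::string const& payload() const { return root_->payload; }

private:
    std::shared_ptr<Node> root_;
    unsigned seq_;
};
```

In this sketch an immutable snapshot costs one shared pointer copy, while a mutable snapshot additionally pays for a node copy on every node it touches.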
## SHAMap Thread Safety ##
SHAMaps can be thread safe, depending on how they are used. The SHAMap
uses a SyncUnorderedMap for its storage. The SyncUnorderedMap has three
thread-safe methods:
* size(),
* canonicalize(), and
* retrieve()
As long as the SHAMap uses only those three interfaces to its storage
(the mTNByID variable [which stands for Tree Node by ID]) the SHAMap is
thread safe.
## Walking a SHAMap ##
*We need a good description of why someone would walk a SHAMap and*
*how it works in the code*
## Late-arriving Nodes ##
As we noted earlier, SHAMaps (even immutable ones) may grow. If a SHAMap
is searching for a node and runs into an empty spot in the tree, then the
SHAMap looks to see if the node exists but has not yet been made part of
the map. This operation is performed in the `SHAMap::fetchNodeExternalNT()`
method. The *NT* in this case stands for 'No Throw'.
The `fetchNodeExternalNT()` method goes through three phases:
1. By calling `getCache()` we attempt to locate the missing node in the
TreeNodeCache. The TreeNodeCache is a cache of immutable
SHAMapTreeNodes that are shared across all SHAMaps.
Any SHAMapTreeNode that is immutable has a sequence number of zero.
When a mutable SHAMap is created then its SHAMapTreeNodes are given
non-zero sequence numbers. So the `assert (ret->getSeq() == 0)`
simply confirms that the TreeNodeCache indeed gave us an immutable node.
2. If the node is not in the TreeNodeCache, we attempt to locate the node
in the historic data stored by the database. The call to
`getApp().getNodeStore().fetch(hash)` does that work for us.
3. Finally, if mLedgerSeq is non-zero and we didn't locate the node in the
historic data, then we call a MissingNodeHandler.
The non-zero mLedgerSeq indicates that the SHAMap is a complete map that
belongs to a historic ledger with the given (non-zero) sequence number.
So, if all expected data is always present, the MissingNodeHandler should
never be executed.
And, since we now know that this SHAMap does not fully represent
the data from that ledger, we set the SHAMap's sequence number to zero.
If phase 1 returned a node, then we already know that the node is immutable.
However, if phase 2 executes successfully, then we need to turn the
returned node into an immutable node. That's handled by the call to
`make_shared<SHAMapTreeNode>` inside the try block. That code is inside
a try block because the `fetchNodeExternalNT` method promises not to throw.
In case the constructor called by `make_shared` throws we don't want to
break our promise.
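The three phases and the no-throw promise can be sketched as follows. Everything here is a stand-in: `Sources`, `fetchNoThrow`, and the callback shapes are hypothetical and do not mirror the real `TreeNodeCache`, node store, or `MissingNodeHandler` interfaces. Unlike the real method, which wraps only the `make_shared` call, this sketch wraps the whole body in the try block for brevity.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <stdexcept>
#include <string>

struct Node { std::string data; };
using NodePtr = std::shared_ptr<Node>;

// Hypothetical stand-ins for the three places a node can come from.
struct Sources
{
    std::function<NodePtr()> cache;      // phase 1: shared node cache
    std::function<std::string()> store;  // phase 2: blob, empty if absent
    std::function<void()> onMissing;     // phase 3: missing-node handler
};

// Never throws: any exception raised along the way yields a null pointer.
NodePtr fetchNoThrow(Sources const& src, unsigned& ledgerSeq) noexcept
{
    try
    {
        if (NodePtr hit = src.cache())   // phase 1
            return hit;

        std::string blob = src.store();  // phase 2
        if (!blob.empty())
            return std::make_shared<Node>(Node{blob});

        if (ledgerSeq != 0)              // phase 3
        {
            src.onMissing();
            ledgerSeq = 0;  // the map no longer fully represents the ledger
        }
    }
    catch (...)
    {
        // Swallow the exception to keep the no-throw promise.
    }
    return NodePtr();
}
```

Callers of such a function must check for a null result, since "not found" and "failed while loading" are both reported the same way.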
## Canonicalize ##
The calls to `canonicalize()` make sure that if the resulting node is already
in the SHAMap, then we return the node that's already present -- we never
replace a pre-existing node. By using `canonicalize()` we manage a thread
race condition where two different threads might both recognize the lack of a
SHAMapTreeNode at the same time. If they both attempt to insert the node
then `canonicalize` makes sure that the first node in wins and the slower
thread receives back a pointer to the node inserted by the faster thread.
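A minimal sketch of this first-in-wins behavior, assuming a mutex-protected `std::unordered_map`, is shown below. `SyncMapSketch` and its method signatures are hypothetical; the real `SyncUnorderedMap` interface is not reproduced in this document and may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>

template <class Key, class Value>
class SyncMapSketch
{
public:
    // If `key` is absent, store `candidate` and return true.
    // If `key` is present, overwrite `candidate` with the stored pointer
    // and return false, so every caller ends up with the same node.
    bool canonicalize(Key const& key, std::shared_ptr<Value>& candidate)
    {
        std::lock_guard<std::mutex> lock(mutex_);
        auto result = map_.emplace(key, candidate);
        if (!result.second)
            candidate = result.first->second;
        return result.second;
    }

    // Return the stored pointer, or null if `key` is absent.
    std::shared_ptr<Value> retrieve(Key const& key) const
    {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = map_.find(key);
        return it == map_.end() ? std::shared_ptr<Value>() : it->second;
    }

    std::size_t size() const
    {
        std::lock_guard<std::mutex> lock(mutex_);
        return map_.size();
    }

private:
    mutable std::mutex mutex_;
    std::unordered_map<Key, std::shared_ptr<Value>> map_;
};
```

Because the check and the insert happen under one lock, two racing threads cannot both believe they inserted the node; the slower one simply has its pointer redirected.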
There's a problem with the current SHAMap design that `canonicalize()`
accommodates. Two different trees can have the exact same node (the same
hash value) with two different IDs. If the TreeNodeCache returns a node
with the same hash but a different ID, then we assume that the ID of the
passed-in node is 'better' than the older ID in the TreeNodeCache. So we
construct a new SHAMapTreeNode by copying the one we found in the
TreeNodeCache, but we give the new node the new ID. Then we replace the
SHAMapTreeNode in the TreeNodeCache with this newly constructed node.
The TreeNodeCache is not subject to the rule that any node must be
resident forever. So it's okay to replace the old node with the new node.
The `SHAMap::getCache()` method exhibits the same behavior.
## SHAMap Improvements ##
Here's a simple one: the SHAMapTreeNode::mAccessSeq member is currently not
used and could be removed.
Here's a more important change. The tree structure is currently embedded
in the SHAMapTreeNodes themselves. It doesn't have to be that way, and
that should be fixed.
When we navigate the tree (say, like `SHAMap::walkTo()`) we currently
ask each node for information that we could determine locally. We know
the depth because we know how many nodes we have traversed. We know the
ID that we need because that's how we're steering. So we don't need to
store the ID in the node. The next refactor should remove all calls to
`SHAMapTreeNode::GetID()`.
Then we can remove the NodeID member from SHAMapTreeNode.
Then we can change the SHAMap::mTNByID member to be mTNByHash.
An additional possible refactor would be to have a base type, SHAMapTreeNode,
and derive from that InnerNode and LeafNode types. That would remove
some storage (the array of 16 hashes) from the LeafNodes. That refactor
would also have the effect of simplifying methods like `isLeaf()` and
`hasItem()`.
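One possible shape for that refactor is sketched below. The class names follow the paragraph above, but the members, the `Hash` placeholder, and the use of a pure virtual `isLeaf()` are assumptions for illustration, not a description of existing code.

```cpp
#include <array>
#include <cassert>
#include <memory>
#include <string>
#include <utility>

using Hash = std::string;  // placeholder for a real 256-bit hash type

// Hypothetical base type: holds only what inner and leaf nodes share.
class TreeNodeBase
{
public:
    virtual ~TreeNodeBase() = default;
    virtual bool isLeaf() const = 0;
    bool isInner() const { return !isLeaf(); }
};

// Only inner nodes pay for the array of 16 child hashes.
class InnerNode final : public TreeNodeBase
{
public:
    bool isLeaf() const override { return false; }
    std::array<Hash, 16> childHashes;
};

// Only leaf nodes carry an item.
class LeafNode final : public TreeNodeBase
{
public:
    explicit LeafNode(std::string item) : item_(std::move(item)) {}
    bool isLeaf() const override { return true; }
    std::string const& item() const { return item_; }

private:
    std::string item_;
};
```

With this split, `isLeaf()` becomes a property of the type rather than a runtime check on stored state, and a `hasItem()` query is trivially true for every `LeafNode`.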


@@ -1004,6 +1004,8 @@ SHAMapTreeNode::pointer SHAMap::fetchNodeExternalNT (const SHAMapNodeID& id, uin
{
SHAMapTreeNode::pointer ret;
// This if allows us to use the SHAMap in unit tests. So we don't attempt
// to fetch external nodes if we're not running in the application.
if (!getApp().running ())
return ret;