- Detects if the consensus process is "stalled". If it is, then we can declare a
consensus and end successfully even if we do not have 80% agreement on
our proposal.
- "Stalled" is defined as:
- We have a close time consensus
- Each disputed transaction is individually stalled:
- It has been in the final "stuck" 95% requirement for at least 2
(avMIN_ROUNDS) "inner rounds" of phaseEstablish,
- and either all of the other trusted proposers or this validator, if proposing,
have had the same vote(s) for at least 4 (avSTALLED_ROUNDS) "inner
rounds", and at least 80% of the validators (including this one, if
appropriate) agree about the vote (whether yes or no).
- If we have been in the establish phase for more than 10x the previous
consensus establish phase's time, then consensus is considered "expired",
and we will leave the round, which sends a partial validation (indicating
that the node is moving on without validating). Two restrictions avoid
prematurely exiting, or having an extended exit in extreme situations.
- The 10x time is clamped to be within a range of 15s
(ledgerMAX_CONSENSUS) to 120s (ledgerABANDON_CONSENSUS).
- If consensus has not had an opportunity to walk through all avalanche
states (defined as not going through 8 "inner rounds" of phaseEstablish),
then ConsensusState::Expired is treated as ConsensusState::No.
- When enough nodes leave the round, any remaining nodes will see they've
fallen behind, and move on, too, generally before hitting the timeout. Any
validations or partial validations sent during this time will help the
consensus process bring the nodes back together.
The codebase is filled with includes that are unused, and which thus can be removed. At the same time, the files often do not include all headers that contain the definitions used in those files. This change uses clang-format and clang-tidy to clean up the includes, with minor manual intervention to ensure the code compiles on all platforms.
Combine multiple related debug log data points into a single
message. Allows quick correlation of events that
previously were either not logged or, if logged, strewn
across multiple lines, making correlation difficult.
The Heartbeat Timer and consensus ledger accept processing
each have this capability.
Also guarantees that log entries will be written if the
node is a validator, regardless of log severity level.
Otherwise, the level of these messages is at INFO severity.
* Revert "Optimize calculation of close time to avoid impasse and minimize gratuitous proposal changes (#4760)"
This reverts commit 8ce85a9750.
* Revert "Several changes to improve Consensus stability: (#4505)"
This reverts commit f259cc1ab6.
* Add missing include
---------
Co-authored-by: seelabs <scott.determan@yahoo.com>
* Verify accepted ledger becomes validated, and retry
with a new consensus transaction set if not.
* Always store proposals.
* Track proposals by ledger sequence. This helps slow peers catch
up with the rest of the network.
* Acquire transaction sets for proposals with future ledger sequences.
This also helps slow peers catch up.
* Optimize timer delay for establish phase to wait based on how
long validators have been sending proposals. This also helps slow
peers to catch up.
* Fix impasse achieving close time consensus.
* Don't wait between open and establish phases.
* Verify accepted ledger becomes validated, and retry
with a new consensus transaction set if not.
* Always store proposals.
* Track proposals by ledger sequence. This helps slow peers catch
up with the rest of the network.
* Acquire transaction sets for proposals with future ledger sequences.
This also helps slow peers catch up.
* Optimize timer delay for establish phase to wait based on how
long validators have been sending proposals. This also helps slow
peers to catch up.
* Fix impasse achieving close time consensus.
* Don't wait between open and establish phases.
The `SHAMapItem` class contains a variable-sized buffer that
holds the serialized data associated with a particular item
inside a `SHAMap`.
Prior to this commit, the buffer for the serialized data was
allocated separately. Coupled with the fact that most instances
of `SHAMapItem` were wrapped around a `std::shared_ptr` meant
that an instantiation might result in up to three separate
memory allocations.
This commit switches away from `std::shared_ptr` for `SHAMapItem`
and uses `boost::intrusive_ptr` instead, allowing the reference
count for an instance to live inside the instance itself. Coupled
with using a slab-based allocator to optimize memory allocation
for the most commonly sized buffers, the net result is significant
memory savings. In testing, the reduction in memory usage hovers
between 400MB and 650MB. Other scenarios might result in larger
savings.
In performance testing with NFTs, this commit reduces memory size by
about 15% sustained over long duration.
Commit 2 of 3 in #4218.
- MSVC 19.x reported a warning about import paths in boost for
function_output_iterator class (boost::function_output_iterator).
- Eliminate that warning by updating the import paths, as suggested by
the compiler warnings.
This commit implements partitioned unordered maps and makes it possible
to traverse such a map in parallel, allowing for more efficient use of
CPU resources.
The `CachedSLEs`, `TaggedCache`, and `KeyCache` classes make use of the
new functionality, which should improve performance.
A recent version of clang notes a number of places in range
for loops where the code base was making unnecessary copies
or using const lvalue references to extend lifetimes. This
fixes the places that clang identified.
This commit expands the detection capabilities of the Byzantine
validation detector. Prior to this commit, only validators that
were on a server's UNL were monitored. Now, all the validations
that a server receives are passed through the detector.
When evaluating the fitness and usefulness of an outbound peer, the code
would incorrectly calculate the amount of time that the peer spent in
a non-useful state.
This commit, if merged, corrects the calculation and makes the timeout
values configurable by server operators.
Two new options are introduced in the 'overlay' stanza of the config
file. The default values, in seconds, are:
[overlay]
max_unknown_time = 600
max_diverged_time = 300
This commit introduces the "HardenedValidations" amendment which,
if enabled, allows validators to include additional information in
their validations that can increase the robustness of consensus.
Specifically, the commit introduces a new optional field that can
be set in validation messages can be used to attest to the hash of
the latest ledger that a validator considers to be fully validated.
Additionally, the commit leverages the previously introduced "cookie"
field to improve the robustness of the network by making it possible
for servers to automatically detect accidental misconfiguration which
results in two or more validators using the same validation key.
Each validator will generate a random cookie on startup that it will
include in each of its validations. This will allow validators to detect
when more than one validator is accidentally operating with the same
validation keys.
These changes use the hash of the consensus transaction set when
characterizing the mismatch between a locally built ledger and fully
validated network ledger. This allows detection of non-determinism in
transaction process, in which consensus succeeded, but a node somehow
generated a different subsequent ledger.
These changes augment the Validations class with a LedgerTrie to better
track the history of support for validated ledgers. This improves the
selection of the preferred working ledger for consensus. The Validations
class now tracks both full and partial validations. Partial validations
are only used to determine the working ledger; full validations are
required for any quorum related function. Validators are also now
explicitly restricted to sending validations with increasing ledger
sequence number.
- Separate `Scheduler` from `BasicNetwork`.
- Add an event/collector framework for monitoring invariants and calculating statistics.
- Allow distinct network and trust connections between Peers.
- Add a simple routing strategy to support broadcasting arbitrary messages.
- Add a common directed graph (`Digraph`) class for representing network and trust topologies.
- Add a `PeerGroup` class for simpler specification of the trust and network topologies.
- Add a `LedgerOracle` class to ensure distinct ledger histories and simplify branch checking.
- Add a `Submitter` to send transactions in at fixed or random intervals to fixed or random peers.
Co-authored-by: Joseph McGee
Switches the default behavior of Consensus to use roundCloseTime instead of
effCloseTime. effCloseTime is still used when accepting the consensus ledger to
ensure the consensus close time comes after the parent ledger close time. This
change eliminates an edge case in which peers could reach agreement on the close
time, but end up generating ledgers with different close times.
- Add Consensus::Result, which represents the result of the
establish state and includes the consensus transaction set, final
proposed position and disputes.
- Add Consensus::Mode to track how we are participating in
consensus and ensures the onAccept callback can distinguish when
we entered the round with consensus versus when we recovered from
a wrong ledger during a round.
- Rename Consensus::Phase to Consensus::State and eliminate the
processing phase. Instead, accept is a terminal phase which
notifies RCLConsensus via onAccept callbacks. Even if clients
dispatch accepting to another thread, all future calls except to
startRound will not change the state of consensus.
- Move validate_ status from Consensus to RCLConsensus, since
generic implementation does not directly reference whether a node
is validating or not.
- Eliminate gotTxSetInternal and handle externally received
TxSets distinct from locally generated positions.
- Change ConsensusProposal::changePosition to always update the
internal close time and position even if we have bowed out. This
enforces the invariant that our proposal's position always
matches our transaction set.
This is a substantial refactor of the consensus code and also introduces
a basic consensus simulation and testing framework. The new generic/templated
version is in src/ripple/consensus and documents the current type requirements.
The version adapted for the RCL is in src/ripple/app/consensus. The testing
framework is in src/test/csf.
Minor behavioral changes/fixes include:
* Adjust close time offset even when not validating.
* Remove spurious proposing_ = false call at end of handleLCL.
* Remove unused functionality provided by checkLastValidation.
* Separate open and converge time
* Don't send a bow out if we're not proposing
* Prevent consensus stopping if NetworkOPs switches to disconnect mode while
consensus accepts a ledger
* Prevent a corner case in which Consensus::gotTxSet or Consensus::peerProposal
has the potential to update internal state while an dispatched accept job is
running.
* Distinguish external and internal calls to startNewRound. Only external
calls can reset the proposing_ state of consensus