This change fixes the suite names all around the test files, to make them match to the folder name in which this test files are located. Also, the RCL test files are relocated to the consensus folder, because they are testing consensus functionality.
Fix stalled consensus detection to prevent false positives in situations where there are no disputed transactions.
Stalled consensus detection was added to 2.5.0 in response to a network consensus halt that caused a round to run for over an hour. However, it has a flaw that makes it very easy to have false positives. Those false positives are usually mitigated by other checks that prevent them from having an effect, but there have been several instances of validators "running ahead" because there are circumstances where the other checks are "successful", allowing the stall state to be checked.
- Detects if the consensus process is "stalled". If it is, then we can declare a
consensus and end successfully even if we do not have 80% agreement on
our proposal.
- "Stalled" is defined as:
- We have a close time consensus
- Each disputed transaction is individually stalled:
- It has been in the final "stuck" 95% requirement for at least 2
(avMIN_ROUNDS) "inner rounds" of phaseEstablish,
- and either all of the other trusted proposers or this validator, if proposing,
have had the same vote(s) for at least 4 (avSTALLED_ROUNDS) "inner
rounds", and at least 80% of the validators (including this one, if
appropriate) agree about the vote (whether yes or no).
- If we have been in the establish phase for more than 10x the previous
consensus establish phase's time, then consensus is considered "expired",
and we will leave the round, which sends a partial validation (indicating
that the node is moving on without validating). Two restrictions avoid
prematurely exiting, or having an extended exit in extreme situations.
- The 10x time is clamped to be within a range of 15s
(ledgerMAX_CONSENSUS) to 120s (ledgerABANDON_CONSENSUS).
- If consensus has not had an opportunity to walk through all avalanche
states (defined as not going through 8 "inner rounds" of phaseEstablish),
then ConsensusState::Expired is treated as ConsensusState::No.
- When enough nodes leave the round, any remaining nodes will see they've
fallen behind, and move on, too, generally before hitting the timeout. Any
validations or partial validations sent during this time will help the
consensus process bring the nodes back together.
The codebase is filled with includes that are unused, and which thus can be removed. At the same time, the files often do not include all headers that contain the definitions used in those files. This change uses clang-format and clang-tidy to clean up the includes, with minor manual intervention to ensure the code compiles on all platforms.
This fixes a case where a peer can desync under a certain timing
circumstance--if it reaches a certain point in consensus before it receives
proposals.
This was noticed under high transaction volumes. Namely, when we arrive at the
point of deciding whether consensus is reached after minimum establish phase
duration but before having received any proposals. This could be caused by
finishing the previous round slightly faster and/or having some delay in
receiving proposals. Existing behavior arrives at consensus immediately after
the minimum establish duration with no proposals. This causes us to desync
because we then close a non-validated ledger. The change in this PR causes us to
wait for a configured threshold before making the decision to arrive at
consensus with no proposals. This allows validators to catch up and for brief
delays in receiving proposals to be absorbed. There should be no drawback since,
with no proposals coming in, we needn't be in a huge rush to jump ahead.
* It is now an invariant that all constructed Public Keys are valid,
non-empty and contain 33 bytes of data.
* Additionally, the memory footprint of the PublicKey class is reduced.
The size_ data member is declared as static.
* Distinguish and identify the PublisherList retrieved from the local
config file, versus the ones obtained from other validators.
* Fixes#2942
* Revert "Optimize calculation of close time to avoid impasse and minimize gratuitous proposal changes (#4760)"
This reverts commit 8ce85a9750.
* Revert "Several changes to improve Consensus stability: (#4505)"
This reverts commit f259cc1ab6.
* Add missing include
---------
Co-authored-by: seelabs <scott.determan@yahoo.com>
* Verify accepted ledger becomes validated, and retry
with a new consensus transaction set if not.
* Always store proposals.
* Track proposals by ledger sequence. This helps slow peers catch
up with the rest of the network.
* Acquire transaction sets for proposals with future ledger sequences.
This also helps slow peers catch up.
* Optimize timer delay for establish phase to wait based on how
long validators have been sending proposals. This also helps slow
peers to catch up.
* Fix impasse achieving close time consensus.
* Don't wait between open and establish phases.
* Verify accepted ledger becomes validated, and retry
with a new consensus transaction set if not.
* Always store proposals.
* Track proposals by ledger sequence. This helps slow peers catch
up with the rest of the network.
* Acquire transaction sets for proposals with future ledger sequences.
This also helps slow peers catch up.
* Optimize timer delay for establish phase to wait based on how
long validators have been sending proposals. This also helps slow
peers to catch up.
* Fix impasse achieving close time consensus.
* Don't wait between open and establish phases.
The `SHAMapItem` class contains a variable-sized buffer that
holds the serialized data associated with a particular item
inside a `SHAMap`.
Prior to this commit, the buffer for the serialized data was
allocated separately. Coupled with the fact that most instances
of `SHAMapItem` were wrapped around a `std::shared_ptr` meant
that an instantiation might result in up to three separate
memory allocations.
This commit switches away from `std::shared_ptr` for `SHAMapItem`
and uses `boost::intrusive_ptr` instead, allowing the reference
count for an instance to live inside the instance itself. Coupled
with using a slab-based allocator to optimize memory allocation
for the most commonly sized buffers, the net result is significant
memory savings. In testing, the reduction in memory usage hovers
between 400MB and 650MB. Other scenarios might result in larger
savings.
In performance testing with NFTs, this commit reduces memory size by
about 15% sustained over long duration.
Commit 2 of 3 in #4218.
This commit implements partitioned unordered maps and makes it possible
to traverse such a map in parallel, allowing for more efficient use of
CPU resources.
The `CachedSLEs`, `TaggedCache`, and `KeyCache` classes make use of the
new functionality, which should improve performance.
This commit expands the detection capabilities of the Byzantine
validation detector. Prior to this commit, only validators that
were on a server's UNL were monitored. Now, all the validations
that a server receives are passed through the detector.
* Creates a version 2 of the UNL file format allowing publishers to
pre-publish the next UNL while the current one is still valid.
* Version 1 of the UNL file format is still valid and backward
compatible.
* Also causes rippled to lock down if it has no valid UNLs, similar to
being amendment blocked, except reversible.
* Resolves#3548
* Resolves#3470
This change can help improve the liveness of the network during periods of network
instability, by allowing the network to track which validators are presently not online
and to disregard them for the purposes of quorum calculations.
This commit introduces the "HardenedValidations" amendment which,
if enabled, allows validators to include additional information in
their validations that can increase the robustness of consensus.
Specifically, the commit introduces a new optional field that can
be set in validation messages can be used to attest to the hash of
the latest ledger that a validator considers to be fully validated.
Additionally, the commit leverages the previously introduced "cookie"
field to improve the robustness of the network by making it possible
for servers to automatically detect accidental misconfiguration which
results in two or more validators using the same validation key.
This changeset ensures the preferred ledger calculation
properly distinguishes the absence of trusted validations
from a preferred ledger which is the genesis ledger.
Fixes: RIPD-1575. Fix argument passing to runner. Allow multiple unit
test selectors to be passed via --unittest argument. Add optional
integer priority value to test suite list. Fix several failing manual
tests. Update CLI usage message to make it clearer.
* Tally and duration counters for Job Queue tasks and RPC calls
optionally rendered by server_info and server_state, and
optionally printed to a distinct log file.
- Tally each Job Queue task as it is queued, starts, and
finishes running. Track total duration queued and running.
- Tally each RPC call as it starts and either finishes
successfully or throws an exception. Track total running
duration for each.
* Track currently executing Job Queue tasks and RPC methods
along with durations.
* Json-formatted performance log file written by a dedicated
thread, for above-described data.
* New optional parameter, "counters", for server_info and
server_state. If set, render Job Queue and RPC call counters
as well as currently executing tasks.
* New configuration section, "[perf]", to optionally control
performance logging to a file.
* Support optional sub-second periods when rendering human-readable
time points.
Each validator will generate a random cookie on startup that it will
include in each of its validations. This will allow validators to detect
when more than one validator is accidentally operating with the same
validation keys.
These changes use the hash of the consensus transaction set when
characterizing the mismatch between a locally built ledger and fully
validated network ledger. This allows detection of non-determinism in
transaction process, in which consensus succeeded, but a node somehow
generated a different subsequent ledger.