Commit Graph

26 Commits

Author SHA1 Message Date
Ed Hennis
86ef16dbeb Fix: Don't flag consensus as stalled prematurely (#5627)
Fix stalled consensus detection to prevent false positives in situations where there are no disputed transactions.

Stalled consensus detection was added to 2.5.0 in response to a network consensus halt that caused a round to run for over an hour. However, it has a flaw that makes it very easy to have false positives. Those false positives are usually mitigated by other checks that prevent them from having an effect, but there have been several instances of validators "running ahead" because there are circumstances where the other checks are "successful", allowing the stall state to be checked.
2025-08-08 17:13:32 -04:00
Ed Hennis
d22a5057b9 Prevent consensus from getting stuck in the establish phase (#5277)
- Detects if the consensus process is "stalled". If it is, then we can declare a 
  consensus and end successfully even if we do not have 80% agreement on
  our proposal.
  - "Stalled" is defined as:
    - We have a close time consensus
    - Each disputed transaction is individually stalled:
      - It has been in the final "stuck" 95% requirement for at least 2
        (avMIN_ROUNDS) "inner rounds" of phaseEstablish,
      - and either all of the other trusted proposers or this validator, if proposing,
        have had the same vote(s) for at least 4 (avSTALLED_ROUNDS) "inner
        rounds", and at least 80% of the validators (including this one, if
        appropriate) agree about the vote (whether yes or no).
- If we have been in the establish phase for more than 10x the previous
  consensus establish phase's time, then consensus is considered "expired",
  and we will leave the round, which sends a partial validation (indicating
  that the node is moving on without validating). Two restrictions avoid
  prematurely exiting, or having an extended exit in extreme situations.
  - The 10x time is clamped to be within a range of 15s
    (ledgerMAX_CONSENSUS) to 120s (ledgerABANDON_CONSENSUS).
  - If consensus has not had an opportunity to walk through all avalanche
    states (defined as not going through 8 "inner rounds" of phaseEstablish),
    then ConsensusState::Expired is treated as ConsensusState::No.
- When enough nodes leave the round, any remaining nodes will see they've
  fallen behind, and move on, too, generally before hitting the timeout. Any
  validations or partial validations sent during this time will help the
  consensus process bring the nodes back together.
2025-03-20 12:41:44 -04:00
Ed Hennis
c17676a9be refactor: Improve ordering of headers with clang-format (#5343)
Removes all manual header groupings from source and header files by leveraging clang-format options.
2025-03-12 18:33:21 -04:00
Bart
2406b28e64 refactor: Remove unused and add missing includes (#5293)
The codebase is filled with includes that are unused, and which thus can be removed. At the same time, the files often do not include all headers that contain the definitions used in those files. This change uses clang-format and clang-tidy to clean up the includes, with minor manual intervention to ensure the code compiles on all platforms.
2025-03-11 14:16:45 -04:00
Pretty Printer
1d23148e6d Rewrite includes (#4997) 2024-06-20 13:57:16 -05:00
todaymoon
3f5e3212fe chore: remove repeat words (#5041) 2024-06-14 15:36:59 -04:00
Mark Travis
cea43099d2 Don't reach consensus as quickly if no other proposals seen: (#4763)
This fixes a case where a peer can desync under a certain timing
circumstance--if it reaches a certain point in consensus before it receives
proposals. 

This was noticed under high transaction volumes. Namely, when we arrive at the
point of deciding whether consensus is reached after minimum establish phase
duration but before having received any proposals. This could be caused by
finishing the previous round slightly faster and/or having some delay in
receiving proposals. Existing behavior arrives at consensus immediately after
the minimum establish duration with no proposals. This causes us to desync
because we then close a non-validated ledger. The change in this PR causes us to
wait for a configured threshold before making the decision to arrive at
consensus with no proposals. This allows validators to catch up and for brief
delays in receiving proposals to be absorbed. There should be no drawback since,
with no proposals coming in, we needn't be in a huge rush to jump ahead.
2024-03-22 16:22:29 -04:00
Sophia Xie
5aef102f4f Revert #4505, #4760 (#4842)
* Revert "Optimize calculation of close time to avoid impasse and minimize gratuitous proposal changes (#4760)"

This reverts commit 8ce85a9750.

* Revert "Several changes to improve Consensus stability: (#4505)"

This reverts commit f259cc1ab6.

* Add missing include

---------

Co-authored-by: seelabs <scott.determan@yahoo.com>
2023-11-30 21:00:50 -08:00
Mark Travis
f259cc1ab6 Several changes to improve Consensus stability: (#4505)
* Verify accepted ledger becomes validated, and retry
   with a new consensus transaction set if not.
 * Always store proposals.
 * Track proposals by ledger sequence. This helps slow peers catch
   up with the rest of the network.
 * Acquire transaction sets for proposals with future ledger sequences.
   This also helps slow peers catch up.
 * Optimize timer delay for establish phase to wait based on how
   long validators have been sending proposals. This also helps slow
   peers to catch up.
 * Fix impasse achieving close time consensus.
 * Don't wait between open and establish phases.
2023-09-11 15:48:32 -07:00
Elliot Lee
7ca1c644d1 Revert "Several changes to improve Consensus stability: (#4505)"
This reverts commit e8a7b2a1fc.
2023-08-29 16:52:31 -07:00
Mark Travis
e8a7b2a1fc Several changes to improve Consensus stability: (#4505)
* Verify accepted ledger becomes validated, and retry
   with a new consensus transaction set if not.
 * Always store proposals.
 * Track proposals by ledger sequence. This helps slow peers catch
   up with the rest of the network.
 * Acquire transaction sets for proposals with future ledger sequences.
   This also helps slow peers catch up.
 * Optimize timer delay for establish phase to wait based on how
   long validators have been sending proposals. This also helps slow
   peers to catch up.
 * Fix impasse achieving close time consensus.
 * Don't wait between open and establish phases.
2023-08-21 16:15:31 -07:00
Howard Hinnant
9932a19139 Reduce coupling to date.h by calling C++17 chrono functions 2021-03-17 15:02:15 -07:00
Pretty Printer
50760c6935 Format first-party source according to .clang-format 2020-04-23 10:02:04 -07:00
Scott Schurr
8cf7c9548a Remove conditionals for fix1528 enabled 14Nov2017 2020-04-09 11:42:34 -07:00
mtrippled
c78404e233 Pause for lagging validators. 2019-05-22 13:15:43 -07:00
Scott Schurr
0bbe6e226c Remove beast::Journal default constructor 2018-10-10 10:18:03 -04:00
Mark Travis
8eb8c77886 Performance logging and counters:
* Tally and duration counters for Job Queue tasks and RPC calls
    optionally rendered by server_info and server_state, and
    optionally printed to a distinct log file.
    - Tally each Job Queue task as it is queued, starts, and
      finishes running. Track total duration queued and running.
    - Tally each RPC call as it starts and either finishes
      successfully or throws an exception. Track total running
      duration for each.
  * Track currently executing Job Queue tasks and RPC methods
    along with durations.
  * Json-formatted performance log file written by a dedicated
    thread, for above-described data.
  * New optional parameter, "counters", for server_info and
    server_state. If set, render Job Queue and RPC call counters
    as well as currently executing tasks.
  * New configuration section, "[perf]", to optionally control
    performance logging to a file.
  * Support optional sub-second periods when rendering human-readable
    time points.
2018-04-08 02:24:38 -07:00
Mike Ellery
deb9e4ce3c Remove BeastConfig.h (RIPD-1167) 2018-04-08 01:52:12 -07:00
Brad Chase
94c6a2a850 Use LedgerTrie for preferred ledger (RIPD-1551):
These changes augment the Validations class with a LedgerTrie to better
track the history of support for validated ledgers. This improves the
selection of the preferred working ledger for consensus. The Validations
class now tracks both full and partial validations. Partial validations
are only used to determine the working ledger; full validations are
required for any quorum related function. Validators are also now
explicitly restricted to sending validations with increasing ledger
sequence number.
2018-02-02 20:38:38 -05:00
Brad Chase
2c13d9eb57 Redesign CSF framework (RIPD-1361):
- Separate `Scheduler` from `BasicNetwork`.
- Add an event/collector framework for monitoring invariants and calculating statistics.
- Allow distinct network and trust connections between Peers.
- Add a simple routing strategy to support broadcasting arbitrary messages.
- Add a common directed graph (`Digraph`) class for representing network and trust topologies.
- Add a `PeerGroup` class for simpler specification of the trust and network topologies.
- Add a `LedgerOracle` class to ensure distinct ledger histories and simplify branch checking.
- Add a `Submitter` to send transactions in at fixed or random intervals to fixed or random peers.

Co-authored-by: Joseph McGee
2017-12-01 14:15:04 -05:00
Brad Chase
c76656cf7f Use rounded close time in Consensus (RIPD-1528):
Switches the default behavior of Consensus to use roundCloseTime instead of
effCloseTime. effCloseTime is still used when accepting the consensus ledger to
ensure the consensus close time comes after the parent ledger close time. This
change eliminates an edge case in which peers could reach agreement on the close
time, but end up generating ledgers with different close times.
2017-09-22 19:35:29 -07:00
Brad Chase
df086301b6 Fix consensus quorum comparison 2017-07-20 14:14:03 -04:00
Brad Chase
3dfb4a13f1 Expose consensus parameters for simulation (RIPD-1355) 2017-07-11 12:53:53 -04:00
Brad Chase
00c60d408a Improve Consensus interface and documentation (RIPD-1340):
- Add Consensus::Result, which represents the result of the
establish state and includes the consensus transaction set, final
proposed position and disputes.
- Add Consensus::Mode to track how we are participating in
consensus and ensures the onAccept callback can distinguish when
we entered the round with consensus versus when we recovered from
a wrong ledger during a round.
- Rename Consensus::Phase to Consensus::State and eliminate the
processing phase.  Instead, accept is a terminal phase which
notifies RCLConsensus via onAccept callbacks.  Even if clients
dispatch accepting to another thread, all future calls except to
startRound will not change the state of consensus.
- Move validate_ status from Consensus to RCLConsensus, since
generic implementation does not directly reference whether a node
is validating or not.
- Eliminate gotTxSetInternal and handle externally received
TxSets distinct from locally generated positions.
- Change ConsensusProposal::changePosition to always update the
internal close time and position even if we have bowed out. This
enforces the invariant that our proposal's position always
matches our transaction set.
2017-04-24 13:13:23 -07:00
Brad Chase
2449f9c18d Fix handleLCL consensus bug:
Consensus::checkLCL can change state_ but it was being called inside
timerEntry after a switch on the current state_.  In rare cases, this might
end up calling stateEstablish even when the state_ was open.
2017-03-31 11:54:51 -07:00
Brad Chase
bc5a74057d Refactor consensus for simulation (RIPD-1011):
This is a substantial refactor of the consensus code and also introduces
a basic consensus simulation and testing framework.  The new generic/templated
version is in src/ripple/consensus and documents the current type requirements.
The version adapted for the RCL is in src/ripple/app/consensus.  The testing
framework is in src/test/csf.

Minor behavioral changes/fixes include:
* Adjust close time offset even when not validating.
* Remove spurious proposing_ = false call at end of handleLCL.
* Remove unused functionality provided by checkLastValidation.
* Separate open and converge time
* Don't send a bow out if we're not proposing
* Prevent consensus stopping if NetworkOPs switches to disconnect mode while
  consensus accepts a ledger
* Prevent a corner case in which Consensus::gotTxSet or Consensus::peerProposal
  has the potential to update internal state while an dispatched accept job is
  running.
* Distinguish external and internal calls to startNewRound.  Only external
  calls can reset the proposing_ state of consensus
2017-03-21 18:54:57 -04:00