Overlay Peering
P2P network using persistent TCP/IP connections. Messages serialized via Protocol Buffers. OverlayImpl manages connections; PeerImp handles per-peer logic. PeerFinder (sub-module under peerfinder/) handles peer discovery, slot accounting, and address caches.
Key Invariants
- Connection preference order: Fixed Peers → Livecache → Bootcache
- Cluster connections and reserved (PeerReservationTable) connections do NOT count toward slot limits in Counts::can_activate; they bypass the m_in_active/m_out_active caps
- Validators are forced to peerPrivate=true by Config::makeConfig even without an explicit [peer_private]; this is "soft" privacy (the node still accepts inbound connections, but asks peers not to gossip its address). wantIncoming is derived before the validator key check fires, so a validator with a key still advertises inbound willingness internally.
- Protobuf message changes MUST maintain wire compatibility or risk network partitioning
- Squelching: after MAX_SELECTED_PEERS=5 peers each cross MAX_MESSAGE_THRESHOLD=20 messages, a random 5-peer subset becomes "Selected"; the rest are muted via TMSquelch for a randomized window in [MIN_UNSQUELCH_EXPIRE=300s, MAX_UNSQUELCH_EXPIRE_PEERS=3600s]
- Reduce-relay does not activate for WAIT_ON_BOOTUP=10min after process start (Slots::reduceRelayReady)
- Handshake binds the TLS session to node identity via a signature of makeSharedValue (SHA-512 XOR of TLS finished messages, then sha512Half); a zero shared value (degenerate XOR) is rejected
- Wire format: 6-byte header uncompressed, 10-byte compressed; the 26-bit payload size field caps messages at maximumMessageSize = 64 MiB
- Hop count cap: the Endpoint constructor clamps hops to maxHops+1=7; Logic::preprocess drops hops > maxHops=6 and increments surviving hops by 1 before storage
- TX reduce-relay queue is bounded by MAX_TX_QUEUE_SIZE=10000 hashes per peer; required to stay under the 64 MiB protocol limit at high TPS
- peersWithMessage_ (in Slots) is inline static: shared across all instantiations, not per-instance
- Bootcache valence is a streak counter: clamped to 0 before crossing sign, so a failing peer resets to 0 before going positive. Static peers receive staticValence=32.
- Livecache uses push_front insertion; MUST shuffle() before handout to prevent topology manipulation by an adversary repeatedly advertising its own address
- SourceStrings::fetch() silently drops malformed addresses; no error is returned for bad config entries
- Checker destructor calls wait() only; callers must call stop() explicitly before destruction to cancel pending probes
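The Bootcache valence invariant above can be sketched as follows. This is a minimal illustration, not the Bootcache code: ValenceSketch, onSuccess, and onFailure are hypothetical names, and only the clamp-then-step rule described in the invariant is modeled.

```cpp
#include <cassert>

// Hypothetical stand-in for a Bootcache entry; only the valence rule is modeled.
struct ValenceSketch
{
    int valence = 0;

    // A success discards any failure streak first, then extends the success streak.
    void onSuccess()
    {
        if (valence < 0)
            valence = 0;
        ++valence;
    }

    // A failure discards any success streak first, then extends the failure streak.
    void onFailure()
    {
        if (valence > 0)
            valence = 0;
        --valence;
    }
};
```

Under this rule a single success after a long failure streak yields valence 1, not a slightly less negative number, which is what makes valence a streak counter rather than a running score.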
Common Bug Patterns
- PeerFinder slot exhaustion: if inPeers/outPeers is reached, new connections silently fail; check Counts::can_activate and attempts_needed
- HashRouter::shouldRelay prevents duplicate relay; bypassing it causes message storms (OverlayImpl::relay enforces this)
- ConnectAttempt::processResponse on HTTP 503 parses the peer-ips JSON array for redirect; every entry is validated as an endpoint before being passed to peerFinder().onRedirects
- PeerImp::close must run on the strand; calling it from the wrong thread causes race conditions on socket and timer state
- Destructor chain: ~PeerImp → deletePeer → onPeerDeactivate → on_closed → remove; interrupting this chain leaks slots
- ~ConnectAttempt releases the PeerFinder slot via on_closed(slot_) only if slot_ != nullptr; on successful promotion to PeerImp, slot_ is moved out and must be left null
- tryAsyncShutdown() must defer SSL shutdown until !readPending_ && !writePending_; calling async_shutdown while async I/O is in flight is undefined behavior
- dynamic_pointer_cast<SlotImp> is required wherever the Manager API takes shared_ptr<Slot> but Logic needs SlotImp
- A compressed message from a peer that did NOT negotiate compression is a hard protocol_error in invokeProtocolMessage (prevents a CPU forcing attack)
- Self-squelch attempt (peer sends TMSquelch for our own validation key) is silently dropped in PeerImp::onMessage(TMSquelch); never trust a peer to silence us
- Cluster::for_each callback must NOT call Cluster::update: same non-recursive mutex, so it deadlocks
- ZeroCopyOutputStream destructor MUST flush trailing commit_; protobuf doesn't guarantee a terminal BackUp or Next call, and a missing flush silently drops bytes
- Bootcache erase-then-reinsert pattern in on_success/on_failure: bimap values are logically const after insert, so valence updates require erase + reinsert
- SlotImp::state(active) is forbidden; use activate(), which also sets whenAcceptEndpoints. Bypassing this leaves the flood-control timestamp unset
- SourceStrings::fetch() has an idempotent retry loop quirk: if from_string() fails, it retries the same string (no-op); effective behavior is just "drop invalid entries"
Connection Lifecycle
Outbound (ConnectAttempt)
- OverlayImpl::connect → resource check → peerFinder().new_outbound_slot() → create ConnectAttempt
- Five-phase chain: async_connect → TLS async_handshake → HTTP write → HTTP read → processResponse
- Dual-timer scheme: global 25s ceiling (connectTimeout) + per-step timers (8/8/3/3/2s); both share the onTimer callback, which distinguishes them by expiry comparison. The global timer is armed once (guarded by an epoch check); the step timer is reset at each phase.
- ioPending_ flag prevents starting SSL shutdown while another async op is pending on the stream
- On HTTP 101: verifyHandshake → create PeerImp → move slot_ and stream_ptr_ into peer → overlay_.add_active(peer)
- On HTTP 503 with JSON peer-ips: forward to peerFinder().onRedirects
- verify_none on TLS: security comes from the node-key signature over makeSharedValue, not the cert chain
Inbound (OverlayImpl::onHandoff)
- HTTP server hands off TLS stream + upgrade request
- Sequential gates: processRequest (for /crawl, /health, /vl/) → resource limit → new_inbound_slot → negotiateProtocolVersion → makeSharedValue → verifyHandshake
- Create PeerImp, insert into m_peers (slot-keyed); peer->run() MUST be called while holding mutex_ (race vs stop() draining list)
- m_peers is populated here, but ids_ only after activate() post-protocol-handshake
Two-Phase Peer Registration
- m_peers: PeerFinder::Slot → weak_ptr<PeerImp>; populated at handshake start, used for slot management
- ids_: Peer::id_t → weak_ptr<PeerImp>; populated at activate() after protocol handshake, used for broadcast and relay
- Outbound peers (via ConnectAttempt) populate both maps together in add_active
Review Checklist
- Verify resource manager checks on both inbound and outbound connections
- New protocol messages: update protobuf definitions AND verify wire compatibility; add the type to the LZ4 eligibility list in Message::compress() if it carries bulk data
- Squelch changes: test with high peer counts; incorrect squelch logic can silence validators
- Header parsing changes (ProtocolMessage.h): the high-bit format guard (*iter & 0x80) and the reserved-bit check ((*iter & 0x0C) == 0) MUST remain. Note the parentheses: == binds tighter than &, so *iter & 0x0C == 0 would not test the reserved bits
- Adding a new compression Algorithm enum value: must have high bit set, low nibble zero (so it's extractable via *iter & 0xF0); update Compression.h dispatch switches or the UNREACHABLE guard fires
- Strand discipline: any new method touching socket/queue state must guard with if (!strand_.running_in_this_thread()) return post(strand_, ...)
- ZeroCopyOutputStream use: always ensure the object goes out of scope (destructor flush) before the caller reads from the streambuf
- Bootcache changes: remember valence updates require erase-then-reinsert (bimap), and the cooldown (60s) batches writes
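The header-byte checks in the checklist can be illustrated with a small classifier. This is a sketch, not the ProtocolMessage.h parser: FirstByteInfo and classifyFirstByte are hypothetical names, and the rule applied to uncompressed headers (all six high bits clear) is an assumption that the checklist does not state. The header sizes (6 and 10 bytes) come from the wire-format invariant.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

// Hypothetical summary of what the first header byte tells the parser.
struct FirstByteInfo
{
    bool compressed;
    std::uint8_t algorithm;   // 0 when uncompressed
    std::size_t headerBytes;  // 6 uncompressed, 10 compressed
};

inline std::optional<FirstByteInfo> classifyFirstByte(std::uint8_t b)
{
    if (b & 0x80)  // high-bit format guard: compressed framing
    {
        // Parenthesize the mask: `b & 0x0C == 0` parses as `b & (0x0C == 0)`.
        if ((b & 0x0C) != 0)
            return std::nullopt;  // reserved bits must be clear
        return FirstByteInfo{true, static_cast<std::uint8_t>(b & 0xF0), 10};
    }
    // Assumption: an uncompressed header keeps its six high bits clear.
    if ((b & 0xFC) != 0)
        return std::nullopt;
    return FirstByteInfo{false, 0, 6};
}
```

The low two bits are left unchecked in both branches because, with a 26-bit size field spread over four bytes, they would carry the top bits of the payload size.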
Key Patterns
Strand Execution
// REQUIRED: socket operations must run on the strand
if (!strand_.running_in_this_thread())
    return post(strand_, std::bind(
        &PeerImp::close, shared_from_this()));
// Calling socket ops from wrong thread causes races on state
Duplicate Relay Prevention
// REQUIRED: check HashRouter before relaying
if (!hashRouter_.shouldRelay(hash))
    return;  // already relayed, suppress duplicate
overlay_.relay(message, hash);
Shared Lazy Compression
// Message::getBuffer(Compressed::On) — compresses once, shared across N peers
std::call_once(once_flag_, &Message::compress, this);
// Eligible types only (mtTRANSACTION, mtLEDGER_DATA, mtVALIDATOR_LIST, ...);
// Latency-sensitive types (mtPING, mtVALIDATION, mtPROPOSE_LEDGER) excluded.
// Falls back to uncompressed if savings < 4 bytes (compressed header overhead).
// Messages <= 70 bytes are never compressed.
Resource Charging Batches
// PeerImp::onMessageBegin resets fee_; onMessageEnd applies charge once per
// message via charge(). Handlers escalate via fee_.update() (monotonic).
Exception-Based Handshake Failures
// verifyHandshake() throws std::runtime_error on any check failure;
// callers (ConnectAttempt, PeerImp::doAccept) wrap in try/catch and tear down.
Traffic Categorization Double-Call
// categorize() called once at Message construction (outbound, inbound=false).
// addCount() called twice per message: once for category, once for 'total'.
// 'unknown' is NOT rolled into 'total'.
Reduce-Relay (Squelch) Architecture
Two halves, decoupled:
- Upstream (Slot/Slots in OverlayImpl): counts inbound validator messages per peer, selects 5 sources, calls SquelchHandler::squelch() (implemented by OverlayImpl), which sends TMSquelch over the wire. Uses UptimeClock.
- Downstream (Squelch in PeerImp): receives TMSquelch, stores expiry in hash_map<PublicKey, time_point>. PeerImp::send() calls expireSquelch(validator) before transmitting any validator-keyed message; false return → drop, counted under TrafficCount::squelch_suppressed.
All OverlayImpl::updateSlotAndSquelch calls are dispatched to strand_ because Slots<UptimeClock> is not thread-safe.
Squelch expiry is lazy: no background timer. expireSquelch removes stale entries on next send. Out-of-bounds durations in incoming TMSquelch trigger feeInvalidData and removeSquelch (defensive clear).
Slot Selection Algorithm (Slot<clock_type>::update)
Two-threshold design: peers enter the considered pool at MIN_MESSAGE_THRESHOLD=19 messages; selection fires when MAX_SELECTED_PEERS=5 peers individually reach MAX_MESSAGE_THRESHOLD=20. The one-message gap lets the system confirm a peer has continued sending before committing it as a candidate. If fewer than 5 non-idle peers are available at selection time, initCounting() resets and defers — never squelches with incomplete picture.
Inactivity (IDLED=8s): idle selected peer → unsquelch all + revert to Counting. Slots whose lastSelected_ is older than MAX_UNSQUELCH_EXPIRE_DEFAULT=600s are deleted by deleteIdlePeers().
Squelch duration is scaled by peer count: the upper bound of the randomized squelch window is min(max(600s, 10s × npeers), 3600s).
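The scaling formula can be worked through in code. This sketch assumes the formula gives the upper bound of the randomized window described under Key Invariants; SQUELCH_PER_PEER and squelchWindowUpperBound are names chosen here for illustration, and the random draw itself is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>

using std::chrono::seconds;

constexpr seconds MAX_UNSQUELCH_EXPIRE_DEFAULT{600};
constexpr seconds MAX_UNSQUELCH_EXPIRE_PEERS{3600};
constexpr seconds SQUELCH_PER_PEER{10};  // assumed name for the 10s-per-peer factor

// Upper bound of the squelch window: min(max(600s, 10s * npeers), 3600s).
constexpr seconds squelchWindowUpperBound(std::size_t npeers)
{
    seconds scaled = SQUELCH_PER_PEER * static_cast<seconds::rep>(npeers);
    return std::min(
        std::max(MAX_UNSQUELCH_EXPIRE_DEFAULT, scaled),
        MAX_UNSQUELCH_EXPIRE_PEERS);
}
```

With 10 peers the bound stays at the 600s floor, with 100 peers it grows to 1000s, and from 360 peers upward it is capped at 3600s.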
TX Reduce-Relay
When txReduceRelayEnabled_ (negotiated via FEATURE_TXRR):
- Full transactions go to a quota of peers (computed from TX_REDUCE_RELAY_MIN_PEERS and TX_RELAY_PERCENTAGE)
- Remaining peers get hash announcements via addTxQueue → batched TMHaveTransactions flushed by periodic sendTxQueue
- Peers without the feature always get the full message (back-compat)
- Peer list is shuffled with default_prng() to avoid systematic bias
- MAX_TX_QUEUE_SIZE=10000 cap; doTransactions rejects requests exceeding this as malformed
Tracking State
tracking_ (atomic Tracking enum): unknown, converged, diverged. Thresholds from Tuning.h:
- convergedLedgerLimit=24: within this many ledgers of the validated index
- divergedLedgerLimit=128: beyond this, mark diverged and start the MAX_DIVERGED_TIME countdown
Hysteresis (24 vs 128) prevents oscillation on slightly-behind peers.
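The hysteresis band can be sketched as a pure function. This is an illustration under assumptions, not the PeerImp logic: nextTracking is a hypothetical name, the strictness of the two comparisons is a guess, and the MAX_DIVERGED_TIME countdown is not modeled.

```cpp
#include <cassert>
#include <cstdint>

enum class Tracking { unknown, converged, diverged };

constexpr std::uint32_t convergedLedgerLimit = 24;
constexpr std::uint32_t divergedLedgerLimit = 128;

// Returns the next tracking state given the peer's ledger sequence and the
// locally validated sequence. Gaps between the two limits change nothing.
inline Tracking nextTracking(
    Tracking current, std::uint32_t peerSeq, std::uint32_t validatedSeq)
{
    std::uint32_t const gap =
        peerSeq > validatedSeq ? peerSeq - validatedSeq : validatedSeq - peerSeq;
    if (gap < convergedLedgerLimit)  // assumption: strict comparison
        return Tracking::converged;
    if (gap > divergedLedgerLimit)   // assumption: strict comparison
        return Tracking::diverged;
    return current;  // dead band: keep whatever state the peer already had
}
```

A peer hovering 60 ledgers behind stays converged if it was converged and stays diverged if it was diverged, which is the oscillation the two separate limits are meant to prevent.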
Send Queue Backpressure (Tuning.h)
Three tiers:
- targetSendQueue=128: below this, the peer is healthy; resets the large_sendq_ counter
- sendqIntervals=4: consecutive 1-second ticks at-or-above target before disconnect
- dropSendQueue=192: refuse new query responses (don't do expensive lookups for a stuck peer)
- sendQueueLogFreq=64: log every 64th enqueue when the queue is large (throttles log spam)
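The interaction between the target and drop thresholds can be sketched as a small monitor. SendQueueMonitor and its members are hypothetical; the constants come from the list above, while the exact tick on which disconnect fires and the inclusive comparison against dropSendQueue are assumptions.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model of the per-peer send-queue health check.
struct SendQueueMonitor
{
    static constexpr std::size_t targetSendQueue = 128;
    static constexpr std::size_t dropSendQueue = 192;
    static constexpr int sendqIntervals = 4;

    int largeSendqTicks = 0;

    // Called once per 1-second tick; returns true when the peer should be dropped.
    bool onTick(std::size_t queued)
    {
        if (queued < targetSendQueue)
        {
            largeSendqTicks = 0;  // healthy again: the streak is broken
            return false;
        }
        return ++largeSendqTicks >= sendqIntervals;
    }

    // Expensive query responses are refused once the queue is this deep.
    static bool refusesQueries(std::size_t queued)
    {
        return queued >= dropSendQueue;
    }
};
```

Only consecutive over-target ticks count, so a peer that drains its queue even once starts over with a clean counter.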
Other key tuning constants:
- softMaxReplyNodes=8192 / hardMaxReplyNodes=12288: soft/hard caps for TMLedgerData node counts
- maxQueryDepth=3: recursion limit for TMGetLedger; deeper queries are rejected as badData
- checkIdlePeers=4: modulo for the timer-driven idle peer scan
- readBufferBytes=16384: constexpr size_t for the socket read buffer (separate from the enum for type reasons)
PeerFinder Sub-Module
Implements peer address discovery, slot accounting, and reachability checks. Owned by OverlayImpl via make_Manager(). Hidden behind Manager abstract interface; concrete ManagerImp lives in detail/PeerfinderManager.cpp.
Components
- Logic<Checker>: central decision engine. Holds slots_, connectedAddresses_ (multiset for IP limit), keys_ (dedup public keys), fixed_, livecache_, bootcache_. Guarded by std::recursive_mutex lock_. Recursive mutex needed because on_closed() calls remove() independently.
- Livecache: ~30s TTL gossip cache (Tuning::liveCacheSecondsToLive). beast::aged_map + boost::intrusive::list per hop bucket (size maxHops+2=8, indices 0–7, given maxHops=6). MUST shuffle() before handout; push_front insertion is exploitable otherwise.
- Bootcache: persistent (SQLite via StoreSqdb). Bimap (unordered_set_of by endpoint, multiset_of by valence) for O(1) update and ranked iteration. Valence is a streak counter (clamped to 0 before crossing sign). staticValence=32 for [ips]/[ips_fixed]. Throttled writes: 60s cooldown via flagForUpdate/checkUpdate; destructor force-flushes. Pruning: remove bottom 10% when over 1000 entries.
- Checker<Protocol>: async TCP probe for verifying a peer's advertised listening port. Self-managing async_op via shared_ptr capture in handler; ~Checker calls wait() (not stop(); callers must call stop() first explicitly).
- Counts: pure bookkeeping, no own mutex (relies on Logic::lock_). All updates funnel through private adjust(slot, ±1). Fixed/reserved bypass active caps in can_activate. isConnectedToNetwork() returns true only when m_out_max == 0 (pure listener mode).
- SlotImp: concrete slot state. Two constructors: inbound takes both endpoints and sets checked=false, canAccept=false; outbound takes only the remote endpoint and sets checked=true, canAccept=true, since the TCP connect itself proves reachability. State machine enforced by XRPL_ASSERT in state() and activate(). m_listening_port is std::atomic<int32_t> with -1 sentinel.
- Fixed: per-fixed-peer backoff. Fibonacci sequence in minutes: {1,1,2,3,5,8,13,21,34,55}, clamped to last index. failure() advances; success() resets.
- Source: abstract; the only concrete implementation is SourceStrings (config [ips]). cancel() is a no-op for all current (synchronous) implementations but exists as an extension point for future async sources.
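The Fixed backoff described above can be sketched like this. FixedBackoffSketch is a hypothetical class; it returns the delay instead of storing a retry time point, and whether the first failure uses index 0 or index 1 of the table is an assumption (both hold 1 minute, so the observable delay is the same).

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <chrono>
#include <cstddef>

// Backoff table in minutes, as listed for the Fixed component.
constexpr std::array<int, 10> kBackoffMinutes{1, 1, 2, 3, 5, 8, 13, 21, 34, 55};

class FixedBackoffSketch
{
    std::size_t failures_ = 0;

public:
    // Advance one step, clamped to the last table index, and return the delay.
    std::chrono::minutes failure()
    {
        failures_ = std::min(failures_ + 1, kBackoffMinutes.size() - 1);
        return std::chrono::minutes{kBackoffMinutes[failures_]};
    }

    // A successful connection resets the sequence.
    void success()
    {
        failures_ = 0;
    }
};
```

Repeated failures walk the table up to the 55-minute ceiling and stay there; one success returns the peer to the 1-minute retry delay.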
Autoconnect Tier Order
Logic::autoconnect() strictly returns at first non-empty tier:
- Fixed peers (via get_fixed, respecting Fixed::when() backoff)
- Livecache (shuffled, reverse hop order: far peers first for topological diversity)
- (Bootcache refill placeholder for DNS)
- Bootcache fallback
m_squelches aged set (60s TTL, Tuning::recentAttemptDuration) suppresses rapid retries to same address across calls.
Endpoint Gossip
- Receiving (on_endpoints + preprocess): rate-limited via per-slot whenAcceptEndpoints (Tuning::secondsPerMessage=151s, a prime to desync nodes). Random sample-down if oversized. A hops==0 entry's IP is replaced with the sender's socket address (the peer doesn't know its own public IP). All surviving hops are incremented by 1 before livecache insert. First-hop entries trigger Checker::async_connect for a reachability test.
- Sending (buildEndpointsForPeers): shuffle slots, use SlotHandouts per peer, run the handout() algorithm. Self-advertisement uses a zero-address IPv6 sentinel; the receiver substitutes the socket's remote address.
- Handouts algorithm: round-robin across multiple targets to ensure fair distribution. move_back after each acceptance rotates endpoints. Per-target dedup via SlotImp::recent_t (aged map; filter() uses <= hop comparison; try_insert writes both received and sent into recent, a pessimistic update).
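The receive-side hop handling can be sketched as below. EndpointSketch and preprocessSketch are hypothetical, addresses are plain strings for brevity, and rate limiting, sample-down, and the reachability probe are left out; only the drop, substitute, and increment steps from the list are modeled.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

struct EndpointSketch
{
    std::string address;
    std::uint32_t hops;
};

constexpr std::uint32_t maxHops = 6;

// Drop over-limit entries, patch the sender's self-advertisement, bump hops.
inline std::vector<EndpointSketch> preprocessSketch(
    std::vector<EndpointSketch> received, std::string const& senderAddress)
{
    std::vector<EndpointSketch> kept;
    for (auto& ep : received)
    {
        if (ep.hops > maxHops)
            continue;                    // too far away to be useful
        if (ep.hops == 0)
            ep.address = senderAddress;  // peer cannot know its own public IP
        ++ep.hops;                       // one more hop from our point of view
        kept.push_back(std::move(ep));
    }
    return kept;
}
```

An entry received at hops 6 is stored at 7, which matches the maxHops+1 clamp noted for the Endpoint constructor.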
recent_t Filter Semantics
insert() updates cached hop count only if new value ≤ existing. filter() suppresses sends when cached hop ≤ sending hop. The <= boundary is intentional — sending at a strictly lower hop than the peer knows is still useful; matching or higher is redundant.
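The insert/filter rules can be sketched with an ordinary map. RecentSketch is hypothetical: the real recent_t is an aged map, so expiry is omitted here, and endpoints are plain strings.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

class RecentSketch
{
    std::map<std::string, std::uint32_t> cache_;

public:
    // Remember the lowest hop count seen for this endpoint.
    void insert(std::string const& endpoint, std::uint32_t hops)
    {
        auto const result = cache_.emplace(endpoint, hops);
        if (!result.second && hops <= result.first->second)
            result.first->second = hops;
    }

    // True means "suppress": the peer already knows this endpoint at the
    // same or a lower hop count, so sending it again adds nothing.
    bool filter(std::string const& endpoint, std::uint32_t hops) const
    {
        auto const it = cache_.find(endpoint);
        return it != cache_.end() && it->second <= hops;
    }
};
```

After insert("a", 3), filter("a", 3) and filter("a", 4) both suppress, while filter("a", 2) lets the send through because a strictly lower hop count is new information.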
TLS Channel-Binding (Non-Standard)
makeSharedValue derives a 256-bit value from TLS finished messages:
sha512Half(SHA512(my_finished) XOR SHA512(peer_finished))
Rejects degenerate zero-XOR case. Non-standard (see OpenSSL #5509, XRPLF/rippled #2413). TLS cert verification is explicitly disabled (verify_none) — security comes from binding node-public-key signature to this shared value via Session-Signature HTTP header. MITM produces different finished values → signature mismatch → rejection.
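The XOR step and its zero check can be sketched in isolation. This is not makeSharedValue: the inputs are assumed to be the two SHA-512 digests already computed from the finished messages, the final sha512Half is omitted, and Digest and xorDigests are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

using Digest = std::array<std::uint8_t, 64>;  // one SHA-512 output

// XOR two digests; an all-zero result (identical inputs) is rejected.
inline std::optional<Digest> xorDigests(Digest const& mine, Digest const& peers)
{
    Digest mixed{};
    bool allZero = true;
    for (std::size_t i = 0; i < mixed.size(); ++i)
    {
        mixed[i] = static_cast<std::uint8_t>(mine[i] ^ peers[i]);
        if (mixed[i] != 0)
            allZero = false;
    }
    if (allZero)
        return std::nullopt;  // degenerate case: both sides hashed to the same value
    return mixed;
}
```

XOR is symmetric, so both ends of the connection derive the same value regardless of which side calls its own finished message "mine".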
Handshake HTTP Headers
Built by buildHandshake, verified by verifyHandshake. Verify order is layered (cheap → expensive):
- Network-ID mismatch
- Network-Time ±20s tolerance
- Public-Key parse, self-connection check
- Session-Signature cryptographic verify
- Local-IP / Remote-IP cross-check (NAT diagnostics)
Feature negotiation via X-Protocol-Ctl: compr=lz4, vprr=1, txrr=1, ledgerreplay=1. Responder echoes back only features locally configured AND requested (AND-gate). Initiator unconditionally advertises all locally supported features.
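The responder's AND-gate amounts to a set intersection. The sketch below treats each feature token as an opaque string; echoFeatures is a hypothetical helper and no header parsing is shown.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>
#include <string>

// Responder side: echo only features that were requested AND are enabled locally.
inline std::set<std::string> echoFeatures(
    std::set<std::string> const& requested,
    std::set<std::string> const& locallyEnabled)
{
    std::set<std::string> echoed;
    std::set_intersection(
        requested.begin(), requested.end(),
        locallyEnabled.begin(), locallyEnabled.end(),
        std::inserter(echoed, echoed.end()));
    return echoed;
}
```

If the initiator requests compr=lz4 and txrr=1 while the responder has txrr=1 and vprr=1 enabled, only txrr=1 is echoed back.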
ZeroCopy I/O Adapters
ZeroCopyInputStream<Buffers> wraps ConstBufferSequence for protobuf parsing without intermediate copy. BackUp/Skip support sub-buffer granularity via tracked pos_ within current buffer. Empty buffer sequence is safe (null pos_ initialized in constructor).
ZeroCopyOutputStream<Streambuf> uses deferred commit pattern: commit_ tracks bytes promised but not yet committed. Destructor MUST flush trailing commit_ — protobuf doesn't guarantee terminal BackUp or Next call. BackUp(n) asserts n <= commit_ and prevents double-commit.
Traffic Categorization
TrafficCount::categorize() is called once at Message construction (outbound) and per inbound message. Two-stage: static unordered_map<MessageType, category> for simple types, then dynamic_cast for protobuf inspection of TMLedgerData/TMGetLedger (requestcookie distinguishes forwarded vs originated) and TMGetObjectByHash (query() flag determines get/share). unknown is NOT rolled into total. squelch_suppressed records bytes NOT transmitted due to squelch; squelch_ignored records bytes from peers ignoring squelch.
Compression Eligibility (Message::compress)
Skip if ≤70 bytes. Whitelist of eligible types: mtMANIFESTS, mtENDPOINTS, mtTRANSACTION, mtGET_LEDGER, mtLEDGER_DATA, mtGET_OBJECTS, mtVALIDATOR_LIST, mtVALIDATOR_LIST_COLLECTION, mtREPLAY_DELTA_RESPONSE, mtTRANSACTIONS. Excludes high-frequency control messages (mtPING, mtVALIDATION, mtPROPOSE_LEDGER, mtSTATUS_CHANGE). If compressed size doesn't beat uncompressed minus 4-byte header overhead, fall back to uncompressed (bufferCompressed_ cleared, getBuffer() returns uncompressed).
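The size rule can be written as a predicate. keepCompressed and the constant names are hypothetical; the 70-byte cutoff and the 6/10-byte header sizes come from this document, and the strict comparison is an assumption.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t headerBytes = 6;             // uncompressed header
constexpr std::size_t headerBytesCompressed = 10;  // compressed header
constexpr std::size_t minCompressibleBytes = 70;   // at or below: never compress

// True when the compressed form should be kept for sending.
inline bool keepCompressed(std::size_t payloadBytes, std::size_t compressedBytes)
{
    if (payloadBytes <= minCompressibleBytes)
        return false;
    // The compressed frame carries 4 extra header bytes, so the payload
    // must shrink by more than that to be worth it.
    return compressedBytes <
        payloadBytes - (headerBytesCompressed - headerBytes);
}
```

For a 100-byte payload, a 95-byte compressed form is kept and a 96-byte one is discarded, since 96 + 10 is no smaller than 100 + 6.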
HTTP Endpoints (served by OverlayImpl::processRequest)
- /crawl: JSON topology, gated by bitmask config (CrawlOptions::Overlay|ServerInfo|ServerCounts|Unl)
- /health: three-tier status (200/503/500); the HTTP status encodes the result, so load balancers need no JSON parsing
- /vl/<key> or /vl/<version>/<key>: signed validator list
Concurrency Notes
- OverlayImpl::mutex_ is std::recursive_mutex (acknowledged tech debt: // VFALCO use std::mutex). Recursion stems from run() triggering callbacks back into overlay.
- cond_ is condition_variable_any, used for the shutdown drain; std::condition_variable only accepts std::unique_lock<std::mutex> and so cannot wait on the recursive mutex_
- work_ (executor_work_guard as std::optional) keeps io_context alive; reset() during stop() lets the queue drain
- Strand vs mutex: peer registry mutations use mutex_; timer/squelch/tx-metrics work uses the strand
- OverlayImpl::Child registration: destructor auto-removes from list_; stopChildren() copies pointers before iterating to avoid invalidation
- PeerImp field locks: recentLock_ (ledger state, latency), nameMutex_ (shared_mutex for name_); strand-confined fields need no lock
- TxMetrics has its own std::mutex; writers additionally serialize via the overlay strand; RPC readers call json() directly without going through the strand
- Cluster::mutex_ is non-recursive: a for_each callback must not call update()
- PeerReservationTable: list() deliberately releases the lock before std::sort to minimize hold time (snapshot sort pattern)
Cluster Registry
Cluster owns a std::set<ClusterNode, Comparator>. The Comparator is is_transparent — enables find(PublicKey) without constructing a dummy ClusterNode. update() enforces monotonic time (rejects stale reports), preserves names across nameless gossip updates, and uses erase+emplace_hint for O(1) amortized reinsert. load() is fail-fast on malformed lines (returns false) but tolerates duplicates with a warning (first entry wins).
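The is_transparent lookup can be shown with a small standalone set. ClusterNodeSketch, ByKey, and lookup are hypothetical, and std::string stands in for PublicKey; the point is only that find() accepts the key type directly.

```cpp
#include <cassert>
#include <set>
#include <string>

struct ClusterNodeSketch
{
    std::string key;   // stand-in for the node's PublicKey
    std::string name;
};

struct ByKey
{
    using is_transparent = void;  // opts the set into heterogeneous lookup

    bool operator()(ClusterNodeSketch const& a, ClusterNodeSketch const& b) const
    {
        return a.key < b.key;
    }
    bool operator()(ClusterNodeSketch const& a, std::string const& k) const
    {
        return a.key < k;
    }
    bool operator()(std::string const& k, ClusterNodeSketch const& b) const
    {
        return k < b.key;
    }
};

// Finds a node by key without constructing a dummy ClusterNodeSketch.
inline ClusterNodeSketch const* lookup(
    std::set<ClusterNodeSketch, ByKey> const& nodes, std::string const& key)
{
    auto const it = nodes.find(key);
    return it == nodes.end() ? nullptr : &*it;
}
```

Without the is_transparent alias, find() would only accept a ClusterNodeSketch, and callers would have to build a throwaway node just to search.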
PeerSet (Data Acquisition)
PeerSet / PeerSetImpl manages the working set of peers queried for a single in-flight data acquisition (ledger, tx-set, etc.). Uses scored peer selection (Peer::getScore(hasItem)) sorted descending. peers_ set of Peer::id_t acts as exclusion list — same peer is never re-added across retries. DummyPeerSet via make_DummyPeerSet() is the null-object used when loadOldLedger() needs InboundLedger without live peers.
Key Files
Overlay core
- src/xrpld/overlay/Overlay.h/detail/OverlayImpl.{h,cpp}: main manager
- src/xrpld/overlay/Peer.h/detail/PeerImp.{h,cpp}: per-peer logic
- src/xrpld/overlay/Message.h/detail/Message.cpp: wire envelope, lazy compression
- src/xrpld/overlay/detail/ConnectAttempt.{h,cpp}: outbound connection state machine
- src/xrpld/overlay/detail/Handshake.{h,cpp}: handshake crypto, feature negotiation
- src/xrpld/overlay/detail/ProtocolMessage.h: wire framing, dispatch
- src/xrpld/overlay/detail/ProtocolVersion.{h,cpp}: XRPL/x.y negotiation
- src/xrpld/overlay/detail/ZeroCopyStream.h: protobuf/Asio buffer adapters
- src/xrpld/overlay/detail/Tuning.h: all overlay magic numbers
- src/xrpld/overlay/make_Overlay.h: factory + setup_Overlay (parses [overlay], [crawl], [vl], [network_id])
- src/xrpld/overlay/predicates.h: composable peer-selection/dispatch functors for Overlay::foreach
Reduce-relay
- src/xrpld/overlay/Slot.h: per-validator state machine + selection algorithm
- src/xrpld/overlay/Squelch.h: per-peer suppression enforcement
- src/xrpld/overlay/ReduceRelayCommon.h: all reduce-relay constants
Telemetry
- src/xrpld/overlay/detail/TrafficCount.{h,cpp}: per-category byte/message counters
- src/xrpld/overlay/detail/TxMetrics.{h,cpp}: rolling averages for tx reduce-relay
Cluster
- src/xrpld/overlay/Cluster.h/ClusterNode.h/detail/Cluster.cpp: trusted-node registry with heterogeneous lookup
PeerSet (data acquisition)
- src/xrpld/overlay/PeerSet.h/detail/PeerSet.cpp: scored peer selection for InboundLedger etc.
Reservations
- src/xrpld/overlay/detail/PeerReservationTable.cpp: persistent allowlist via SQLite
Compression
- src/xrpld/overlay/Compression.h: Algorithm enum, dispatch wrappers, wire-format constants
PeerFinder
- src/xrpld/peerfinder/PeerfinderManager.h/Slot.h/make_Manager.h: public interface
- src/xrpld/peerfinder/detail/Logic.h: central decision engine
- src/xrpld/peerfinder/detail/Livecache.h/Bootcache.{h,cpp}: address caches
- src/xrpld/peerfinder/detail/Checker.h: async reachability prober
- src/xrpld/peerfinder/detail/SlotImp.{h,cpp}: slot state machine
- src/xrpld/peerfinder/detail/Counts.h: slot bookkeeping
- src/xrpld/peerfinder/detail/Fixed.h: Fibonacci backoff
- src/xrpld/peerfinder/detail/Handouts.h: fair distribution algorithm
- src/xrpld/peerfinder/detail/StoreSqdb.h: SQLite persistence (schema v4, migration handles DROP COLUMN via table rename)
- src/xrpld/peerfinder/detail/Source.h/SourceStrings.{h,cpp}: abstract + static-string address source
- src/xrpld/peerfinder/detail/Tuning.h: peerfinder magic numbers
- src/xrpld/peerfinder/detail/iosformat.h: leftw stream manipulator for log alignment