Documents how to measure rippled's OpenTelemetry overhead via the
ripple/perf-iac comparison pipeline (telemetry on vs off, identical load,
eBPF profile diff), the exact trigger command, the option to dispatch from
rippled CI (and its cross-org token blocker), and the lessons learned
(gitignored role tasks file, sequential matrix legs).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Flip the conan default_options telemetry flag to False on this baseline
branch so a perf-iac comparison run (this branch as baseline vs the
phase-10 telemetry-on branch as on-demand) measures the full linked-in +
hot-path cost of OpenTelemetry under an identical payment workload.
The flag propagates through generate() into the CMake toolchain with no
-o override, matching perf-iac's plain `conan install` invocation
(verified: toolchain emits `set(telemetry "False")`).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The txq.enqueue span sets tx_hash, tx_type, and txq_status on every
code path (TxQ.cpp:746/748/751), but fee_level_paid and
required_fee_level are set only on the fee-evaluated path
(TxQ.cpp:895-898), which is reached after the rejected and
applied_direct early exits. They are therefore not guaranteed on every
txq.enqueue span, so requiring them caused the validation harness to
fail whenever a txq.enqueue span took an early-exit path.
Remove the two conditional attributes from required_attributes and
document why in the span note.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Several close-time panels showed raw codes/booleans and unreadable
time-bar-charts. Relabel and re-visualize them so each reads clearly:
- Close-Time Proposal Spread (was "Close Time Bin Distribution"):
corrected a wrong description (it claimed per-node proposals; the
metric is the number of DISTINCT close-time positions peers proposed
per round, rawCloseTimes.peers.size()). Converted the time-barchart
(unreadable timestamp axis) to a horizontal bar gauge summing rounds
per distinct-position count.
- Consensus Outcome Distribution: renameByRegex maps the raw
consensus_state codes to human labels (yes->Agreed,
moved_on->Moved On (partial), expired->Expired (timeout), no->No
Consensus); value mappings alone do not relabel pie legends.
- Close-Time Agreement Rate (was "Close Time Agreement"): legend
relabelled from "close_time_correct=true/false" to Agreed/Disagreed.
- Close-Time Resolution Change (was "Close Time Resolution Direction"):
converted to a bar gauge; increased/decreased/unchanged relabelled to
Coarser (more disagreement) / Finer (better agreement) / Steady.
All four verified by rendering the panels to PNG via the Grafana
image renderer.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
"Close Time: Raw Proposals" and "Close Time: Effective / Quantized"
plotted the absolute close time (Ripple-epoch seconds), which is an
ever-rising line with no analytical value. There is no clean way to make
them useful: the values are Ripple-epoch seconds (not Unix ms, so date
units misrender), TraceQL metrics cannot do the epoch offset or an
inter-ledger-gap subtraction in-query, and Grafana calculateField
transforms break on these grouped Tempo metric frames (verified by
render: "No data").
Remove both. The useful consensus-timing signals are already covered:
time-to-consensus (round_time_ms), rounds per ledger (establish_count),
previous round time, and the close-time vote-bins / resolution-direction
/ bin-distribution panels remain. Re-tile and re-id (22 panels).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
With validation now passing 133/133, the only remaining job failure was the
regression gate flagging 4 timing "regressions". Two compounding causes:
1. Stale baseline: the committed baseline was captured (2026-04-24) under the
old, lighter workload — before the new txq-burst phase (60 TPS) existed. The
heavier per-ledger work genuinely raises ledger.build / tx.apply /
ledger.validate / acceptLedger timings, so every run regressed against it.
Refreshed the baseline from the latest CI-measured timings (same workload).
2. Histogram quantization: SpanMetrics latency buckets are
[1,5,10,25,...]ms, so a sub-millisecond quantile near a low-end boundary can
jump a full bucket (1ms->5ms) between runs with no real change. The old
absolute bounds (2-5ms) were narrower than one bucket width, so that jitter
tripped the gate. Widened the default span bounds to 10-15ms (~2 low-end
buckets) and pct to 50%, and the job_queue running bound to 20ms, to tolerate
quantization noise while still catching genuine multi-bucket regressions. The
consensus.* overrides (tight pct, large abs) are unchanged.
The refreshed baseline also picks up real rpc.ws_message timings (previously null
under the phantom rpc.request key).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Number of rounds is an integer, but avg_over_time(establish_count)
produced fractional values (2.1). Switch the Rounds panel to
count_over_time() by (span.establish_count): one integer series per
round count (1/2/3...), showing how many ledgers needed that many
establish rounds — the meaningful distribution, inherently integer
(decimals=0).
Apply dashboard rule 9: panels with a right-side table legend take full
width. Widen "Consensus Rounds per Ledger" and "Consensus Outcome
Distribution" to w=24 and re-tile the dashboard.
Verified via the Grafana proxy: rounds=2 dominates (~11-12 ledgers),
rounds=3 occasional.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Phase 9 introduces the ledger.acquire span (InboundLedger fetch) that phases 7-8
do not have, so the forward-merged 09-data-collection-reference inventory is
extended here:
- §1.1: add ledger.acquire to the Ledger span table.
- §1.2: add its attributes (acquire_reason, timeouts, peer_count, outcome) and
note it also sets ledger_seq; bump the span count.
Also fix two stale StatsD metric references in the Peer Quality dashboard
(xrpld-peer-quality.json): rippled_Peer_Finder_Active_{Inbound,Outbound}_Peers
-> xrpld_Peer_Finder_* to match the xrpld_ metric prefix the rest of the stack uses.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The consensus duration panels plotted span wall-clock
(traces_span_metrics_duration_milliseconds), which is ~3-8 ms of
instrumentation overhead, not the real consensus time (~3000 ms). And
the close-time value panels plotted an ever-rising absolute epoch line.
Rework them to answer the actual operational questions, all from
attributes that already exist on the consensus spans:
- Time to Reach Consensus (p50/p95) and Average Time to Reach Consensus:
round_time_ms on consensus.accept — the wall-clock to agree a ledger.
- Consensus Rounds per Ledger (Establish Count): avg and max of
establish_count on consensus.establish — how many proposal rounds it
took to converge (1 = first proposal).
- Previous Round Time per Ledger: previous_round_time_ms on
consensus.round.
Reorder the dashboard into an investigation flow: health/throughput ->
time-to-consensus and rounds -> ledger close/apply timing -> close-time
detail -> failures/mode/mismatch. Assign stable sequential panel ids.
Verified each query returns data via the Grafana datasource proxy
(p95 ~4096 ms, avg ~2825 ms, rounds ~2).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The §1 span and attribute inventory had regressed to an older 16-span snapshot
that uses the pre-2026-05-13 dotted attribute keys, while phase-7's code emits
~36 spans with bare/underscore attribute keys. The §Data Flow Overview and §2
System Metrics sections (native OTLP transport — phase-7's migration) were already
correct and are left unchanged.
- §1.1: expand the span inventory to the full surface — add gRPC (grpc.<MethodName>),
TxQ (txq.*), PathFind (pathfind.*), and the full consensus set (round/phase.open/
establish/update_positions/check/mode_change/proposal.receive/validation.receive).
Fix the phantom rpc.request -> rpc.http_request, add rpc.ws_upgrade. No grpc.request,
no pathfind.rank, no ledger.acquire (the latter is added in phase-9, not yet present here).
- §1.2: convert every span-attribute key from dotted xrpl.<domain>.<field> to the
bare/underscore form. The sole span-attr dotted exception is xrpl.ledger.hash on
peer.validation.receive (shared constant); consensus.validation.send uses bare
ledger_hash. Resource attrs xrpl.network.id/type stay dotted. Fix tx_count/tx_failed
placement (on tx.apply, not ledger.build). Add attribute tables for the new families.
- §1.3: list the full set of spanmetrics dimension labels (bare keys, from the collector
config) instead of the stale xrpl_rpc_command-style names.
- §4/§5: convert Tempo TraceQL and PromQL examples to the bare attribute/label forms.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The print-env CI fix let the Telemetry Stack Validation job build and run the
workload harness end-to-end for the first time. It reported 129/136 checks
passing; this commit fixes the 7 real failures plus a latent regression-gate bug.
Validation-suite fixes (verified against the CI run's actual emission + live node):
- expected_metrics.json: the beast::insight job-depth gauge is `xrpld_jobq_job_count`,
not `xrpld_job_count` (the latter is a Phase 9 OTel counter). Reverted the prior
rename. Removed the statsd_histograms block (`xrpld_rpc_time`/`xrpld_rpc_size`):
these RPC timers do not emit under the WS workload (0 series in CI).
- expected_spans.json: `tx_status` is only set on suppressed/known-bad receives, so
it is no longer a required attribute of every `tx.receive`. Marked `pathfind.compute`
and `pathfind.discover` optional and the `pathfind.request -> pathfind.compute`
hierarchy as skip — the self-to-self XRP probe returns before computing paths in a
fresh cluster with no liquidity, so only `pathfind.request` fires.
Regression-gate bug (telemetry-validation.yml "Print regression summary"):
- `jq -e` exits non-zero when its filter result is boolean false — the normal case
for a populated (non-placeholder) baseline — which was misreported as
"Failed to parse baseline JSON" and failed the job. Dropped `-e` (kept `-r`) so a
non-zero exit genuinely means malformed JSON.
The optional-span handling and regression comparison both worked correctly in the
CI run (txq.* / pathfind.update_all skipped-when-absent, 0 regressions detected).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Apply the dashboard guidelines to the five close-time panels:
- Axis labels (Title Case) on every panel: "Close Time (Ripple
Seconds)" for the value panels, "Count / Milliseconds" for vote
bins/resolution, "Rounds in Window" for the count panels.
- Human-readable legends with the dimension in brackets per the legend
convention: "Raw Close Time [{{resource.service.instance.id}}]",
"Effective Close Time [...]", "Resolution Direction
[{{span.resolution_direction}}]", "{{span.close_time_vote_bins}} Vote
Bins" — replacing the bare label tokens.
- Unit "none" (plain number): the close-time values are Ripple-epoch
seconds and TraceQL metrics cannot offset them to a wall-clock unit,
and the others are counts/ms on a shared axis.
Verified rendered values against raw spans: close times ~833,998,8xx,
resolution 10000 ms, vote bins 1/2/3 — all correct.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
TraceQL metrics queries default to a 3h max range
(query_frontend.metrics.max_duration), so a dashboard set to a longer
window failed with "range ... exceeds 3h0m0s". Add a query_frontend
block raising it to 168h, matching the search max_duration, so the
consensus close-time panels work at 6h/12h/24h ranges.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The Raw Proposals and Effective/Quantized panels rendered nameless
series: their legendFormat used {{service.instance.id}}, but the
TraceQL metrics query groups by resource.service.instance.id and Tempo
returns that full key as the series label. The legend token did not
match any label, so each series showed blank. Use the matching
{{resource.service.instance.id}} token.
Verified via the Grafana datasource proxy that all six close-time panels
now return correctly-labelled series.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The Raw Proposals and Effective/Quantized panels showed wrong values
(e.g. 759M, 852M, even 0) against a true value of ~834M. Cause:
quantile_over_time bucketizes into an exponential histogram tuned for
duration distributions, so it cannot represent large absolute integers
(Ripple-epoch seconds) accurately.
Switch both panels to avg_over_time, which returns the correct value
(verified ~833,996,7xx matching the raw span attribute). Average is also
the semantically right aggregation here: close time is a single agreed
value per consensus round, not a latency distribution, so a median was
never meaningful.
Set the unit to none rather than seconds: the value is Ripple-epoch
seconds (Unix = value + 946684800) and TraceQL metrics cannot do the
offset arithmetic in-query, so a duration unit would misrender it.
Clarify in the description that the absolute level tracks wall-clock and
the useful signal is per-node spread / raw-vs-effective gap.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The five Close Time panels still rendered "No Data" after the metrics
rewrite. Root cause: each query carried
`span.close_time_correct=~"$close_time_correct"`, but close_time_correct
is a boolean span attribute and TraceQL's regex match (=~) against a
bool matches nothing in a metrics query, so every panel returned an
empty series set (HTTP 200, {"series":[]}).
Remove that filter clause. The panels do not break down by
close_time_correct, so dropping it restores data without losing any
dimension. The $node filter (a string attribute) is unaffected and
stays.
Verified via the Grafana datasource proxy that all six targets now
return series.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Wire the two previously-registered-but-never-incremented validation
counters to ValidationTracker's gross lifetime tallies, exported as
monotonic ObservableCounters. New gross atomics count each ledger once at
first classification and are never adjusted on late repair, keeping the
_total counters monotonic and additive (agreements_total + missed_total ==
ledgers reconciled); the repair-aware windowed view stays on the existing
xrpld_validation_agreement gauge. The validator-health dashboard panels
that already query these names now render data instead of "No data".
Also de-stale 09-data-collection-reference.md: §5b documented flat metric
names (xrpld_cache_SLE_hit_rate, ...) that the code never emits — it emits
labeled gauges (xrpld_cache_metrics{metric="SLE_hit_rate"}). Replace the
stale flat-name tables with a pointer to the canonical labeled section,
reconcile the contradictory headline counts, and correct xrpld_job_count
to its real exported name xrpld_jobq_job_count.
Adds two GTests asserting gross tallies stay frozen on repair while net
totals move, plus the additive invariant.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The Phase 10 validation harness had drifted from the code's recording surface
and the telemetry-validation CI job was failing before it could build.
CI fix (telemetry-validation.yml):
- Replace nonexistent local action ./.github/actions/print-env with the remote
XRPLF/actions/print-build-env (the build-xrpld job failed in 56s on this).
- Sync prepare-runner and upload-artifact action SHAs to the canonical workflow.
Recording-surface reconciliation (docker/telemetry/workload/):
- Migrate span attributes from dotted xrpl.<domain>.<field> to the bare/underscore
form introduced by the 2026-05-13 span-attr naming redesign (tx_hash, peer_id,
ledger_seq, consensus_mode, consensus_round, full_validation, quorum, ...).
Dotted xrpl.ledger.hash is retained only on peer.validation.receive (shared
constant), while consensus.validation.send uses bare ledger_hash.
- Fix attribute placement: tx.apply carries tx_count/tx_failed (not ledger_seq);
ledger.build carries ledger_seq/close_* (not tx_count/tx_failed).
- Replace the phantom rpc.request span with the real WS root rpc.ws_message; drop
the never-emitted duration_ms; rebuild the parent-child map accordingly.
- Add the new spans the code emits: apply-pipeline stage spans
(tx.preflight/preclaim/transactor with stage/tx_type/ter_result), txq.*,
consensus sub-spans (round/establish/update_positions/check/phase.open),
ledger.acquire, grpc.*, pathfind.*. Conditional spans are marked optional so
they are skipped (not failed) when the workload does not exercise them.
- validate_telemetry.py: service.name and Loki job label rippled -> xrpld; fix
PARITY_SPAN_ATTRS (rename the 4 real attrs, drop the 3 that are metrics not span
attrs); add optional-span handling that skips missing optional spans while still
validating attributes when present.
- expected_metrics.json: rippled_ -> xrpld_ on all beast::insight/overlay metrics,
xrpld_job_count, the 15 on-disk xrpld-* dashboard UIDs, and the real bare
spanmetrics dimension labels.
- regression-metrics.json + baseline-timings.json: rpc.request -> rpc.ws_message.
Metrics pipeline fix:
- Switch node [insight] config from server=statsd/prefix=rippled to server=otel +
/v1/metrics endpoint + prefix=xrpld across run-full-validation.sh,
xrpld-validator.cfg.template, benchmark.sh and the workload compose. The
collector has no StatsD receiver, so system metrics only reach Prometheus over
OTLP.
Synthetic load for new spans:
- Add ripple_path_find to the RPC load generator (drives pathfind.* spans).
- Add a high-TPS txq-burst workload phase to force fee escalation (drives txq.*).
All facts verified against the *SpanNames.h headers and a live xrpld node +
collector (Tempo service.name=xrpld, tx.preflight attrs [stage,ter_result,tx_type],
279 xrpld_ Prometheus metrics and zero rippled_).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The five Close Time panels (Raw Proposals, Effective/Quantized, Vote
Bins & Resolution, Resolution Direction, Bin Distribution) rendered
empty: they used TraceQL `| select(attr)`, which returns a trace list
that a timeseries/barchart panel cannot plot.
Enable TraceQL metrics in Tempo and rewrite the panels to use it:
- tempo.yaml: add the local-blocks processor to the metrics generator so
recent blocks are queryable via /api/metrics/query_range. Set
filter_server_spans=false because the consensus spans are
SPAN_KIND_INTERNAL (the default keeps only server spans, so attribute
aggregations over internal spans returned nothing), and
flush_to_storage=true with a traces_storage path so query_range can
read the flushed blocks.
- consensus-health.json: replace each panel's select() with a metrics
query — quantile_over_time on the integer close-time attributes,
avg_over_time for vote bins / resolution, and count_over_time by the
resolution_direction and vote-bin dimensions. Set the raw/effective
panels' unit to seconds (the values are Ripple-epoch seconds, which
dateTimeFromNow rendered with the wrong epoch).
Verified the query forms compile and return series against live internal
spans; the close-time series populate once the node reaches full sync.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The Current Job Latency gauge sat at the bottom of the Job Queue
Analysis dashboard; per the dashboard guideline gauges belong at the
top. Move it to the first row and reflow the remaining panels below it.
Also assign explicit sequential panel ids so deep links stay stable.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Correct the width rule from the previous layout commit. Full width
(w=24) is now applied ONLY to timeseries panels whose legend is a
right-side table, since those legends need the horizontal room. Panels
with default/bottom legends, pie charts, and the heatmap return to half
width. This narrows "Transaction Receive vs Suppressed" and "TxQ
Enqueue Rate by Transaction Type", which were wrongly widened.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Make transaction-overview deep-links stable and improve readability:
- Assign explicit sequential panel ids (1..20) so viewPanel=panel-N
URLs stay pinned to the same chart across edits. Previously ids were
unset and Grafana auto-assigned them by array position, so any
reorder silently repointed bookmarks.
- Move the single-value stat panel (Transaction Apply Failed Rate) to
the top row.
- Lay out in three topic sections (Processing, Apply Pipeline, Queue).
Within each, timeseries with a breakdown dimension (tx_type, stage,
ter_result, suppressed) take full width so their right-side table
legends are readable; single-series panels, pie charts, and the
heatmap stay half-width and pair up.
All six template variables already default to All (includeAll + multi);
no change needed there.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Apply the dashboard legend convention across all panels now that the
P50 series have been removed (P95-only):
- Drop the redundant "P95 " / "P50 " prefix; the panel title already
states the percentile.
- Put every filter/dimension value inside [] comma-separated, ending
with exported_instance, e.g. "AMMDeposit [Preclaim, xrpld-mainnet]".
- Add exported_instance to the by() clause and legend of the three
panels that filtered on $node but omitted it (Transaction Rate by
Type, Transaction Results by Type, TxQ Accept Status), so per-node
series are produced.
- Title-case the stage value for display via label_replace in the four
apply-pipeline panels; the span attribute stays lowercase
(preflight/preclaim/apply) since legendFormat cannot change case.
- Cap tooltip maxHeight at 500 on every panel.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The existing apply-pipeline panels show latency by stage (all types
combined) or by type (single span). Neither answers "for a given
transaction type, which stage dominates its latency". Add a p95 panel
grouped by both tx_type and stage, filterable via the $tx_type and
$stage variables. Both dimensions already exist in spanmetrics, so no
collector change is needed. Reflow the section so the full-width
failure panel sits below the new full-width panel.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Wire the apply-pipeline stage spans (tx.preflight, tx.preclaim,
tx.transactor) added on phase-3 through the observability stack so the
spanmetrics connector produces per-stage RED metrics without any native
instruments.
- collector: add the `stage` dimension to the spanmetrics connector so
the three stages split into separate metric series (3 bounded values).
- dashboard: add a "Tx Apply Pipeline" section to transaction-overview
with rate, p95 latency, and failure-rate panels grouped by stage, plus
a `stage` template variable. Panels follow the existing config (node
filter, exported_instance legends, Title Case, axis labels).
- The failure panel filters ter_result != tesSUCCESS rather than span
status, because a failing ter code completes the span normally — only
thrown exceptions set an error status. This matches the existing
"Transaction Results by Type" panel convention.
- docs: document the spans, attributes, and stage dimension in the data
collection reference and runbook, including the sampling caveat that
span-derived metrics inherit tracer head-sampling and undercount at
sampling_ratio < 1.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The tx.transactor span covered only the apply stage; preflight and
preclaim had no telemetry, so a transaction that hard-failed those
stages produced no apply-pipeline span and per-stage latency/failure
was invisible.
Add tx.preflight and tx.preclaim spans in applySteps.cpp via a
makeStageSpan() helper using SpanGuard::hashSpan, so all three stages
share a deterministic trace_id derived from txID[0:16] even though they
run sequentially and often cross-thread. Each span carries stage,
tx_type, and ter_result; exceptions are recorded as tefEXCEPTION before
the public wrappers map them. The type lookup is guarded behind the
span-active check so it costs nothing when tracing is off.
Add a stage="apply" attribute to the tx.transactor span and move its
three hardcoded attribute strings to a new library-safe header
include/xrpl/tx/detail/TxApplySpanNames.h, which mirrors the daemon-side
TxSpanNames.h strings so the collector spanmetrics connector aggregates
both span sets under one dimension set.
A constants-contract test pins the span-name, attribute-key, and
stage-value strings; span content stays covered by the docker
integration test, as the rest of the telemetry suite is.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Reduced from 30 to 7 filters: service.instance.id, name, status, command,
tx_hash, tx_type, ledger_hash. Full attribute inventory is in
OpenTelemetryPlan/09-data-collection-reference.md §4; TraceQL autocomplete
covers the rest.
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
From the code-review pass:
- transaction-overview.json: the tx.process and tx.transactor latency-by-type
panels used lowercase legends (p95/p50) without the per-node dimension. Use
Title Case (P95/P50), add exported_instance to the by() clause, and include
[{{exported_instance}}] in the legend, per the dashboard legend convention.
- consensus-health.json: panel descriptions still referenced the old dotted
attribute names (xrpl.consensus.mode, xrpl.ledger.seq) after the A1 rename;
update them to the bare emitted names (consensus_mode, ledger_seq).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
handleMismatch() recorded the "consensus_txset" reason but then fell through
to the transaction-level comparison, which also recorded a reason
("same_txset_diff_result" / "different_txset"). A single mismatch with
disagreeing consensus tx-set hashes therefore incremented
xrpld_ledger_history_mismatch_total twice across two reason labels, so the
sum over reason exceeded the real mismatch count.
The consensus tx-set hash disagreement is the root cause; return after
recording it so each mismatch contributes exactly one reason.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The A1 fix (xrpl_consensus_mode -> consensus_mode) was applied on phase-6, but
the phase-8->phase-9 merge conflict resolution for consensus-health.json took
phase-9's pre-fix panel base, silently reintroducing all 11 stale
xrpl_consensus_mode label references (the spanmetrics label that is never
populated — see the original A1 commit).
Re-apply the label fix on phase-9: xrpl_consensus_mode -> consensus_mode in
every panel expr, legendFormat, and the $consensus_mode template variable's
label_values() query. The Grafana variable name $consensus_mode is unchanged.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Visualise the metrics added in this series:
- consensus-health: "Ledger History Mismatch Rate by Reason"
(xrpld_ledger_history_mismatch_total by reason — fork diagnostics)
- fee-market: "Queue Abandonment Rate (Expired)" and "Queue Admission
Rejections (Dropped)" (xrpld_txq_expired_total / dropped_total)
- peer-network: "Reduce-Relay Peer Selection" and "Reduce-Relay Missing-Tx
Frequency" (xrpld_reduce_relay_metrics)
- system-node-health: "Ledger Acquire Duration" and "Ledger Acquire Rate by
Outcome" (ledger.acquire span)
otel-collector-config.yaml: add outcome and acquire_reason spanmetrics
dimensions so the ledger.acquire outcome breakdown populates.
All panels follow the existing template: $node filter, exported_instance in
legends, Title Case, axis labels.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Add the new synchronous counters (ledger_history_mismatch_total{reason},
txq_expired_total, txq_dropped_total{reason}) and the reduce-relay observable
gauge to the ASCII ownership diagram in the MetricsRegistry header so the
documented instrument inventory matches the code.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The transaction reduce-relay subsystem (selected vs suppressed peers,
feature-disabled peers, missing-tx frequency) was computed in OverlayImpl's
TxMetrics but only surfaced via the get_counts JSON RPC — invisible to
Prometheus/Grafana, despite being the central efficiency KPI for the feature.
Add an observable gauge xrpld_reduce_relay_metrics{metric} that reads
Overlay::txMetrics() and parses its rolling-average fields:
- selected_peers (txr_selected_cnt)
- suppressed_peers (txr_suppressed_cnt)
- not_enabled_peers (txr_not_enabled_cnt)
- missing_tx_freq (txr_missing_tx_freq)
The JSON values are decimal strings (std::to_string), parsed via std::stoll —
the same JSON-reading pattern as registerNodeStoreGauge. No new Overlay
accessor or core-interface change required.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
InboundLedger drives ledger back-fill and fork recovery with timeout/retry
logic (kLedgerTimeoutRetriesMax = 6), but emitted only a global ledger_fetches
counter — sync/recovery cost was a telemetry blind spot.
Add a ledger.acquire span that wraps the acquisition lifecycle:
- Started in InboundLedger::init() with ledger_seq and acquire_reason
(history / consensus / generic, mirroring InboundLedger::Reason).
- Finalized in InboundLedger::done() with outcome (complete / failed),
timeouts, and peer_count, then reset so the span duration is exported.
Held as a std::optional<SpanGuard> member (same pattern as RCLConsensus
roundSpan_). New op/attr/val constants added to LedgerSpanNames.h. Compiles to
a no-op when telemetry is disabled via the SpanGuard fallback.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The transaction queue had no metric for demand that leaves or never enters the
queue, so fee-underpayment abandonment and admission-control rejection were
invisible (distinct from jq_trans_overflow, which is the job queue).
Add two synchronous counters via MetricsRegistry:
- xrpld_txq_expired_total — incremented in TxQ::processClosedLedger() for each
queued transaction removed because its LastLedgerSequence passed (submitters
who under-bid the escalating fee and were never included).
- xrpld_txq_dropped_total{reason} — incremented in TxQ::apply() at the
queue-full admission-control returns (reason="queue_full").
Both reach MetricsRegistry via the Application& parameter already passed to
these methods; calls are null-guarded so they no-op when telemetry is disabled.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
LedgerHistory::handleMismatch() already classifies a built-vs-validated ledger
mismatch (prior ledger, close time, consensus tx set, same/different tx set),
but only bumped a single untyped beast::insight counter — the reason was
dropped. Fork diagnosis was therefore a log-grep exercise.
Add a labeled OTel counter so the mismatch reason is a queryable time series:
- MetricsRegistry: new ledgerHistoryMismatchCounter_ + incrementLedgerHistoryMismatch(reason)
- LedgerHistory: record one reason per classification branch (unknown,
prior_ledger, close_time, consensus_txset, same_txset_diff_result,
different_txset). Reaches MetricsRegistry via the existing app_ reference.
The existing beast::insight mismatchCounter_ is left intact.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The xrpld_peer_quality{metric="peers_insane_count"} gauge was hardcoded to 0.0
with a TODO, leaving the "Insane/Diverged Peers" panel permanently empty.
PeerImp::json() already exposes the peer's tracking state via the "track"
field (set to "diverged" when tracking_ == Tracking::Diverged). The peer-quality
callback already iterates peer->json() for latency and version, so count peers
whose "track" field equals "diverged" in the same loop — no change to the
abstract Peer interface required.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
A phase-8->phase-9 merge (a675897aaf) duplicated the "Consensus Outcome
Distribution" and "Consensus Failures Over Time" panels: both appeared twice
with byte-identical queries (verified ignoring gridPos). The pair existed once
on phase-6/7/8 and became two on phase-9 only, so the duplication originated
in phase-9's own merge history.
Remove the second (lower) copy of each and re-stack panel y-positions with no
gaps. The single retained copy keeps the original y=64 row.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Both metrics are already emitted and live in Prometheus but were not fully
visualised.
- Fee Market (xrpld-fee-market.json): "Load Factor Attribution (Stacked
Components)" — stacks load_factor_fee_escalation / fee_queue / local / net /
cluster so an operator can see which component drives the effective fee. The
existing panels showed the aggregate only.
- Validator Health (xrpld-validator-health.json): "Agreement % (7d)" and
"Agreements vs Missed (7d)" — the xrpld_validation_agreement gauge already
observes agreement_pct_7d / agreements_7d / missed_7d, but the dashboard only
plotted 1h and 24h windows.
Panels follow the existing template: $node filter, exported_instance in legends,
Title Case, axis labels.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The grpc.{Method} spans (GRPCServer.cpp) and pathfind.* spans (PathRequest.cpp)
are emitted but had no dashboard coverage. The existing RPC & Pathfinding
dashboard only plotted StatsD timers. Add span-derived rows:
- gRPC Request Rate by Method (grpc.* by method)
- gRPC Latency P95 by Method
- gRPC Error Rate by Status (by grpc_status)
- Pathfinding Compute Duration (pathfind.compute p95/p50)
- Pathfinding Request & Discovery Rate (pathfind.request / pathfind.discover)
otel-collector-config.yaml: add method, grpc_role, grpc_status spanmetrics
dimensions (bounded value sets). Add a $grpc_method template variable so the
gRPC panels can be filtered by method, consistent with the dashboard filter
conventions.
Note: these spans populate only when the node serves gRPC / pathfinding
traffic; they are correct but not exercised by the current health-check
workload (they will be covered by the Phase 10 workload generator).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The consensus state-machine and TxQ lifecycle spans are emitted by the code
and present in Prometheus, but no panel visualised them. Add panels keyed on
those span_names (verified live) plus the low-cardinality dimensions needed to
break them down.
Consensus Health (consensus-health.json) — new rows:
- Consensus Round Duration (full round, p95/p50, mode-filterable)
- Consensus Phase Duration (open vs establish breakdown)
- Position Update Duration (update_positions p95/p50)
- Consensus Stall Rate (consensus.check by consensus_stalled)
- Consensus Mode-Change Rate by Target Mode (mode_change by mode_new)
Transaction Overview (transaction-overview.json) — new rows:
- TxQ Enqueue Rate by Transaction Type (txq.enqueue by tx_type)
- Queue Bypass Ratio (txq.apply_direct vs txq.enqueue)
- Queue Accept (Drain) Duration per Ledger (txq.accept p95/p50)
- Queue Cleanup Rate (txq.cleanup expired entries)
otel-collector-config.yaml — add spanmetrics dimensions for the lifecycle
breakdowns: mode_new, consensus_stalled, consensus_phase, consensus_result
(all bounded value sets, safe as Prometheus labels).
All new panels follow the existing dashboard template: $node filter,
exported_instance in every legend, Title Case, axis labels, row layout.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The spanmetrics connector dimension was `xrpl.consensus.mode`, but the code
emits the span attribute under the bare key `consensus_mode` (matching every
other dimension after the Phase 6 rename). The mismatch left the
`xrpl_consensus_mode` Prometheus label empty, so the Consensus Health
"Consensus Mode Over Time" panel and the `$consensus_mode` template variable
(which filters every panel) matched no live series.
- otel-collector-config.yaml: dimension `xrpl.consensus.mode` -> `consensus_mode`
- consensus-health.json: 11 label refs `xrpl_consensus_mode` -> `consensus_mode`
(the `$consensus_mode` Grafana variable name is unchanged)
- telemetry-runbook.md: refresh the stale spanmetrics label table to the bare
names actually emitted (command/rpc_status/consensus_mode/local/
proposal_trusted/validation_trusted), fix dotted->bare attribute names in
span tables and TraceQL examples (tx_hash, ledger_seq, consensus_round_id,
consensus_ledger_id, consensus_round, tx_id event attr), correct the
consensus_round_id query to int (not quoted string), and fix the
load_type value query ("exception_rpc" -> "exceptioned RPC").
Verified against the live stack: Tempo span tags confirm bare attribute keys
(consensus_mode, ledger_seq, tx_hash, ...); the populated xrpl_consensus_mode
series in Prometheus is stale retained data from an older build.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
beast::insight metrics exported via OTLP carried no exported_instance
label because [insight] omitted service_instance_id (only [telemetry]
set it). Every system-* dashboard filters insight metrics with
exported_instance=~"$node", and the $node template variable is sourced
from label_values(..., exported_instance) — so with the label absent,
$node was empty and all insight-backed panels showed no data.
Add service_instance_id to [insight] in both telemetry configs, matching
the [telemetry] value (xrpld-mainnet / xrpld-devnet). CollectorManager
already reads this key and passes it to OTelCollector, which sets the
service.instance.id resource attribute.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The phase-9 NodeStore I/O totals, write-load/read-queue, read-threads,
and object instance-count panels rendered large cumulative values with
unit "none". Switch to "short" for readable abbreviation.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Count and message-volume panels (operating-mode transitions, job queue
depth, network/overlay message totals, getobject message counts) used
unit "none", rendering large values as raw unscaled numbers. Switch to
"short" so Grafana abbreviates (e.g. 1.5 Mil) for readability.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The system-* dashboards queried the legacy StatsD rippled_ prefix, but the
node now emits beast::insight metrics via native OTLP under the xrpld_
prefix (config: [insight] server=otel, prefix=xrpld). All queries returned
no data.
Migration (names derived from C++ beast::insight registrations, not live
Prometheus, since a syncing node does not emit every metric yet):
- rippled_ -> xrpld_ prefix across all panel queries and template variables
(including the $node variable query, which broke the whole dashboard filter)
- Histogram Event instruments export with unit ms, so bare _bucket becomes
_milliseconds_bucket: ios_latency, rpc_time, rpc_size, pathfind_fast/full
- Job-type metrics were StatsD summaries (label quantile="$quantile"); on the
OTLP path they are histograms. Converted those queries to
histogram_quantile($quantile, rate(xrpld_<job>_milliseconds_bucket[5m]))
and added the previously-undefined $quantile template variable
- Per-job-type detail panels: __name__ regex now matches _milliseconds_bucket
No panels removed. Panels for metrics not yet emitted (e.g. warn/drop, or
job types the syncing node has not run) show no data until the path executes.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
P100 from a histogram is degenerate — it always returns the upper bound of
the highest populated bucket (a single slow outlier pins it to the top
boundary), producing a flat line. Revert to meaningful quantiles:
- Job Queue Wait Time / Job Execution Time: p75 (typical) + p99 (tail)
- Per-Job-Type / Per-Method: p99
- Added gauge panels showing current p99 with green/yellow/red thresholds
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The three duration heatmaps (transaction, consensus accept, RPC latency)
had an axisLabel of "Duration (ms)" but no unit code, so y-axis tick
values rendered unscaled. Set unit=ms on both the yAxis options and
panel defaults so buckets display as proper millisecond values.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Grafana does not recognize 'us' as a unit code, so microsecond values
rendered as raw numbers with a plain 'us' suffix (no scaling). The
correct code is 'µs'. Affects job-queue and OTel RPC latency panels
backed by *_duration_us histograms.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds template variables $tx_type, $ter_result, $txq_status to the
Transaction Overview dashboard. All relevant panels now respect these
filters, enabling operators to drill into specific transaction types
or result codes.
Changes:
- Panel 2 renamed to "Transaction Processing Latency by Type" (now
shows p95/p50 per tx_type instead of aggregate)
- Panels 1,3,4,5,7,9,12 filter by $tx_type
- Panel 10 filters by $tx_type and $ter_result
- Panel 11 filters by $txq_status
- Removed redundant "TX Processing Latency by Type (p95)" panel
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds template variables $tx_type, $ter_result, $txq_status to the
Transaction Overview dashboard. All relevant panels now respect these
filters, enabling operators to drill into specific transaction types
or result codes.
Changes:
- Panel 2 renamed to "Transaction Processing Latency by Type" (now
shows p95/p50 per tx_type instead of aggregate)
- Panels 1,3,4,5,7,9,12 filter by $tx_type
- Panel 10 filters by $tx_type and $ter_result
- Panel 11 filters by $txq_status
- Removed redundant "TX Processing Latency by Type (p95)" panel
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Shows p95 latency of tx.process span broken down by tx_type. Works for
both received and locally-processed transactions, unlike the tx.transactor
panel which requires the node to be synced and applying.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The OTel C++ SDK's SetAttribute appends rather than overwrites on
in-flight spans. Setting suppressed=false as a default then overriding
to true resulted in both values appearing in the exported span.
Fix: remove the default-false set, place suppressed=false once after
the HashRouter check passes (non-suppressed path), and suppressed=true
remains only in the suppressed path.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Resolve consensus dashboard conflict and remove duplicate
consensus_state dimension in collector config.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Collector config: add tx_type, ter_result, txq_status, consensus_state,
load_type, is_batch as spanmetrics dimensions so they appear as
Prometheus labels for dashboard queries.
New dashboard panels:
- Transaction Overview: Rate by Type, Results by Type, TxQ Status (pie),
Transactor Duration p95 by Type
- Consensus Health: Outcome Distribution (pie), Failures Over Time
- RPC Performance: Resource Cost by Command, Batch vs Single
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Resolve XrplCore.cmake conflict: keep telemetry module before tx module
(correct ordering from the cascade).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Wraps Transactor::operator() with a span that captures tx_type,
ter_result, and applied. This is the universal dispatch point — every
transaction flows through it, giving per-type latency breakdown.
Adds libxrpl.tx > xrpl.telemetry levelization dependency.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Wraps Transactor::operator() with a span that captures tx_type,
ter_result, and applied. This is the universal dispatch point — every
transaction flows through it, giving per-type latency breakdown.
Adds libxrpl.tx > xrpl.telemetry levelization dependency.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Resolve runbook conflict: keep both phase 6 ledger/peer span tables
AND new insights/sample queries section from the enrichment work.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds comprehensive "Insights and Sample Queries" section showing operators
what questions they can answer with the newly-added span attributes:
- Transaction workflow analysis (filter by tx_type, fee, ter_result)
- TxQ health (txq_status, ledger_changed)
- RPC debugging (is_batch, request_payload_size, load_type)
- PathFinding performance (dest_currency, num_source_assets)
- Consensus health (consensus_state, is_bow_out, disputes_count)
- Cross-subsystem correlation examples
Also updates all span reference tables with the new attributes.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds workflow-critical attributes to consensus spans:
- consensus.proposal.send: is_bow_out (identifies resignation proposals)
- consensus.accept: consensus_state (yes/moved_on/expired), disputes_count
- consensus.validation.send: ledger_hash (correlates validation to ledger)
Enables answering: "Did we reach consensus or time out?", "How many
disputes existed at acceptance?", "Which ledger did we validate?"
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The apply() function doesn't have a `using namespace telemetry` directive
(unlike processTransaction), so tx_span attrs need explicit qualification.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Enriches the tx.process span with final outcome after batch application:
- ter_result: the TER code string (e.g., "tesSUCCESS", "tecPATH_DRY")
- applied: boolean whether the transaction was included in the ledger
These attributes complete the tx.process span lifecycle — it now captures
identity (tx_type, tx_hash), intent (fee, sequence), and outcome
(ter_result, applied) for full workflow traceability.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Wire up span attributes that enable filtering/grouping traces by request
characteristics: batch detection, payload size, resource cost category,
command name on WS spans, and pathfinding search parameters (destination
amount/currency, source asset count).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The check-rename CI job requires all rename scripts to have been run.
The telemetry config files had 'prefix=rippled' which should be 'xrpld'.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The project-wide rename check changed the factory function name but
missed this call site in the consensus simulation test framework.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ServiceRegistry gained the pure virtual getMetricsRegistry() in phase 7
but TestServiceRegistry was never updated. Returns nullptr since tests
don't need a real MetricsRegistry.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
clang-tidy misc-include-cleaner requires each symbol to be provided by
a directly-included header. Replace the convenience trace/context.h
(which only provides GetSpan/SetSpan) with the specific headers for
kSpanKey, holds_alternative, get, shared_ptr, Span, and span.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two stray "rippled" tokens introduced by 43258e8d ("docs(telemetry):
add secure-OTel pipeline analysis…") were caught by check-rename in
CI. Re-run docs.sh to convert them to xrpld so the rename check
passes on PR #6425 (and downstream PR #6426 once merged up).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Brings in pratik/otel-phase3-tx-tracing's two commits that move all
consensus-tracing content off phase-3:
- c9521b97fe refactor(telemetry): pull consensus-tracing scope-leak
out of phase-3
- d6b101069e refactor(telemetry): remove consensus tracing from
phase-3
Phase-4 already owns this content (with the consensus_span -> cons_span
namespace rename, round/accept/proposalSend/validationSend span
construction, and the relocated ConsensusSpanNames.h under
src/xrpld/consensus/), plus its own evolution on top — so the merge is
resolved as a tree-identity merge:
- src/xrpld/app/consensus/ConsensusSpanNames.h: keep phase-4's
renamed/expanded version (modify/delete conflict, resolved with
--ours).
- src/xrpld/app/consensus/RCLConsensus.cpp: keep phase-4's version
with cons_span attribute calls, the trace_context inject blocks
on broadcastPropose/validate, the secure-OTel TODO, and the
full validation/round span instrumentation (content conflict,
resolved with --ours).
- src/xrpld/overlay/detail/PeerImp.cpp: keep phase-4's version with
the proposalReceiveSpan/validationReceiveSpan calls, lambda span
captures, and cons_span::attr::* setAttribute calls (content
conflict, resolved with --ours).
- src/xrpld/telemetry/ConsensusReceiveTracing.h: phase-3 deleted it,
phase-4 still uses it. Restored from phase-4 HEAD (silent
auto-deletion otherwise).
- include/xrpl/telemetry/TraceContextPropagator.h: phase-3 stripped
consensus references and the secure-OTel TODO; phase-4 still has
both. Restored from phase-4 HEAD.
- src/xrpld/telemetry/PropagationHelpers.h: phase-3 swapped the
@see ConsensusReceiveTracing.h cross-reference for TxTracing.h;
phase-4 still wants the consensus reference. Restored from
phase-4 HEAD.
Net tree change on phase-4: zero. Verified via 'git diff <pre-merge-sha>
HEAD' returning empty.
Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>
Phase-3 (PR #6425) is scoped to transaction tracing only; consensus
tracing belongs to phase-4 (PR #6426). The previous commit on this
branch removed the namespace/attribute scaffolding c6c019ed8b leaked
into phase-3, but phase-3 still carried the consensus span construction
and trace-context propagation introduced in earlier commits
(61cb1faf8f, 93bed03d8d). Move that out too so phase-3 creates and
propagates no consensus spans of any kind.
Removed:
- src/xrpld/telemetry/ConsensusReceiveTracing.h (deleted; phase-4
owns it).
- PeerImp.cpp: remove the std::make_shared<SpanGuard>(
proposalReceiveSpan(...))/validationReceiveSpan(...) constructions
in onMessage(TMProposeSet)/onMessage(TMValidation), drop the
sp = std::move(span) lambda captures, and drop the
#include <xrpld/telemetry/ConsensusReceiveTracing.h>.
- RCLConsensus.cpp: drop the two telemetry::injectToProtobuf() blocks
that injected the active trace context into TMProposeSet (in
Adaptor::propose, after addSuppression) and TMValidation (in
Adaptor::validate, around the broadcast call). Drop the now-unused
#include of TraceContextPropagator.h and the
XRPL_ENABLE_TELEMETRY-gated include of
opentelemetry/context/runtime_context.h.
- TraceContextPropagator.h: update file-level @see comment to drop
the ConsensusReceiveTracing.h reference and to scope the
"wired into the P2P message flow via PropagationHelpers.h"
sentence to TMTransaction only.
- PropagationHelpers.h: replace the
@see ConsensusReceiveTracing.h cross-reference with
@see TxTracing.h.
Inert consensus metadata (TraceCategory::Consensus enum value,
seg::consensus constant, isCategoryEnabled/categoryToSpanKind switch
arms, the SpanGuard.h doc-comment example) is intentionally preserved
on phase-3: nothing references it after this commit, but phase-4
needs it and removing it would widen the phase-3 -> phase-4 merge
surface for no benefit.
Verified via git grep: no remaining phase-3 references to
proposalReceiveSpan, validationReceiveSpan, ConsensusReceiveTracing,
consensus_span::, consensus.proposal, or consensus.validation.
Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>
Commit c6c019ed8b ("addressed code review comments") bundled tx-tracing
review fixes with consensus-tracing scaffolding that belongs on
pratik/otel-phase4-consensus-tracing (PR #6426). This commit lifts the
consensus-only parts back out of phase-3 so PR #6425 stays scoped to
transaction tracing. Phase-4 already carries the same content (via prior
phase-3 -> phase-4 merges) plus its own evolution on top, so nothing is
moved across — only removed here.
Removed:
- src/xrpld/app/consensus/ConsensusSpanNames.h (deleted; phase-4 owns
it).
- PeerImp.cpp: revert onMessage(TMProposeSet)/onMessage(TMValidation)
consensus-attr setAttribute calls to the hardcoded
"xrpl.consensus.{trusted,round,ledger.seq}" strings used before
c6c019ed8b. Drop the now-unused
#include <xrpld/app/consensus/ConsensusSpanNames.h>.
- RCLConsensus::Adaptor::validate(): remove the
TODO(observability/secure-OTel) block on validation trace_context.
- TraceContextPropagator.h: remove the TODO(observability/secure-OTel)
block on injectToProtobuf().
Tx-tracing parts of c6c019ed8b are intentionally untouched.
No phase-3 caller of telemetry::consensus_span:: remains; verified via
git grep. No test on phase-3 references the removed header.
Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>
Document the threat model and chosen hardening approach for the OTel
pipeline: mTLS to the collector as primary defense (across-network
deployment), NetworkPolicy as defense-in-depth, and source-side
validation plus per-peer rate limiting for protocol::TraceContext on
peer messages. Skips Basic Auth (wrong shape for multi-operator
fleet) and HTTP-gateway header stripping (rippled is P2P).
Wires the new doc into the master plan ToC, mermaid diagram, and
body section, plus cross-refs from the privacy section in
02-design-decisions.md and the collector config in
05-configuration-reference.md so readers reach it from natural
in-context entry points. Adds a backlink at the top of secure-OTel.md
to the master plan.
Adds 'exfiltration' and 'htpasswd' to cspell dictionary.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Drop xrpl.node.amendment_blocked / xrpl.node.server_state from telemetry
surface (constants in SpanNames.h, two filters in tempo.yaml). Operators
read the same data via server_info / server_state RPC; OTel SDK 1.18.0
cannot refresh resource attrs at runtime so resource-level emission was
not viable either.
- Namespace all pathfind span attributes under pathfind_* (underscore form
per Phase 1c rule 5). Renames in PathFindSpanNames.h and call sites in
PathRequest.cpp, PathRequestManager.cpp, plus the rule-5 retention
xrpl.pathfind.ledger_index -> pathfind_ledger_index.
- Wire pathfind_source_account / pathfind_dest_account on pathfind.request
in doPathFind / doRipplePathFind handlers (only when present + string).
- Collapse per-asset pathfind.discover / pathfind.rank spans into one
pathfind.discover hoisted around the per-source-asset loop in
PathRequest::findPaths. Span count goes from 2N to 1 per RPC call;
per-asset breakdown traded for bounded storage and cardinality. Trade-off
documented inline.
- Fix pathfind_num_paths semantics: now sums getBestPaths().size() across
the loop (paths actually returned) instead of the maxPaths input cap.
- PathRequestManager::updateAll: move span creation after the locked
requests_ snapshot, early-return when no active subscriptions exist
(avoids empty span on every ledger close), set pathfind_num_requests
= requests.size().
- Update Phase2_taskList.md and 02-design-decisions.md to match.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per PR #6438 review thread r3250432621: known-command errors
(rpcTOO_BUSY, rpcNO_PERMISSION, etc.) were collapsing into a
single rpc.command.unknown span name, hiding per-command error
rates in dashboards. Same anti-pattern existed for gRPC, where
every method was bucketed under grpc.request with the method
relegated to an attribute.
- RPCHandler.cpp: doCommand error path uses cmdName as the span
suffix; the rpc_span::val::unknownCommand fallback only applies
when the request truly omits both command and method fields.
- GRPCServer.cpp: gRPC span name is now grpc.<MethodName>
(e.g. grpc.GetLedger). Method also retained as an attribute.
- GrpcSpanNames.h: drop the unused op::request constant; update
the span-hierarchy comment.
- RpcSpanNames.h: update the gRPC span diagram to match.
Dashboards on downstream phases will benefit from per-command
breakdowns without needing TraceQL attribute filters.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: semgrep-companion-app[bot] <218312740+semgrep-companion-app[bot]@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: xrplf-ai-reviewer[bot] <266832837+xrplf-ai-reviewer[bot]@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Three interrelated fixes in otel-collector-config.yaml; without them the
Phase 8 log-trace correlation pipeline is silently broken.
1. `resource/logs` processor now upserts `job: xrpld` alongside
`service.name: xrpld`. Loki 3.x OTLP ingestion renames
`service.name` to the label `service_name`, so the runbook /
integration-test queries (`{job="xrpld"} |= "trace_id="`) returned
empty. Upserting the `job` resource attribute at the collector lets
the canonical Loki label flow through unchanged.
2. `filelog` regex makes the `partition:` capture non-capturing-optional.
`Logs::format()` omits the `partition:` prefix when partition is
empty (common for framework-level log lines); the old regex required
it and silently dropped those records.
3. Timestamp parser now matches the real log format. `Logs::format()`
writes microsecond-precision timestamps like
`2026-04-15 10:30:45.123456 UTC`. The layout was
`%Y-%b-%d %H:%M:%S` — missing fractional seconds and timezone —
which failed strptime and dropped timestamps. New layout is
`%Y-%b-%d %H:%M:%S.%f` with `location: UTC`.
Also adds a block-comment documenting the real log format so the
next person to touch this doesn't re-introduce the same gaps.
- `src/test/telemetry/MetricsRegistry_test.cpp` (Beast `unit_test::suite`
format under `src/test/`) duplicates the GTest version already
maintained at `src/tests/libxrpl/telemetry/MetricsRegistry.cpp`.
Project rule (`tasks/lessons.md` §Test Format): all new tests use
GTest under `src/tests/libxrpl/`. The GTest version exercises the
same four cases (disabled construction, start/stop lifecycle, recording
no-op, destructor-calls-stop). Deleting the Beast duplicate eliminates
drift and keeps the test authoritative in one place.
- Drop the matching `test.telemetry > xrpl.basics/xrpl.core/xrpld.telemetry`
entries from `.github/scripts/levelization/results/ordering.txt`
because `xrpl.test.telemetry` (the GTest binary) retains its own
entries; the removed ones belonged to the deleted Beast suite.
- `.claude/instructions.md` was committed as a symlink to an
author-local absolute path (`/home/pratik/sourceCode/personal/Rippled/
instructions.md`) that does not exist for any other contributor or in
CI. Remove the symlink from git tracking and add `.claude/` to
`.gitignore` so future agent commits do not re-add per-developer
settings.
MetricsRegistry observable-gauge callbacks run on the OTel reader thread
and read live state from nodeStore_, overlay_, networkOPs_, ledgerMaster,
inboundLedgers, loadManager, and others. The old shutdown sequence called
metricsRegistry_->stop() AFTER all those services were already stopped,
which left a race window between each service's stop() and the final
provider_->ForceFlush() during which a callback could dereference
already-stopped service state. The try/catch guards in each callback
mitigated crashes but not reads from freed members.
- Add MetricsRegistry::detachCallbacks() that sets an atomic<bool>
callbacksDetached_ with release ordering. Idempotent.
- Guard every ObservableGauge callback entry with an acquire-load of the
same flag and return early if it is set. Covers all 15 registered
callbacks (cacheHitRate, txq, objectCount, loadFactor, nodeStore,
serverInfo, buildInfo, completeLedgers, dbMetrics, validatorHealth,
peerQuality, ledgerEconomy, stateTracking, storageDetail,
validationAgreement).
- Application::run() shutdown sequence now calls
metricsRegistry_->detachCallbacks() right after m_loadManager->stop()
and BEFORE m_shaMapStore, m_jobQueue, overlay_, grpcServer_,
m_networkOPs, serverHandler_, m_ledgerReplayer, m_inboundTransactions,
m_inboundLedgers, ledgerCleaner_, m_nodeStore, perfLog_ are stopped.
The acquire/release pair guarantees subsequent reader-thread ticks see
the detach before they dereference stopped services.
- metricsRegistry_->stop() keeps setting the flag as a belt-and-suspenders
defense in case a future caller forgets to detach first.
- Drop the misleading "No explicit RemoveCallback is needed" comment
from stop(); provider destruction alone does not beat the reader
thread to already-freed state.
The objectCountGauge callback previously discarded its state pointer
via `void* /* state */`; restore the state argument so it can access
self->callbacksDetached_ too.
File renames to match the post-docs.sh project-wide rename + the UID
rename applied in the previous commit. Five phase-9 dashboards are
affected:
- rippled-fee-market.json -> xrpld-fee-market.json
- rippled-job-queue.json -> xrpld-job-queue.json
- rippled-peer-quality.json -> xrpld-peer-quality.json
- rippled-rpc-perf.json -> xrpld-rpc-perf-otel.json
- rippled-validator-health.json-> xrpld-validator-health.json
`rippled-rpc-perf.json` is renamed to `xrpld-rpc-perf-otel.json` (rather
than `xrpld-rpc-perf.json`) to avoid colliding with the
phase-6 `rpc-performance.json` dashboard which also uses the
`xrpld-rpc-perf` UID. The new filename matches its now-unique
`xrpld-rpc-perf-otel` UID that was set in the merge commit.
Phase 4 added a span catalog in `06-implementation-phases.md` listing the
source location for each consensus span. Line numbers `Consensus.h:707`,
`RCLConsensus.cpp:232/341/492/541/900` drift on every refactor and would
become stale PR after PR. Filename alone is enough for operators to
grep — the RCLConsensus.cpp spans are already unambiguous from the span
name itself.
Follow-up to the phase-6 dashboard cleanup. The three dashboards
introduced by commit f6105ece98 (consensus-health, rpc-performance,
transaction-overview) were missed in the initial UID rename and still
carried `rippled-*` UIDs plus line-number refs in panel descriptions.
- UIDs: `rippled-consensus` -> `xrpld-consensus`,
`rippled-rpc-perf` -> `xrpld-rpc-perf`,
`rippled-transactions` -> `xrpld-transactions`, matching the
post-`docs.sh`-rename runbook and the other dashboards in this PR.
- Strip `:<line>` suffixes from `ServerHandler.cpp`, `RCLConsensus.cpp`,
`NetworkOPs.cpp`, etc. references in panel descriptions. Line numbers
drift on every refactor; the filename is enough to grep.
- Fix the Overall RPC Throughput panel: two targets filtered on
`span_name="rpc.request"` (never emitted) instead of
`span_name="rpc.http_request"` (the real emitted name). The panel
would have shown zero data until this fix.
Follow-up to the dashboard cleanup on this branch. Caught additional sites
in TESTING.md that still reference the never-emitted `rpc.request` span:
- TraceQL query examples in Step 5 "Verify traces in Tempo" now filter on
`name="rpc.http_request"` (the real emitted name).
- Expected-spans table replaces `rpc.request` with `rpc.http_request`.
- Query loop under the Prometheus verification section now iterates over
the full set of emitted RPC entry-point names
(`rpc.http_request`, `rpc.ws_upgrade`, `rpc.ws_message`, `rpc.process`).
Also drop `exporter=otlp_http` from the sample telemetry config block.
`TelemetryConfig.cpp` does not parse an `exporter` key in any phase through
Phase 8; only OTLP/HTTP is wired up, so the line is either a silently
ignored no-op or misleading documentation.
Phase-6 introduces ledger-operations, peer-network, and the five StatsD
dashboards. Align them with the rest of the chain:
- Rename dashboard UIDs from `rippled-*` to `xrpld-*` so the provisioned
UIDs match the post-rename-script documentation (`docs.sh` rewrites
.md but not .json, so the two drifted). Runbook references
`xrpld-rpc-perf`, `xrpld-transactions`, etc., now the JSON matches.
- Add the `$node` template variable + `exported_instance=~"$node"` filter
to every target in the five `statsd-*` dashboards. Mirrors the pattern
already used by consensus-health, ledger-operations, and peer-network
per the project rule that every dashboard must support per-node
filtering.
- Strip `:<line>` (and `:NN-NN` range) suffixes from C++ file references
in every dashboard panel description and in docker/telemetry/TESTING.md.
Line numbers drift on every refactor; the filename alone is enough to
grep.
- Replace stale `rpc.request` entries with the real emitted span names
(`rpc.http_request`, `rpc.ws_upgrade`, `rpc.ws_message`, `rpc.process`)
in TESTING.md so operators can copy-paste the filters and hit real
traces.
- Also drop the `:706` line ref from the `StatsDCollector.cpp` callout
in `06-implementation-phases.md`.
- RPC Spans table: `rpc.request` was documented but the code actually emits
`rpc.http_request`. Listed the actual emitted names
(`rpc.http_request`, `rpc.ws_upgrade`, `rpc.ws_message`, `rpc.process`)
and their parent/child relationship.
- Drop `:<line>` suffixes from Source File columns in both RPC and
Transaction span tables. Line numbers drift with every refactor; the
filename is enough for operators to grep.
- Summary table: replace the never-emitted `rpc.request` row with the real
entry points so `span_name=` filters in PromQL / TraceQL match.
Phase-1a plan documents advertised OTLP/gRPC on port 4317 as the default
exporter, four unparsed [telemetry] config keys, and "Phase 4a Complete"
status with exit-criteria checkboxes marked done. Every downstream branch
through Phase 5 ships only OTLP/HTTP on port 4318 via OtlpHttpExporterFactory,
never parses the advertised keys, and the Phase 4 work is not yet delivered.
Fixes:
- 02-design-decisions.md: flip §2.1.1 SDK dependency recommendations to
OTLP/HTTP (shipped) with OTLP/gRPC marked Future. Update §2.2 architecture
diagram and text from OTLP/gRPC:4317 to OTLP/HTTP:4318. Rewrite §2.2.1 as
"OTLP/HTTP (Shipped)" and §2.2.2 as "OTLP/gRPC (Future Work — Planned
Upgrade)" with a concrete checklist (Conan dep, config parsing, factory
branch, runbook/dashboard updates) for landing the gRPC transport later.
- 05-configuration-reference.md: drop the fabricated exporter/otlp_grpc key
and the :4317 default from the sample config block and the options-summary
table. Move trace_pathfind, trace_txq, trace_validator, trace_amendment
into a new "Planned (not yet implemented)" table citing the phase that will
add each one. Keep the example config minimal so copy-paste does not produce
a silently-ignored stanza.
- 06-implementation-phases.md: reset Phase 4 Exit Criteria checkboxes from
[x] to [ ] (Phase 4 is not shipped at Phase-1a time). Rename "Phase 4a
Complete" to "Phase 4a Plan" and describe the work as future. Replace the
broken forward link to Phase4_taskList.md (introduced in the Phase 2 PR)
with a sentence pointing readers to where that spec will land. Renumber
the final section 6.12 to 6.11 so it sits directly after 6.10; section 6.11
("Effort Summary") was intentionally removed in earlier edits.
SpanGuard::span() hardcoded SpanKind::kInternal for every span. Tempo's
service-graph and spanmetrics RED calculations rely on kServer /
kConsumer / kClient / kProducer to classify inbound vs outbound vs
internal operations. With kInternal everywhere, the service graph
collapses to a single self-loop and RED metrics attribute all latency
to internal work.
Add categoryToSpanKind() mapping:
- Rpc -> kServer (inbound synchronous request)
- Peer -> kConsumer (inbound async peer message)
- Transactions -> kInternal
- Consensus -> kInternal
- Ledger -> kInternal
Only the single-argument overload is affected; childSpan / linkedSpan
continue to default to kInternal because they represent in-process
continuations of an already-kinded parent.
Spanmetrics dimensions used xrpl.rpc.command etc. but C++ emits bare
"command". Tempo tags for phase6-added consensus/tx/peer filters used
qualified names but C++ uses bare names. Dashboard panel referenced
xrpl_tx_suppressed (never populated) instead of suppressed.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Consensus span attributes use bare names (close_time_correct,
consensus_state, close_resolution_ms) and shared canonical attrs
(xrpl.ledger.seq) per SpanNames.h. xrpl.consensus.mode and
xrpl.consensus.round are correct (domain-qualified to avoid collision).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Transaction span attributes use bare names (local, tx_status) per
SpanNames.h convention, not xrpl.tx.* qualified names. xrpl.tx.hash
is correct (shared canonical attr defined in SpanNames.h).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
RPC span attributes use bare names (command, rpc_status, rpc_role) per
the naming convention in SpanNames.h, not xrpl.rpc.* qualified names.
Node health attributes (amendment_blocked, server_state) are resource
attributes set at Tracer init, not span attributes.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Panels 8-15 from statsd-node-health.json and panels 8-9 from
statsd-network-traffic.json were lost when Phase 7 renamed these files
to system-*. The merge (5cd71ed107) took Phase 7's smaller version
without the extra panels added by commit b933e8ae00 on Phase 6.
Recovered panels (system-node-health.json):
- Key Jobs Execution Time (11 job types)
- Key Jobs Dequeue Wait Time (11 job types)
- FullBelowCache Size
- FullBelowCache Hit Rate
- Ledger Publish Gap (validated - published age delta)
- State Duration Rate (Full vs Tracking)
- All Jobs Execution Time Detail (34 job types)
- All Jobs Dequeue Wait Detail (34 job types)
Recovered panels (system-network-traffic.json):
- Duplicate Traffic (Wasted Bandwidth)
- All Traffic Categories Detail (topk 15 by byte rate)
All recovered panels updated to include exported_instance=~"$node"
filter per project dashboard guidelines.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- MetricsRegistry.cpp: concatenate nested namespaces, add missing
direct includes (Journal.h, string, string_view, cstdint), suppress
readability-convert-member-functions-to-static in #else stubs by
referencing enabled_ member, void unused instanceId parameter.
- MetricsRegistry test: add missing direct includes (Log.h, Journal.h,
uint256.h, io_context.hpp, optional, stdexcept, string), make
throwUnimplemented() static, add [[nodiscard]] to getOpenLedger/
isStopping/getTrapTxID overrides, make const-eligible registry const.
- PerfLogImp.cpp: add braces around if/else body per
readability-braces-around-statements.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- MetricsRegistry.cpp: concatenate nested namespaces, add missing
direct includes (Journal.h, string, string_view, cstdint), suppress
readability-convert-member-functions-to-static in #else stubs by
referencing enabled_ member, void unused instanceId parameter.
- MetricsRegistry test: add missing direct includes (Log.h, Journal.h,
uint256.h, io_context.hpp, optional, stdexcept, string), make
throwUnimplemented() static, add [[nodiscard]] to getOpenLedger/
isStopping/getTrapTxID overrides, make const-eligible registry const.
- PerfLogImp.cpp: add braces around if/else body per
readability-braces-around-statements.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
NetworkOPs.h and SpanNames.h were only needed for per-span
nodeAmendmentBlocked/nodeServerState calls, which were removed
in the attr naming simplification. Fixes clang-tidy CI failure.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- otel-collector-config.yaml: spanmetrics dimensions use new bare names.
- tempo.yaml: TraceQL filter tags use new bare names.
- 02-design-decisions.md: strip xrpl.txq.* prefix from planned attrs.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Phase2_taskList: update attr refs to bare names, note node-health
attrs moved to resource level.
- 02-design-decisions: strip xrpl.pathfind.* prefix from planned attrs.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Update OpenTelemetryPlan docs and Telemetry.h doc example to reflect
the renamed per-span attributes: xrpl.rpc.command -> command,
xrpl.rpc.status -> rpc_status, xrpl.grpc.method -> method, etc.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Update 31 attribute references in telemetry-runbook.md to match the
simplified naming: drop xrpl.<domain>. prefix on per-span attrs, use
domain-qualified names for collisions (rpc_status, consensus_state,
etc.), and unify cross-domain refs (xrpl.ledger.seq, xrpl.tx.hash).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Drop xrpl.pathfind.* prefix from per-span attrs (source_account,
dest_account, fast, search_level, num_complete_paths, num_paths,
num_requests).
- Keep xrpl.pathfind.ledger_index qualified (rule 5: distinct from
xrpl.ledger.seq).
- Remove per-span nodeAmendmentBlocked/nodeServerState calls from
RPCHandler — promoted to resource-level attrs.
- Mark node-health attrs in SpanNames.h as RESOURCE-ONLY with doc.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Phase 10's workload validation configs (expected_metrics.json,
regression-metrics.json, validate_telemetry.py) queried the
MetricsRegistry metrics under the rippled_ prefix, but MetricsRegistry
emits them as xrpld_ (see MetricsRegistry.cpp). On a live run the
workload validator reported every MetricsRegistry metric as missing,
masking genuine regressions.
Rename the following to xrpld_ across the workload validator,
expected-metrics manifest, and regression-metrics template:
- nodestore_state, cache_metrics, txq_metrics, load_factor_metrics,
object_count
- rpc_method_started_total / _finished_total / _errored_total /
_duration_us
- job_queued_total / _started_total / _finished_total /
_queued_duration_us_bucket / _running_duration_us_bucket
- peer_quality, server_info, validator_health, ledger_economy,
db_metrics, complete_ledgers, build_info, state_tracking,
storage_detail
- ledgers_closed_total, validations_sent_total,
validations_checked_total, state_changes_total
- validation_agreement, validation_agreements_total,
validation_missed_total
Mirrors the phase-9 fix in commit 5601615952.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
MetricsRegistry emits OTel SDK metrics with the xrpld_ prefix
(MetricsRegistry.cpp defines "xrpld_nodestore_state",
"xrpld_cache_metrics", etc.), but the Phase 9 dashboards and the
Step 10c integration-test assertions introduced in 892fee638a
queried the rippled_ prefix. Every Phase 9 panel and assertion
therefore rendered "No data" or failed on a live run, even though
the underlying series were being exported correctly.
Rename the rippled_ prefix to xrpld_ for every MetricsRegistry
metric in dashboards and the integration test:
- nodestore_state, cache_metrics, txq_metrics, load_factor_metrics,
object_count
- rpc_method_started_total / _finished_total / _errored_total /
_duration_us_bucket
- job_queued_total / _started_total / _finished_total /
_queued_duration_us_bucket / _running_duration_us_bucket
- peer_quality, server_info, validator_health, ledger_economy,
db_metrics, complete_ledgers, build_info, state_tracking
- ledgers_closed_total, validations_sent_total,
validations_checked_total, state_changes_total
- validation_agreement (ValidationTracker 1h/24h/7d windows)
Also add ValidationTracker window-gauge assertions to Step 10c of
integration-test.sh so the 1h/24h/7d agreement and miss counts are
checked alongside the other Phase 9 gauges.
The rippled_ prefix is preserved for beast::insight metrics
(rippled_LedgerMaster_*, rippled_Peer_Finder_*, rippled_total_*,
rippled_Overlay_*, rippled_State_Accounting_*, rippled_transactions_*,
rippled_proposals_*, rippled_validations_Messages_*) because those
flow through the StatsD-style OTelCollector configured with
`[insight] prefix=rippled` and remain on that prefix by design.
Verified against a live 6-node consensus network: all 22 Phase 9 +
ValidationTracker assertions now report 6+ series per metric.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Two fixes so gauges register in Prometheus (via StatsD) even when their
initial/steady-state value is 0:
1. StatsDGaugeImpl m_dirty: default-init to true so the initial value
(0) is emitted on the first flush. Previously, gauges whose value
never changed from 0 were never flushed and never appeared
downstream.
2. io_latency_sampler firstSample_: new atomic<bool>, init true.
m_event.notify now fires when either firstSample_ is true (exchanged
to false) or lastSample >= 10 ms. This guarantees the io_latency
metric is registered on startup; subsequent sub-10 ms samples are
still suppressed to avoid flooding.
Set defaults for tx_span::attr::suppressed (false) and
tx_span::attr::status ("new") immediately after creating the txReceive
span. Without defaults, spans whose suppressed/status attributes would
only be set in the HashRouter-suppressed branch lacked these attributes
entirely, producing incomplete span data in downstream stores.
The suppressed branch still overrides these when the transaction has
already been seen via HashRouter.
Two build failures surfaced by CI on the Phase 9 branch:
1. NetworkOPsImp stores the ServiceRegistry as
std::reference_wrapper<ServiceRegistry> registry_, so calls must go
through registry_.get().<method>(). The MetricsRegistry hooks added
in setMode() and recvValidation() dereferenced the wrapper directly,
which compiles against a pre-existing accessor on the wrapper type
on some toolchains but fails on clang 16/17/20 and gcc 13/15 with
"no member named 'getMetricsRegistry' in
std::reference_wrapper<xrpl::ServiceRegistry>".
2. MetricsRegistry::app_ and MetricsRegistry::journal_ are only used
inside XRPL_ENABLE_TELEMETRY-guarded code paths (gauge callbacks
and JLOG). When telemetry is disabled, clang's
-Werror=-Wunused-private-field tripped. Move the two fields under
the same #ifdef and guard the constructor initialisers with
[[maybe_unused]] so the no-op build continues to compile cleanly.
Addresses code review findings on PR #6513:
1. registerAsyncGauges() was ~730 lines, violating the CLAUDE.md
rule "No function longer than 80 lines." Split into fifteen
per-domain helpers (cache, TxQ, object count, load factor,
NodeStore, server info, build info, complete ledgers, DB,
validator health, peer quality, ledger economy, state tracking,
storage detail, validation agreement) dispatched from a thin
shell. Each helper now stays at or below the 80-line limit.
2. PerfLogImp::rpcEnd() only updated the in-memory counter and
never advanced the OTel xrpld_rpc_method_finished_total,
xrpld_rpc_method_errored_total, or xrpld_rpc_method_duration_us
instruments. rpcStart() was already wired up, so the finished
and errored counters stayed at zero for every RPC call.
rpcEnd() now computes the duration once, records it under the
existing mutex, and forwards finish/error events to
MetricsRegistry::recordRpcFinished / recordRpcErrored outside
the counter mutex to avoid lock nesting with the OTel SDK.
3. Added class-level Doxygen for MetricsRegistry with an ASCII
collaborator diagram and explicit @note tags covering
thread-safety, lifetime, and extension guidance.
Tempo /api/traces/{id} returns OTLP-shaped JSON with a top-level
"batches" key, not "data". The cross-check in check_log_correlation
was querying jq '.data | length' which always returned null, causing
the Log-Tempo cross-check to fail even when the trace existed.
MockServiceRegistry in MetricsRegistry.cpp still used the old method
names (timeKeeper, cachedSLEs, validators, overlay, cluster, app, etc.)
while ServiceRegistry has been standardized on getXxx()/isXxx() forms.
Windows CI caught this as C3668 "did not override any base class methods"
errors and C2259 "cannot instantiate abstract class".
Rename all 13 mismatched overrides to match the current interface:
timeKeeper -> getTimeKeeper
cachedSLEs -> getCachedSLEs
validators -> getValidators
validatorSites -> getValidatorSites
validatorManifests -> getValidatorManifests
publisherManifests -> getPublisherManifests
overlay -> getOverlay
cluster -> getCluster
peerReservations -> getPeerReservations
pendingSaves -> getPendingSaves
openLedger (x2) -> getOpenLedger
getPathRequests -> getPathRequestManager (type rename too)
journal -> getJournal
logs -> getLogs
trapTxID -> getTrapTxID
app -> getApp
Also regenerate levelization ordering.txt to reflect the new
tests.libxrpl -> xrpl.core edge introduced by ServiceRegistry.h include.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Consensus.h (Phase 4 tracing) depends on DisputedTx::getYays()/getNays()
to build disputeResolve span events. Both accessors were removed by
earlier 'duplicate accessor' cleanup commits on this branch, leaving
Consensus.h referencing non-existent members. CI caught this on
macOS/clang-17/gcc-13/Windows builds.
Restore the accessors on the branch where they were dropped so downstream
phase branches inherit a compiling DisputedTx.h via merge.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Clang-tidy fixes:
- Concatenate nested namespaces (modernize-concat-nested-namespaces)
in OTelCollector.h, OTelCollector.cpp, ValidationTracker.h/.cpp
- Add missing direct includes (misc-include-cleaner) in
ValidationTracker.cpp, test, CollectorManager.cpp, OTelCollector.cpp
- Make lock_guard variables const (misc-const-correctness)
- Add braces around single-line if/else (readability-braces-around-statements)
- Use designated initializer for WindowEvent (modernize-use-designated-initializers)
- Initialize LedgerEvent::seq field (cppcoreguidelines-pro-type-member-init)
Linker fix:
- Add ValidationTracker.cpp as source to xrpl.test.telemetry target
(it lives in src/xrpld/ but the test links against libxrpl only)
Levelization fix:
- Remove stale dependency edges from ordering.txt that were introduced
by the erroneous develop-merge commit
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
These were already added in an earlier phase branch. The duplicate
with slightly different Doxygen wording was introduced by the
erroneous merge/revert cycle.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Replace GetSpan() with direct context value check in Logs::format()
to avoid heap allocation (new DefaultSpan) on the no-span path
- Restore Phase 7 documentation accidentally deleted during merge
- Fix undefined $JAEGER variable → use $TEMPO in integration test
- Remove useless LCOV_EXCL markers around #ifdef block
- Fix indentation inconsistencies in Log.cpp injection block
- Remove incorrect url field from loki.yaml derivedFields
- Update stale code sample in Phase8_taskList.md to match implementation
- Correct "<10ns" performance claims to accurate ~15-20ns (no-span)
and ~50ns (active-span) measurements across all docs
- Replace Jaeger references with Tempo in TESTING.md (port 16686→3200)
- Improve error handling in check_log_correlation(): track files_scanned,
detect missing log files, fix silent grep error masking
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Fix use-after-free: extract gauge callback to static function and call
RemoveCallback in ~OTelGaugeImpl() before unregistering from collector
- Use memory_order_acq_rel on callHooks() debounce CAS for proper
happens-before relationship between hook invocations
- Add explicit 2s timeout to ForceFlush() in destructor to prevent
blocking indefinitely when OTLP endpoint is unreachable at shutdown
- Add OTLP receiver to metrics pipeline so native OTel metrics from
xrpld are actually received by the collector
- Remove stale health check port from docker-compose (extension was
removed from collector config)
- Clarify fallback docs: StatsD path requires re-enabling receiver/port
- Fix comments: Counter uses uint64_t not int64_t, gauge clamps to
[0, INT64_MAX] not [0, UINT64_MAX]
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The blanket revert of f4555c80fe also un-reverted some files that had
been correctly matched to phase-6 (nodestore Backend API refactor,
Vault_test changes). Restore those to the base branch state so the
phase-7 PR only contains telemetry changes.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Build fixes in PeerImp.cpp:
- Rename duplicate `span` variable to `consSpan` in proposal and
validation handlers to avoid redefinition error
- Fix `->` on non-pointer SpanGuard (now correctly on shared_ptr)
- Fix move-only type copy in lambda capture
Clang-tidy fixes:
- Concatenate nested namespaces in LedgerSpanNames.h and PeerSpanNames.h
- Add missing SpanNames.h includes in BuildLedger.cpp, LedgerMaster.cpp,
PeerImp.cpp for direct seg:: symbol usage
- Add missing <chrono> and <cstdint> includes in BuildLedger.cpp
- Remove unused Feature.h include from BuildLedger.cpp
Rename check fix:
- Run docs.sh to rename rippled_ metric prefixes to xrpld_ in
09-data-collection-reference.md and telemetry-runbook.md
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- protocol/README.md: restore historical GitHub URL path (src/ripple/)
- Config.cpp: restore configLegacyName as "rippled.cfg" (legacy name
must remain as-is for backward compatibility)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Reverts 259 files that carried unrelated upstream changes through the
phase-6 merge: enum class removals (cppcoreguidelines-use-enum-class),
scoped_lock→lock_guard conversions (modernize-use-scoped-lock),
nodestore Backend API changes (void const* key), .clang-tidy config,
test infrastructure deletions, and miscellaneous develop changes.
These changes belong on develop, not in the telemetry PR chain.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Revert reusable-build-test-config.yml to develop (action SHA update
and "Show test failure summary" step removal don't belong here)
- Revert upload-conan-deps.yml to develop (action SHA update)
- Revert features.macro: BatchInnerSigs and Batch back to Supported::no
(these feature flag changes are unrelated to telemetry)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Separate local declarations from assignments to avoid hiding errors,
and use [[ instead of [ for non-POSIX comparisons.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Prettier formatting for markdown docs and OTelCollector header
- docs.sh rippled→xrpld renames in OTelCollector.cpp comments/strings
- Updated levelization ordering with new dependency edges
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add [[nodiscard]] to getConsensusTraceStrategy, getYays, getNays
- Add missing <string>, SpanGuard.h, SpanNames.h includes
- Fix widening cast placement (cast before arithmetic, not after)
- Replace nested ternary with lambda for const dir variable
- Add braces to if/else-if chains in Consensus.h
- Concatenate nested namespaces in ConsensusSpanNames.h
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Concatenate nested namespaces in SpanNames.h, RpcSpanNames.h, GrpcSpanNames.h
- Remove unused InfoSub.h and NetworkOPs.h includes from RPCHandler.cpp
- Add missing <string_view> includes in RPCHandler.cpp and GRPCServer.cpp
- Replace nested ternary with if/else-if in RPCHandler.cpp
- Add IWYU pragma keep for json_body.h in ServerHandler.cpp
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Clang-tidy misc-include-cleaner requires direct includes for all used
symbols. Application.cpp calls toBase58(TokenType::NodePublic, ...) at
line 1359 but did not directly include PublicKey.h.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Document the propagation infrastructure: send-side injection in
NetworkOPs/RCLConsensus, receive-side extraction in PeerImp via
PropagationHelpers.h and ConsensusReceiveTracing.h. Update
consensus receive span descriptions to reflect parent extraction.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add [[maybe_unused]] to RAII spans in TxQ.cpp
- Include Telemetry.h in RCLConsensus.cpp for complete type
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Fix quorum attribute to use actual validator quorum instead of proposer
count, add missing ConsensusState::Expired handling in haveConsensus()
span, move ConsensusSpanNames.h to xrpld/consensus/ to resolve
levelization cycle, remove unused constants, enrich proposal receive
span with sequence, and correct stale documentation references.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add cons_span::event namespace with disputeResolve and txIncluded
constants; replace hardcoded strings in Consensus.h and RCLConsensus.cpp
- Move proposal.receive and validation.receive spans in PeerImp into
shared_ptr captured by job lambdas so they measure checkPropose and
checkValidation timing, not just message parsing
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Update Phase4_taskList.md and 06-implementation-phases.md to reflect
completed implementation of all remaining Phase 4/4a tasks (4.2-4.6,
4a.5, 4a.6, 4a.8). Update exit criteria and summary tables.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Implement remaining Phase 4/4a consensus tracing tasks:
- Add consensus.phase.open span (open → closeLedger lifecycle)
- Add consensus.proposal.receive span in PeerImp with trusted attr
- Add consensus.validation.receive span in PeerImp with trusted/seq attrs
- Add tx_count attr on accept.apply, disputes_count on update_positions
- Add tx.included events with txId in doAccept transaction loop
- Enhance dispute.resolve event with yays/nays fields
- Add avalanche_threshold attr on update_positions span
- Reparent accept/accept.apply as children of round span via childSpan()
Also adds compile-time constants in ConsensusSpanNames.h and updates
the span hierarchy diagram.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The 4-arg hashSpan overload was duplicated during a prior rebase
cascade — it appeared at both line 240 and line 305 in SpanGuard.cpp.
This would cause a linker error (multiple definition).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Record the close time voting threshold and consensus state on
consensus.update_positions and consensus.check spans:
- xrpl.consensus.close_time_threshold: the avCT_CONSENSUS_PCT (75%)
threshold required for close time agreement
- xrpl.consensus.have_close_time_consensus: whether validators
reached close time consensus in this iteration
These attributes enable dashboards to show how the close time
voting process converges (or stalls) across consensus iterations.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove the span-replacement logic in startRoundTracing() that was
discarding the hash-derived round span and replacing it with a linked
span (which gets a random trace_id). The deterministic trace_id from
the ledger hash is the key feature enabling cross-node correlation —
replacing it broke correlation on all rounds after the first.
Also: use thread_local mt19937 for hashSpan() span IDs (same fix as
phase-3 txSpan), add Doxygen to establish tracing method declarations
in Consensus.h, and update SpanGuard.h diagram with hashSpan/addEvent.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Instrument the consensus subsystem with OpenTelemetry spans covering
the full round lifecycle: round start, establish phase, proposal send,
ledger close, position updates, consensus check, accept, validation
send, and mode changes.
Key design choices adapted from the original Phase 4 implementation
to the new SpanGuard factory pattern introduced in Phase 3:
- Add SpanGuard::hashSpan() for category-gated hash-derived trace IDs
(consensus round spans share trace_id across validators via ledger hash)
- Add SpanGuard::addEvent() overload with key-value attribute pairs
(used for dispute.resolve events during position updates)
- Add ConsensusSpanNames.h with compile-time span name constants
following the colocated *SpanNames.h pattern from Phase 3
- Add consensusTraceStrategy config option ("deterministic"/"attribute")
for cross-node trace correlation strategy selection
- Use SpanGuard::linkedSpan() for follows-from relationships between
consecutive rounds and cross-thread validation spans
- Use SpanGuard::captureContext() for thread-safe context propagation
from consensus thread to jtACCEPT worker thread
Spans produced: consensus.round, consensus.proposal.send,
consensus.ledger_close, consensus.establish, consensus.update_positions,
consensus.check, consensus.accept, consensus.accept.apply,
consensus.validation.send, consensus.mode_change
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Wire trace context into P2P message flow so distributed traces
link across nodes. TX relay injects SpanGuard context via
PropagationHelpers.h; consensus propose/validate injects via
TraceContextPropagator.h. Receive-side extraction in PeerImp
creates child spans for proposals and validations.
- Add TraceBytes struct and SpanGuard::getTraceBytes() for
extracting raw trace context without OTel type dependencies
- Add PropagationHelpers.h: injectSpanContext(SpanGuard, proto)
- Add ConsensusReceiveTracing.h: proposalReceiveSpan(),
validationReceiveSpan() with parent context extraction
- NetworkOPs::apply(): inject tx.process context before relay
- RCLConsensus::propose()/validate(): inject active span context
- PeerImp: create receive spans for proposals and validations
with sender's trace context as parent
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Move TxQSpanNames.h include to correct alphabetical position, update
levelization results for new xrpld.telemetry module dependencies,
and apply rename script to docs.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- tx.receive span in PeerImp: convert to shared_ptr, capture in
checkTransaction lambda so it measures actual processing, not just
message parsing
- tx.process span in NetworkOPs: convert to shared_ptr, store in
TransactionStatus so it lives until the batch job processes the entry;
sync path unchanged (span destructs on function return)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace SpanGuard::txSpan(prefix, name, hash) with the generic
SpanGuard::hashSpan(TraceCategory, name, hash) that accepts a
TraceCategory parameter instead of hardcoding Transactions. This
enables reuse for consensus round spans (Phase 4) and any future
subsystem needing deterministic cross-node trace correlation via
hash-derived trace IDs.
Both overloads are replaced:
- hashSpan(cat, name, hash, size) — standalone with random span_id
- hashSpan(cat, name, hash, size, parentSpanId, parentSize, flags)
— with remote parent from protobuf context propagation
Add full span name constants (tx_span::receive, tx_span::process)
to TxSpanNames.h following the ConsensusSpanNames.h pattern.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Mark local variables in extractFromProtobuf() and injectToProtobuf()
as const since they are not modified after initialization: traceId,
spanId, flags, spanCtx, and span.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace thread_local mt19937 with xrpl::default_prng() for span ID
generation — uses the project's existing thread-local xor-shift engine.
One call yields a uint64_t (8 bytes), filling the span ID in a single
memcpy without loops.
Fix compilation failure when XRPL_ENABLE_TELEMETRY is not defined:
move xrpl.pb.h include outside the #ifdef guard in TxTracing.h since
protocol::TMTransaction is used unconditionally in the function
signature.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace per-call std::random_device with thread_local std::mt19937 in
txSpan() for span ID generation. random_device is ~423x slower due to
/dev/urandom syscalls on each construction; mt19937 is seeded once per
thread and reused for all subsequent span IDs.
Update the SpanGuard class ASCII diagram to include txSpan factory
methods that were added in the hash-derived trace ID commit.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Move TxSpanNames.h and TxQSpanNames.h from src/xrpld/telemetry/ to sit
next to the classes they instrument, matching the PathFindSpanNames.h
convention.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Derive trace_id from txHash[0:16] so all nodes handling the same
transaction produce spans under the same trace. Protobuf span_id
propagation provides parent-child relay ordering when available.
- Add SpanGuard::txSpan() factory methods (hash-derived trace ID)
- Add TxTracing.h helpers: txReceiveSpan(), txProcessSpan()
- Update PeerImp and NetworkOPs to use the new helpers
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add trace_id = txHash[0:16] strategy so all nodes handling the same
transaction independently produce spans under the same trace_id,
combined with protobuf span_id propagation for parent-child ordering.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Move scattered string literals from PeerImp.cpp and NetworkOPs.cpp into
compile-time constants in src/xrpld/telemetry/TxSpanNames.h. Follows
the same StaticStr/join() pattern established in Phase 1c for RPC spans.
Constants cover: span prefixes (tx), operations (receive, process),
attribute keys (hash, local, path, suppressed, status, peerId,
peerVersion), and values (sync, async, knownBad).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace references to old XRPL_TRACE_TX/CONSENSUS macros with
SpanGuard::span(TraceCategory, ...) factory calls introduced in Phase 1c.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds xrpl.peer.version attribute to tx.receive spans for version-mismatch
correlation during network upgrades.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add "hicpp" to cspell dictionary for NOLINT annotations
- Concatenate nested namespaces in RpcSpanNames.h
- Fix include hygiene and nested ternary in RPCHandler.cpp
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Document the propagation infrastructure: send-side injection in
NetworkOPs/RCLConsensus, receive-side extraction in PeerImp via
PropagationHelpers.h and ConsensusReceiveTracing.h. Update
consensus receive span descriptions to reflect parent extraction.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Fix quorum attribute to use actual validator quorum instead of proposer
count, add missing ConsensusState::Expired handling in haveConsensus()
span, move ConsensusSpanNames.h to xrpld/consensus/ to resolve
levelization cycle, remove unused constants, enrich proposal receive
span with sequence, and correct stale documentation references.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add cons_span::event namespace with disputeResolve and txIncluded
constants; replace hardcoded strings in Consensus.h and RCLConsensus.cpp
- Move proposal.receive and validation.receive spans in PeerImp into
shared_ptr captured by job lambdas so they measure checkPropose and
checkValidation timing, not just message parsing
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Update Phase4_taskList.md and 06-implementation-phases.md to reflect
completed implementation of all remaining Phase 4/4a tasks (4.2-4.6,
4a.5, 4a.6, 4a.8). Update exit criteria and summary tables.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Implement remaining Phase 4/4a consensus tracing tasks:
- Add consensus.phase.open span (open → closeLedger lifecycle)
- Add consensus.proposal.receive span in PeerImp with trusted attr
- Add consensus.validation.receive span in PeerImp with trusted/seq attrs
- Add tx_count attr on accept.apply, disputes_count on update_positions
- Add tx.included events with txId in doAccept transaction loop
- Enhance dispute.resolve event with yays/nays fields
- Add avalanche_threshold attr on update_positions span
- Reparent accept/accept.apply as children of round span via childSpan()
Also adds compile-time constants in ConsensusSpanNames.h and updates
the span hierarchy diagram.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The 4-arg hashSpan overload was duplicated during a prior rebase
cascade — it appeared at both line 240 and line 305 in SpanGuard.cpp.
This would cause a linker error (multiple definition).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Record the close time voting threshold and consensus state on
consensus.update_positions and consensus.check spans:
- xrpl.consensus.close_time_threshold: the avCT_CONSENSUS_PCT (75%)
threshold required for close time agreement
- xrpl.consensus.have_close_time_consensus: whether validators
reached close time consensus in this iteration
These attributes enable dashboards to show how the close time
voting process converges (or stalls) across consensus iterations.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove the span-replacement logic in startRoundTracing() that was
discarding the hash-derived round span and replacing it with a linked
span (which gets a random trace_id). The deterministic trace_id from
the ledger hash is the key feature enabling cross-node correlation —
replacing it broke correlation on all rounds after the first.
Also: use thread_local mt19937 for hashSpan() span IDs (same fix as
phase-3 txSpan), add Doxygen to establish tracing method declarations
in Consensus.h, and update SpanGuard.h diagram with hashSpan/addEvent.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Instrument the consensus subsystem with OpenTelemetry spans covering
the full round lifecycle: round start, establish phase, proposal send,
ledger close, position updates, consensus check, accept, validation
send, and mode changes.
Key design choices adapted from the original Phase 4 implementation
to the new SpanGuard factory pattern introduced in Phase 3:
- Add SpanGuard::hashSpan() for category-gated hash-derived trace IDs
(consensus round spans share trace_id across validators via ledger hash)
- Add SpanGuard::addEvent() overload with key-value attribute pairs
(used for dispute.resolve events during position updates)
- Add ConsensusSpanNames.h with compile-time span name constants
following the colocated *SpanNames.h pattern from Phase 3
- Add consensusTraceStrategy config option ("deterministic"/"attribute")
for cross-node trace correlation strategy selection
- Use SpanGuard::linkedSpan() for follows-from relationships between
consecutive rounds and cross-thread validation spans
- Use SpanGuard::captureContext() for thread-safe context propagation
from consensus thread to jtACCEPT worker thread
Spans produced: consensus.round, consensus.proposal.send,
consensus.ledger_close, consensus.establish, consensus.update_positions,
consensus.check, consensus.accept, consensus.accept.apply,
consensus.validation.send, consensus.mode_change
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Wire trace context into P2P message flow so distributed traces
link across nodes. TX relay injects SpanGuard context via
PropagationHelpers.h; consensus propose/validate injects via
TraceContextPropagator.h. Receive-side extraction in PeerImp
creates child spans for proposals and validations.
- Add TraceBytes struct and SpanGuard::getTraceBytes() for
extracting raw trace context without OTel type dependencies
- Add PropagationHelpers.h: injectSpanContext(SpanGuard, proto)
- Add ConsensusReceiveTracing.h: proposalReceiveSpan(),
validationReceiveSpan() with parent context extraction
- NetworkOPs::apply(): inject tx.process context before relay
- RCLConsensus::propose()/validate(): inject active span context
- PeerImp: create receive spans for proposals and validations
with sender's trace context as parent
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Move TxQSpanNames.h include to correct alphabetical position, update
levelization results for new xrpld.telemetry module dependencies,
and apply rename script to docs.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- tx.receive span in PeerImp: convert to shared_ptr, capture in
checkTransaction lambda so it measures actual processing, not just
message parsing
- tx.process span in NetworkOPs: convert to shared_ptr, store in
TransactionStatus so it lives until the batch job processes the entry;
sync path unchanged (span destructs on function return)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace SpanGuard::txSpan(prefix, name, hash) with the generic
SpanGuard::hashSpan(TraceCategory, name, hash) that accepts a
TraceCategory parameter instead of hardcoding Transactions. This
enables reuse for consensus round spans (Phase 4) and any future
subsystem needing deterministic cross-node trace correlation via
hash-derived trace IDs.
Both overloads are replaced:
- hashSpan(cat, name, hash, size) — standalone with random span_id
- hashSpan(cat, name, hash, size, parentSpanId, parentSize, flags)
— with remote parent from protobuf context propagation
Add full span name constants (tx_span::receive, tx_span::process)
to TxSpanNames.h following the ConsensusSpanNames.h pattern.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Mark local variables in extractFromProtobuf() and injectToProtobuf()
as const since they are not modified after initialization: traceId,
spanId, flags, spanCtx, and span.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace thread_local mt19937 with xrpl::default_prng() for span ID
generation — uses the project's existing thread-local xor-shift engine.
One call yields a uint64_t (8 bytes), filling the span ID in a single
memcpy without loops.
Fix compilation failure when XRPL_ENABLE_TELEMETRY is not defined:
move xrpl.pb.h include outside the #ifdef guard in TxTracing.h since
protocol::TMTransaction is used unconditionally in the function
signature.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace per-call std::random_device with thread_local std::mt19937 in
txSpan() for span ID generation. random_device is ~423x slower due to
/dev/urandom syscalls on each construction; mt19937 is seeded once per
thread and reused for all subsequent span IDs.
Update the SpanGuard class ASCII diagram to include txSpan factory
methods that were added in the hash-derived trace ID commit.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Move TxSpanNames.h and TxQSpanNames.h from src/xrpld/telemetry/ to sit
next to the classes they instrument, matching the PathFindSpanNames.h
convention.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Derive trace_id from txHash[0:16] so all nodes handling the same
transaction produce spans under the same trace. Protobuf span_id
propagation provides parent-child relay ordering when available.
- Add SpanGuard::txSpan() factory methods (hash-derived trace ID)
- Add TxTracing.h helpers: txReceiveSpan(), txProcessSpan()
- Update PeerImp and NetworkOPs to use the new helpers
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add trace_id = txHash[0:16] strategy so all nodes handling the same
transaction independently produce spans under the same trace_id,
combined with protobuf span_id propagation for parent-child ordering.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Move scattered string literals from PeerImp.cpp and NetworkOPs.cpp into
compile-time constants in src/xrpld/telemetry/TxSpanNames.h. Follows
the same StaticStr/join() pattern established in Phase 1c for RPC spans.
Constants cover: span prefixes (tx), operations (receive, process),
attribute keys (hash, local, path, suppressed, status, peerId,
peerVersion), and values (sync, async, knownBad).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace references to old XRPL_TRACE_TX/CONSENSUS macros with
SpanGuard::span(TraceCategory, ...) factory calls introduced in Phase 1c.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds xrpl.peer.version attribute to tx.receive spans for version-mismatch
correlation during network upgrades.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add [[maybe_unused]] to RAII span variables in PathFind/RipplePathFind
handlers (Clang -Wunused-variable with -Werror)
- Restore over-renamed values: rippledb, rippled.cfg, historical GitHub URL
- Concatenate nested namespaces in SpanNames.h and PathFindSpanNames.h
(modernize-concat-nested-namespaces)
- Add missing includes and const qualifiers in test files
- Suppress intentional use-after-move in SpanGuardFactory move test
- Remove unused NetworkOPs.h include from PathRequest.cpp
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Document the propagation infrastructure: send-side injection in
NetworkOPs/RCLConsensus, receive-side extraction in PeerImp via
PropagationHelpers.h and ConsensusReceiveTracing.h. Update
consensus receive span descriptions to reflect parent extraction.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Wire trace context into P2P message flow so distributed traces
link across nodes. TX relay injects SpanGuard context via
PropagationHelpers.h; consensus propose/validate injects via
TraceContextPropagator.h. Receive-side extraction in PeerImp
creates child spans for proposals and validations.
- Add TraceBytes struct and SpanGuard::getTraceBytes() for
extracting raw trace context without OTel type dependencies
- Add PropagationHelpers.h: injectSpanContext(SpanGuard, proto)
- Add ConsensusReceiveTracing.h: proposalReceiveSpan(),
validationReceiveSpan() with parent context extraction
- NetworkOPs::apply(): inject tx.process context before relay
- RCLConsensus::propose()/validate(): inject active span context
- PeerImp: create receive spans for proposals and validations
with sender's trace context as parent
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The StatsD receiver config was lost during a branch rebase (--ours
conflict resolution dropped it). Re-add the statsd receiver to the
OTel Collector config and wire it into the metrics pipeline so
beast::insight UDP metrics flow to Prometheus.
Also fixes:
- Metric prefix mismatch: docs used xrpld_ but dashboards/tests use
rippled_ — align all documentation to match the runnable stack
- Remove phantom Peer_Disconnects_Charges from docs (plain atomic,
not a beast::insight gauge)
- Remove premature .codecov.yml exclusions for Phase 7 OTelCollector
files that don't exist on this branch
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add missing xrpl.consensus.quorum attribute to consensus.accept in runbook
- Fix dashboard legend formats: add exported_instance, use Title Case
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add 14 missing spans to runbook (6 TxQ + 8 consensus)
- Fix tx.receive attributes and config table in runbook
- Document dispute.resolve and tx.included span events
- Add spanmetrics dimensions for close_time_correct and tx.suppressed
- Fix Close Time Agreement and TX Receive vs Suppressed panel PromQL
- Wire $consensus_mode template variable to all consensus panels
- Add 10 Tempo search filters for operational attributes
- Apply rename script artifacts
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Document service_instance_id, use_tls, tls_ca_cert, batch_size,
batch_delay_ms, and max_queue_size in the [telemetry] section.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace remaining rippled/Ripple references with xrpld/XRPL in
data collection reference, implementation phases, and runbook docs.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Resolve merge conflicts taking phase 4 consensus span improvements,
fix bashate indentation in integration test script, and apply rename
script to Phase5 integration test docs.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Move TxQSpanNames.h include to correct alphabetical position, update
levelization results for new xrpld.telemetry module dependencies,
and apply rename script to docs.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add unknownCommand and wsUpgrade span name constants to RpcSpanNames.h,
fix SpanGuardFactory tests to use the 3-argument SpanGuard::span() API,
update levelization results, and apply rename script to docs.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Use C++17 concatenated namespaces, add [[nodiscard]] to query methods,
add missing direct includes, and use pass-by-value + std::move in
NullTelemetry constructor.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Fix quorum attribute to use actual validator quorum instead of proposer
count, add missing ConsensusState::Expired handling in haveConsensus()
span, move ConsensusSpanNames.h to xrpld/consensus/ to resolve
levelization cycle, remove unused constants, enrich proposal receive
span with sequence, and correct stale documentation references.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- tx.receive span in PeerImp: convert to shared_ptr, capture in
checkTransaction lambda so it measures actual processing, not just
message parsing
- tx.process span in NetworkOPs: convert to shared_ptr, store in
TransactionStatus so it lives until the batch job processes the entry;
sync path unchanged (span destructs on function return)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add cons_span::event namespace with disputeResolve and txIncluded
constants; replace hardcoded strings in Consensus.h and RCLConsensus.cpp
- Move proposal.receive and validation.receive spans in PeerImp into
shared_ptr captured by job lambdas so they measure checkPropose and
checkValidation timing, not just message parsing
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Update Phase4_taskList.md and 06-implementation-phases.md to reflect
completed implementation of all remaining Phase 4/4a tasks (4.2-4.6,
4a.5, 4a.6, 4a.8). Update exit criteria and summary tables.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Implement remaining Phase 4/4a consensus tracing tasks:
- Add consensus.phase.open span (open → closeLedger lifecycle)
- Add consensus.proposal.receive span in PeerImp with trusted attr
- Add consensus.validation.receive span in PeerImp with trusted/seq attrs
- Add tx_count attr on accept.apply, disputes_count on update_positions
- Add tx.included events with txId in doAccept transaction loop
- Enhance dispute.resolve event with yays/nays fields
- Add avalanche_threshold attr on update_positions span
- Reparent accept/accept.apply as children of round span via childSpan()
Also adds compile-time constants in ConsensusSpanNames.h and updates
the span hierarchy diagram.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Sort PathFindSpanNames.h after AssetCache.h alphabetically in
PathRequestManager.cpp and Pathfinder.cpp to satisfy the project's
include-order convention enforced by pre-commit hooks.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Move SpanGuard.h (associated header) to first include position,
separated by blank line from other project includes, per the
project's include-order convention enforced by pre-commit hooks.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Move Telemetry.h (associated header) to first include position in
Telemetry.cpp per the project's include-order convention. Trim
trailing whitespace from POC_taskList.md markdown table columns.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add 5 new panels to the consensus-health Grafana dashboard using Tempo
TraceQL queries against consensus.accept.apply span attributes:
- Close Time: Raw Proposals (Per Node) — each node's unrounded
wall-clock close_time_self, reveals clock drift across validators
- Close Time: Effective / Quantized — the consensus-agreed close_time
after rounding to resolution bins, written to ledger header
- Close Time Vote Bins & Resolution — number of distinct vote bins
(close_time_vote_bins) and bin size (close_resolution_ms) on dual axes
- Close Time Resolution Direction — whether resolution increased
(coarser), decreased (finer), or stayed unchanged
- Close Time Bin Distribution — bar chart showing how raw proposals
distribute across quantized bins per round
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Integrate the existing StatsD metrics pipeline (beast::insight) into
the OpenTelemetry observability stack and add new trace spans for
ledger build/store/validate and peer proposal/validation receive.
Phase 5b — Ledger, peer, and transaction spans:
- Add ledger.build span with close time attributes in BuildLedger.cpp
- Add tx.apply span with tx_count/tx_failed in BuildLedger.cpp
- Add ledger.store and ledger.validate spans in LedgerMaster.cpp
- Add peer.proposal.receive span with trusted attribute in PeerImp.cpp
- Add peer.validation.receive span with ledger_hash, full, trusted
attributes in PeerImp.cpp
- Add ledger-operations and peer-network Grafana dashboards
Phase 6 — StatsD metrics integration:
- Add StatsD UDP receiver (port 8125) to OTel Collector
- Add 5 StatsD Grafana dashboards: node health, network traffic,
overlay traffic detail, ledger data sync, RPC pathfinding
- Add 09-data-collection-reference.md cataloging all metrics/spans
- Update existing dashboards with new span panels
- Expand telemetry runbook and integration test script
- Add codecov exclusions for telemetry modules
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Run .github/scripts/rename/docs.sh to replace rippled → xrpld
references in TESTING.md, xrpld-telemetry.cfg, and
telemetry-runbook.md, fixing the check-rename CI failure.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Mark Tasks 5.3 (alert definitions) and 5.6 (training materials) as
"Deferred — post-MVP" in the implementation phases document to
accurately reflect current delivery scope. Add status column to the
Phase 5 task table.
Also fix stale reference to XRPL_TRACE_* macros in Phase 4a section —
the implementation uses SpanGuard factory methods.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The 4-arg hashSpan overload was duplicated during a prior rebase
cascade — it appeared at both line 240 and line 305 in SpanGuard.cpp.
This would cause a linker error (multiple definition).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Record the close time voting threshold and consensus state on
consensus.update_positions and consensus.check spans:
- xrpl.consensus.close_time_threshold: the avCT_CONSENSUS_PCT (75%)
threshold required for close time agreement
- xrpl.consensus.have_close_time_consensus: whether validators
reached close time consensus in this iteration
These attributes enable dashboards to show how the close time
voting process converges (or stalls) across consensus iterations.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove the span-replacement logic in startRoundTracing() that was
discarding the hash-derived round span and replacing it with a linked
span (which gets a random trace_id). The deterministic trace_id from
the ledger hash is the key feature enabling cross-node correlation —
replacing it broke correlation on all rounds after the first.
Also: use thread_local mt19937 for hashSpan() span IDs (same fix as
phase-3 txSpan), add Doxygen to establish tracing method declarations
in Consensus.h, and update SpanGuard.h diagram with hashSpan/addEvent.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Instrument the consensus subsystem with OpenTelemetry spans covering
the full round lifecycle: round start, establish phase, proposal send,
ledger close, position updates, consensus check, accept, validation
send, and mode changes.
Key design choices adapted from the original Phase 4 implementation
to the new SpanGuard factory pattern introduced in Phase 3:
- Add SpanGuard::hashSpan() for category-gated hash-derived trace IDs
(consensus round spans share trace_id across validators via ledger hash)
- Add SpanGuard::addEvent() overload with key-value attribute pairs
(used for dispute.resolve events during position updates)
- Add ConsensusSpanNames.h with compile-time span name constants
following the colocated *SpanNames.h pattern from Phase 3
- Add consensusTraceStrategy config option ("deterministic"/"attribute")
for cross-node trace correlation strategy selection
- Use SpanGuard::linkedSpan() for follows-from relationships between
consecutive rounds and cross-thread validation spans
- Use SpanGuard::captureContext() for thread-safe context propagation
from consensus thread to jtACCEPT worker thread
Spans produced: consensus.round, consensus.proposal.send,
consensus.ledger_close, consensus.establish, consensus.update_positions,
consensus.check, consensus.accept, consensus.accept.apply,
consensus.validation.send, consensus.mode_change
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace SpanGuard::txSpan(prefix, name, hash) with the generic
SpanGuard::hashSpan(TraceCategory, name, hash) that accepts a
TraceCategory parameter instead of hardcoding Transactions. This
enables reuse for consensus round spans (Phase 4) and any future
subsystem needing deterministic cross-node trace correlation via
hash-derived trace IDs.
Both overloads are replaced:
- hashSpan(cat, name, hash, size) — standalone with random span_id
- hashSpan(cat, name, hash, size, parentSpanId, parentSize, flags)
— with remote parent from protobuf context propagation
Add full span name constants (tx_span::receive, tx_span::process)
to TxSpanNames.h following the ConsensusSpanNames.h pattern.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Mark local variables in extractFromProtobuf() and injectToProtobuf()
as const since they are not modified after initialization: traceId,
spanId, flags, spanCtx, and span.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace thread_local mt19937 with xrpl::default_prng() for span ID
generation — uses the project's existing thread-local xor-shift engine.
One call yields a uint64_t (8 bytes), filling the span ID in a single
memcpy without loops.
Fix compilation failure when XRPL_ENABLE_TELEMETRY is not defined:
move xrpl.pb.h include outside the #ifdef guard in TxTracing.h since
protocol::TMTransaction is used unconditionally in the function
signature.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace per-call std::random_device with thread_local std::mt19937 in
txSpan() for span ID generation. random_device is ~423x slower due to
/dev/urandom syscalls on each construction; mt19937 is seeded once per
thread and reused for all subsequent span IDs.
Update the SpanGuard class ASCII diagram to include txSpan factory
methods that were added in the hash-derived trace ID commit.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Move TxSpanNames.h and TxQSpanNames.h from src/xrpld/telemetry/ to sit
next to the classes they instrument, matching the PathFindSpanNames.h
convention.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Derive trace_id from txHash[0:16] so all nodes handling the same
transaction produce spans under the same trace. Protobuf span_id
propagation provides parent-child relay ordering when available.
- Add SpanGuard::txSpan() factory methods (hash-derived trace ID)
- Add TxTracing.h helpers: txReceiveSpan(), txProcessSpan()
- Update PeerImp and NetworkOPs to use the new helpers
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add trace_id = txHash[0:16] strategy so all nodes handling the same
transaction independently produce spans under the same trace_id,
combined with protobuf span_id propagation for parent-child ordering.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Move scattered string literals from PeerImp.cpp and NetworkOPs.cpp into
compile-time constants in src/xrpld/telemetry/TxSpanNames.h. Follows
the same StaticStr/join() pattern established in Phase 1c for RPC spans.
Constants cover: span prefixes (tx), operations (receive, process),
attribute keys (hash, local, path, suppressed, status, peerId,
peerVersion), and values (sync, async, knownBad).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace references to old XRPL_TRACE_TX/CONSENSUS macros with
SpanGuard::span(TraceCategory, ...) factory calls introduced in Phase 1c.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds xrpl.peer.version attribute to tx.receive spans for version-mismatch
correlation during network upgrades.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Instrument the path finding subsystem with full span coverage:
- pathfind.request: wraps doPathFind() and doRipplePathFind() RPC handlers
- pathfind.compute: wraps PathRequest::doUpdate() with fast/normal attr
- pathfind.update_all: wraps PathRequestManager::updateAll() on ledger
close with ledger_index attr
- pathfind.discover: wraps Pathfinder::findPaths() graph exploration
with search_level attr
- pathfind.rank: wraps Pathfinder::computePathRanks() liquidity
validation with num_paths attr
New file: PathFindSpanNames.h with compile-time constants following
the StaticStr/join() pattern from Phase 1c.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Move node health attribute strings to compile-time constants in
SpanNames.h (attr::nodeAmendmentBlocked, attr::nodeServerState)
- Add Tempo search filters for node health attributes
- Remove unnecessary .c_str() on strOperatingMode() return
- Add samplingRatio clamping test (values > 1.0 and < 0.0)
- Fix Task 2.3 status: delivered in Phase 1c, not Phase 2
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- serviceName default is "xrpld" not "rippled"
- Remove references to nonexistent exporterType field
- Pass networkId (4th param) to setup_Telemetry()
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Mark deferred tasks (2.1→Phase 3, 2.5→low priority) with rationale.
Mark superseded tasks (2.2→Phase 1c SpanGuard factory). Add Task 2.7
for Grafana search filters. Update summary table with status column.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Grafana Tempo datasource: add rpc-command, rpc-status, rpc-role
search filters for the Explore UI
- Unit tests: TelemetryConfig (config parsing defaults and sections),
SpanGuardFactory (null guard safety, move semantics, discard, all
factory methods)
- Test CMake registration with optional OTel linking
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add amendment_blocked and server_state span attributes to every
rpc.command.* span so operators can correlate RPC behavior with node state.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Introduces task list documents for Phases 2 through 5, with Tempo
references (replacing Jaeger) and Task 2.8 dashboard parity spec.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Centralise scattered string literals into compile-time constants using
StaticStr<N> and join() for dot-separated composition. Shared primitives
live in SpanNames.h; RPC-specific names in RpcSpanNames.h. Future modules
(consensus, peer, ledger) add their own *SpanNames.h without bloating
the central header.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add [[maybe_unused]] to the RAII span in processSession() — the
variable is not read but its lifetime scopes the active OTel context
for child spans created in processRequest()
- Regenerate levelization: remove premature xrpld.telemetry entries
that reference a module not yet present on this branch
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Use grpc_span::val::resourceExhausted constant instead of raw
"resource_exhausted" string in GRPCServer.cpp
- Fix unbounded span name cardinality in RPCHandler.cpp error path:
use fixed rpc_span::val::unknownCommand as span name instead of
user-supplied cmdName (attacker-controlled input). The actual
command is still captured in the xrpl.rpc.command attribute.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Without this, cloned CallData instances (created for the next incoming
gRPC request) would have an empty name_, making subsequent span attrs
blank.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Follow project convention (PerfLog.h, SpanGuard.h) for documentation
diagrams. Show HTTP single, HTTP batch, and WebSocket span nesting.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Document the span hierarchy, covered paths, and known instrumentation
gaps directly in the header that developers reference when adding spans.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Centralise scattered string literals into compile-time constants using
StaticStr<N> and join() for dot-separated composition. Shared primitives
live in SpanNames.h; RPC-specific names in RpcSpanNames.h. Future modules
(consensus, peer, ledger) add their own *SpanNames.h without bloating
the central header.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace rpcSpan(fullName) calls with span(TraceCategory::Rpc, prefix,
name). Add 'using namespace telemetry' to both RPC files so call sites
read cleanly without repeated namespace qualifiers.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Delete TracingInstrumentation.h and replace all XRPL_TRACE_* macro
invocations with direct SpanGuard::rpcSpan() calls. SpanGuard's pimpl
design and global Telemetry accessor eliminate the need for macro
wrappers and explicit Telemetry instance passing at call sites.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
TracingInstrumentation.h:
- Unify all span-creation macros to use std::optional<SpanGuard>
(fixes type mismatch between XRPL_TRACE_SPAN and SET_ATTR)
- Wrap XRPL_TRACE_SET_ATTR/EXCEPTION in do-while(0) (dangling-else)
- Move macros outside namespace blocks (macros are global)
- Cache telemetry reference to avoid double-evaluation
- Remove leaked _xrpl_span_ intermediate variable
- Add @note tags for thread safety, scope, and usage constraints
- Add 3 usage examples per CLAUDE.md requirements
ServerHandler.cpp:
- Remove misleading rpc.request span from onRequest() (span ended
before coroutine runs, producing orphan spans)
- Add rpc.http_request span to HTTP processSession() (runs inside
the coroutine, correct parent for rpc.process/rpc.command spans)
- Add XRPL_TRACE_EXCEPTION and error status in both catch blocks
(WS processSession and processRequest)
SpanGuard.h:
- Add null guards to all mutating methods (setOk, setStatus,
setAttribute, addEvent, recordException) for safety after discard()
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Concatenate nested namespaces (modernize-concat-nested-namespaces)
- Add [[nodiscard]] to factory and accessor methods
- NOLINT no-op stub instance methods that must stay non-static for API
parity with the real implementation
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove the single-arg span(name) factory that creates unconditional
spans without category gating. All call sites use the 3-arg
span(TraceCategory, prefix, name) variant which checks whether the
category is enabled in config before creating a span. The 1-arg form
was dead code with no callers.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace references to non-existent TracingInstrumentation.h with
SpanGuard.cpp pimpl implementation that actually exists on this branch.
Update conditional compilation section to describe the pimpl approach.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace rpcSpan(), txSpan(), consensusSpan(), peerSpan(), ledgerSpan()
with a single span(TraceCategory, prefix, name) factory method. Adding
a new traceable subsystem now requires only a new enum value and one
switch case — no new methods or header changes.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- SpanContext::isValid(): add inline no-op when XRPL_ENABLE_TELEMETRY
is not defined, preventing a linker error if called in that path
- linkedSpan(): set kIsRootSpanKey on the StartSpanOptions parent
context so linked spans start a genuinely independent sub-tree
instead of silently becoming children of the current active span
- Telemetry::instance_: use std::atomic with acquire/release ordering
to avoid a data race between start()/stop() and factory methods
called from worker threads
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Redesign SpanGuard with pimpl idiom to hide all OpenTelemetry types
from public headers. Add global Telemetry accessor so SpanGuard factory
methods work without explicit Telemetry references. Add child/linked
span creation and cross-thread context propagation. Update plan docs
to reflect macro removal in favor of SpanGuard factory pattern.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add DiscardFlag.h and FilteringSpanProcessor references to the file
tree, key files table, and implementation summary in OpenTelemetryPlan.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add span discard mechanism that drops unwanted spans before they enter
the batch export queue, saving both network bandwidth and storage.
FilteringSpanProcessor is a custom SpanProcessor decorator that wraps
BatchSpanProcessor. SpanGuard::discard() sets a thread-local flag
(tl_discardCurrentSpan) before calling Span::End(). The OTel SDK calls
OnEnd() synchronously on the same thread, where the flag is checked and
cleared to drop the span.
New file: DiscardFlag.h — zero-dependency header for the thread-local
flag, avoiding transitive include bloat from Telemetry.h.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove duplicate otlp/tempo exporter block, duplicate tempo service
definition, and jaeger dependency from docker-compose example.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Tempo is now the sole trace backend. Remove Jaeger all-in-one service
from docker-compose, otlp/jaeger exporter from OTel Collector config,
and Jaeger Grafana datasource provisioning file.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Run .github/scripts/rename/docs.sh to replace rippled → xrpld
references in all plan documentation files, fixing the check-rename
CI failure.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add Sub-task 7.10a: Per-Validator Validation Count (Flag Ledger Window)
to the Phase 7 task list. This metric tracks how many of the last 256
ledgers each UNL validator has validated — the key participation metric
for UNL health monitoring.
Implementation plan:
- Observable gauge rippled_validator_participation with validator label
- Data from RCLValidations::getTrustedForLedger() over 256-ledger window
- Emitted at flag ledger boundaries (~15 min interval)
- Grafana table panel with threshold coloring (green/yellow/red)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Populate baselines/baseline-timings.json from the green CI run
(24906110133, commit f11ebc1253). 25/31 metrics have non-null values;
6 span.rpc.* are null due to sparse data in the 3m window.
Remove the rpc_methods section from regression-metrics.json and its
thresholds. rippled_rpc_method_duration_us_bucket is never populated
because PerfLogImp::rpcEnd never calls MetricsRegistry::recordRpcFinished
— only recordRpcStarted is wired up (Phase 9 instrumentation gap).
The span-based rpc.request/rpc.process metrics via spanmetrics already
cover RPC latency.
Two CI failures traced to root cause:
1. rippled_jobq_job_count: 0 series — StatsDGaugeImpl declared
m_dirty{false} despite the constructor comment saying "start dirty".
Gauges whose value starts and stays at 0 never emitted, so Prometheus
never scraped them. Fix: m_dirty{true} on the member initializer.
2. TX error rate 82.8% — the submitter tracked account sequences
locally, but in a multi-node consensus network other nodes' txns
advance sequences independently. After a few ledger closes the
locally-tracked sequence fell behind the ledger, producing
tefPAST_SEQ for every subsequent submission. Fix: refresh account
sequences from account_info every 10 s during the submission loop.
- capture_timings.py: fail when captured/total ratio < 50%
(--min-capture-ratio). Prevents silent pass on unreachable Prometheus.
- run-full-validation.sh: set REGRESSION_EXIT=2 on capture failure so
the final exit code reflects it. Update exit code docs in header.
- compare_to_baseline.py: extract _skip_delta helper to bring
compute_delta under 80 lines. Fix 0.0-as-falsy bug in abs_bound
resolution (use explicit None check instead of `or`). Remove dead
variable override_prefix_key.
- prom_queries.py: extract _build_simple_entries and _build_job_entries
to bring build_query_plan under 80 lines. Fix module docstring return
type example. Use aiohttp.ClientTimeout instead of bare int.
- telemetry-validation.yml: add set -euo pipefail to regression summary
step; guard jq calls with -e flag and fallback; fail on missing
baseline file; emit ::warning annotation when timings.json missing.
- baselines/README.md: document the placeholder field.
Captures per-span / per-RPC / per-job timings from Prometheus after the
workload run and diffs them against a committed baseline. Regression
requires breaching both a percentage and an absolute bound, tolerating
small-value noise. When the baseline is a placeholder, the comparator
emits the captured JSON in the exact schema for one-time paste into
baselines/baseline-timings.json, and the CI Step Summary surfaces that
block for the reviewer.
Scope: gate only — automated baseline persistence, benchmark.sh
PromQL migration, and the historical trend dashboard remain follow-ups.
Plan documents referenced Application.h and app_ for getTelemetry()
but the codebase now uses ServiceRegistry as the interface. Updated:
- 05-configuration-reference.md: getTelemetry() on ServiceRegistry,
deferred serviceInstanceId pattern in ApplicationImp
- POC_taskList.md Task 4: target ServiceRegistry.h not Application.h,
correct config file path and constructor pattern
- 04-code-samples.md: fix overlay() -> getOverlay(), rewrite JobQueue
sample to reflect actual architecture (no app_ member)
- 03-implementation-strategy.md: fix file impact table path
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Strip effort/risk columns from task tables and remove the §6.9 Effort
Summary section with its pie chart and resource requirements table.
Renumber §6.10 Quick Wins → §6.9.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Split document index into Plan Documents and Task Lists sections.
These files were introduced in this branch but missing from the index.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The Jaeger-removal rebase used --ours conflict resolution which
dropped content added by intermediate phases (6-8): StatsD receiver,
filelog receiver, Loki service/exporter, health_check extension,
and OTLP metrics pipeline. Restore from pre-rebase origin minus
Jaeger references.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Fix remaining Jaeger references that accumulated across intermediate
branches in the stacked PR chain. These were in files modified by
multiple phases where the per-branch fixes didn't cover all additions.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Migrate validate_telemetry.py to Tempo TraceQL search API, remove
Jaeger service from workload docker-compose, update readiness checks.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Pre-include boost/asio/detail/socket_types.hpp on Windows before OTel
SDK headers to ensure WinSock2.h is included before WinSock.h.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add ValidationTracker.cpp to xrpl.test.telemetry target sources
(implementation lives in src/xrpld/ but has no OTel SDK dependency)
- Change BEAST_DEFINE_TESTSUITE namespace from ripple to xrpl
- Replace recursive *.md glob with non-recursive GLOB in XrplDocs.cmake
to avoid picking up .claude/instructions.md
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Update MockServiceRegistry to match current ServiceRegistry interface
(17 method renames: get* prefix, PathRequests→PathRequestManager)
- Make throwUnimplemented() static to satisfy clang-tidy
- Regenerate levelization ordering.txt and loops.txt
- Remove 'rippled' prefix from 3 StatsD dashboard titles
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix validations_checked_total recording site (NetworkOPs, not LedgerMaster)
- Add Slide 11 to presentation: External Dashboard Parity overview with
Mermaid diagrams for new metric categories, ValidationTracker sequence,
and new dashboard summary
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Remove duplicate 'system-node-health' UID from expected_metrics.json
(already covered by 'rippled-system-node-health')
- Add parity span attributes to expected_spans.json: node health on
rpc.command.*, validation hash/full on consensus.validation.send,
quorum/proposers on consensus.accept, validation hash/full on
peer.validation.receive
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add agreement_pct_7d, agreements_7d, missed_7d labels to the
rippled_validation_agreement observable gauge, matching the external
xrpl-validator-dashboard's 7-day tracking.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add ValidationTracker member to MetricsRegistry with a public accessor,
register a rippled_validation_agreement observable gauge that calls
reconcile() and reports 1h/24h agreement percentages and counts, and
hook recordOurValidation/recordNetworkValidation into RCLConsensus
validate() and LedgerMaster setValidLedger() respectively.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Wire incrementValidationsChecked() into NetworkOPs::recvValidation() so
each received network validation increments the counter.
Note: incrementJqTransOverflow() hook is deferred — JobQueue has no
explicit overflow event path; the counter is reserved for future use.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add rippled_validation_agreements_total and rippled_validation_missed_total
counter declarations and creation (wiring to ValidationTracker pending rebase)
- Fix peer-quality dashboard: query rippled_server_info{metric="peer_disconnects_resources"}
instead of non-existent rippled_Overlay_Peer_Disconnects_Charges
- Remove dead getCountsJson() call in storageDetail callback
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Rename fee labels to match spec: base_fee_drops -> base_fee_xrp,
reserve_base_drops -> reserve_base_xrp, reserve_inc_drops -> reserve_inc_xrp
- Add peers_insane_count (stub with TODO for PeerImp::tracking_ exposure)
- Add transaction_rate to ledger economy gauge
- Replace node_store_writes/node_written_bytes with nudb_bytes per spec
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add validator health, peer quality, ledger economy, state tracking, and
storage detail observable gauges plus 5 synchronous counters with recording
hooks for ledger close, validation send, state change, and overflow events.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add window7d_ deque, agreementPct7d(), agreements7d(), missed7d() to
match the external xrpl-validator-dashboard's 7-day agreement tracking.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Cover normal agreement, missed validation, late repair, empty window,
grace period boundary, max pending trimming, mixed results, duplicate
recording, and only-we-validated scenarios.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Use >= instead of > for grace period comparison to reconcile at exactly
8 seconds rather than skipping the boundary
- Two-pass hard trim: first remove entries past late-repair window, then
any reconciled entry, to avoid sabotaging late repairs
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Standalone class that tracks whether this validator's validations agree
with network consensus, maintaining rolling 1h/24h windows and lifetime
totals with a late-repair mechanism for out-of-order arrivals.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds ValidationTracker (agreement computation with 8s grace period),
validator health, peer quality, ledger economy, state tracking,
storage detail gauges, 7 synchronous counters, and agreement gauge.
29 new metrics covering validation agreement, peer quality, UNL health,
ledger economy, state tracking, and upgrade awareness.
Part of the external dashboard parity initiative across phases 2-11.
See docs/superpowers/specs/2026-03-30-external-dashboard-parity-design.md
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Bridge the existing beast::insight gauge for resource-limit peer
disconnects (peerDisconnectsCharges_) into the StatsD metric inventory.
Part of the external dashboard parity initiative.
See docs/superpowers/specs/2026-03-30-external-dashboard-parity-design.md
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The xrpld/app/misc/CanonicalTXSet.h header doesn't exist — it was
incorrectly added during a rebase conflict resolution. The correct
include xrpl/ledger/CanonicalTXSet.h is already present.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add ledger hash and full-validation flag to peer.validation.receive
spans for trace-level agreement analysis across validators.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Move xrpld data paths from ./data/ to docker/telemetry/data/ so runtime
files stay within the docker telemetry directory. Add .gitignore to
exclude the data directory from version control.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Move validation_quorum and proposers_validated attributes from
consensus.accept.apply to consensus.accept span to match the design
spec. Both values are available in onAccept() scope.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add validation ledger hash and full-validation flag to
consensus.validation.send spans, plus quorum and proposer count to
consensus.accept spans for trace-level agreement analysis.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds ledger_hash, validation.full to validation send/receive spans,
and validation_quorum, proposers_validated to consensus.accept spans.
Foundation for Phase 7 ValidationTracker agreement computation.
Part of the external dashboard parity initiative across phases 2-11.
See docs/superpowers/specs/2026-03-30-external-dashboard-parity-design.md
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Tag transaction receive spans with the relaying peer's rippled version
to enable version-mismatch correlation during network upgrades.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds xrpl.peer.version attribute to tx.receive spans for version-mismatch
correlation during network upgrades.
Part of the external dashboard parity initiative across phases 2-11.
See docs/superpowers/specs/2026-03-30-external-dashboard-parity-design.md
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add amendment_blocked and server_state span attributes to every
rpc.command.* span so operators can correlate RPC behavior with node state.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds node health context (amendment_blocked, server_state) to rpc.command.*
spans, inspired by the community xrpl-validator-dashboard.
Part of the external dashboard parity initiative across phases 2-11.
See docs/superpowers/specs/2026-03-30-external-dashboard-parity-design.md
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Tempo is now the sole trace backend. Remove Jaeger all-in-one service
from docker-compose, otlp/jaeger exporter from OTel Collector config,
and Jaeger Grafana datasource provisioning file.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Strip effort/risk columns from task tables and remove the §6.9 Effort
Summary section with its pie chart and resource requirements table.
Renumber §6.10 Quick Wins → §6.9.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Split document index into Plan Documents and Task Lists sections.
These files were introduced in this branch but missing from the index.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This change:
* Removes a set of unnecessary brackets in the initialization of an `std::uint32_t`.
* Fixes a couple of incorrect flags (same value, just wrong variables - so no amendment needed).
This change replaces all instances of `<variable> != tesSUCCESS` with `!isTesSuccess(<variable>)` and `<variable> == tesSUCCESS` with `isTesSuccess(<variable>)`.
This change fixes delegation:
* If the Delegate object is not present, we should disallow empty permission list in DelegateSet preclaim.
* Empty permission list is only allowed to delete the existing Delegate object.
* In `doApply`, permission list being empty returns `tecINTERNAL`, which should not happen.
This change:
* Makes `addSLE` in `DIDSet` a static function, instead of a free function.
* Renames `Attestation` to `Data` everywhere (an artifact of a previous name for the field).
* Actually runs a set of tests that were not included in the `run` function of `DID_test`.
This change:
* Introduces a new helper function on `STTx`, `getFeePayer`.
* Removes the usage of `mSourceBalance` and replaces it with SLE balance lookups.
* Renames `mPriorBalance` to `preFeeBalance_`
This simplifies some of the code in the transactors and makes it a lot more readable.
Throwing exceptions from code sometime confuses ASAN, as it cannot keep track of stack frames. This change therefore adds a macro to skip instrumentation around the `Throw` function.
This change moves the sanitizer runtime options out to dedicated files, such that they can be used in multiple places (CI, local runs) without any need to rewrite them.
Per [XLS-0095](https://xls.xrpl.org/xls/XLS-0095-rename-rippled-to-xrpld.html), we are taking steps to rename ripple(d) to xrpl(d). This change modifies the system name from `rippled` to `xrpld`.
The system name is used in limited places:
* When no explicit config file is passed via the `--config` flag, then the system name is used to construct the path where the config file and database may be stored, via the `$XDG_CONFIG_HOME` and `$XDG_DATA_HOME` directories, respectively.
* It is used in the metadata and user-agent as part of RPC calls.
* It is newly used in the full version string.
ASAN wasn't able to keep track of `boost::coroutine` context switches, and would lead to many false positives being detected. By switching to `boost::coroutine2` and `ucontext`, ASAN is able to know about the context switches advertised by the `boost::fiber` class, which in turn leads to more cleaner ASAN analysis.
The `HTTPClient` class initializes a global SSL context via `initializeSSLContext()`. However, it had no way to release it, which caused memory leaks flagged by the LeakSanitizer. Multiple LSAN suppressions in the sanitizers' suppressions file were masking these leaks. Our test code also manually called `initializeSSLContext()` in each test without guaranteed cleanup on failure paths.
This change fixes these memory leaks by adding a `cleanupSSLContext()` method to properly release the global SSL context, and removes the corresponding LSAN suppressions. The change further refactors the `HTTPClient` tests to use a Google Test fixture (`HTTPClientTest`) that manages the SSL context lifecycle via RAII (SetUp/TearDown), making it impossible for tests to leak the context.
This change deletes the `SecretKey` equality/inequality operators from the public library header and moves the comparison logic into test-only code.
Specifically, the `operator==` and `operator!=` free functions on `SecretKey` have been removed from `include/xrpl/protocol/SecretKey.h` and have been replaced with explicitly deleted member functions to prevent accidental use in production code. A named `test::equal()` helper has also been added in `src/test/unit_test/utils.h` for test assertions that need to compare secret keys.
This change removes an unnecessary newline in a logging statement. Namely, `std::endl` is unneeded in `JLOG`, since it automatically places a newline at the end of the string.
This change enforces a maximum length of 63 characters on feature names, as well as not permitting an exactly 32 character long feature name to avoid confusion with those that use a `uint256` hex representation, as that is an alternative way to specify a feature. This change further prevents the use of Unicode characters in feature names, because some can be confused with regular ASCII characters despite being valid in identifiers.
The `setFieldU32` call is currently used to set the credential's expiry using a `getFieldU32` call to obtain the expiration time. However, that value was already obtained previously and can thus be reused.
DID, Escrows, PaymentChannels, and Credentials previously contained multiple unrelated transactor classes in a single header/implementation pair. This change splits each into one class per file, following the same pattern established by the rest of the codebase.
Now that prefixes in PR titles are being validated as part of CI, the "Type of Change" section in the PR template is no longer needed. The prefixes and descriptions in the `CONTRIBUTING.md` file have been updated to reflect the currently supported list.
Subscribe tests have a problem that there is no way to synchronize application running in background threads and test threads. Threads are communicating via websocket messages. When the code is compiled in debug mode with code coverage enabled it executes quite slow, so receiving websocket messages by the client in subscribe tests may time out.
This change does 2 things to fix the problem:
* Increases timeout for receiving a websocket message.
* Decreases the number of tests running in parallel.
While testing the fix for subscribe test another flaky test in ledger replay was found, which has also been addressed.
The refs as previously used pointed to the source branch, not the target branch. However, determining the target branch is different depending on the GitHub event. The pull request logic was incorrect and needed to be fixed, and the logic inside the workflow could be simplified. Both modifications have been made in this commit.
This change reorganizes the `tx/transactors` directory for consistency and discoverability. There are no behavioral changes, this is a pure refactor. Underscores were chosen as the way to separate multi-words as this is the more popular option in C++ projects.
Specific changes:
- Rename all subdirectories to lowercase/snake_case (`AMM` → `amm`, `Check` → `check`, `NFT` → `nft`, `PermissionedDomain` → `permissioned_domain`, etc.)
- Merge `AMM/` and `Offer/` into `dex/`, including `PermissionedDEXHelpers`
- Rename `MPT/` → `token/`, absorbing `SetTrust` and `Clawback`
- Move top-level transactors into named groups: `account/`, `bridge/`, `credentials/`, `did/`, `escrow/`, `oracle/`, `payment/`, `payment_channel/`, `system/`
- Update all include paths across the codebase and `transactions.macro`
The existing code added the git commit info (`GIT_COMMIT_HASH` and `GIT_BRANCH`) to every file, which was a problem for leveraging `ccache` to cache build objects. This change adds a separate C++ file from where these compile-time variables are propagated to wherever they are needed. A new CMake file is added to set the commit info if the `git` binary is available.
When `gateway_balances` gets called on an account that is involved in the `EscrowCreate` transaction (with MPT being escrowed), the method returns internal error. This change fixes this case by excluding the MPT type when totaling escrow amount.
The `Subscribe` tests were flaky, because each test performs some operations (e.g. sends transactions) and waits for messages to appear in subscription with a 100ms timeout. If tests are slow (e.g. compiled in debug mode or a slow machine) then some of them could fail. This change adds an attempt to synchronize the background Env's thread and the test's thread by ensuring that all the scheduled operations are started before the test's thread starts to wait for a websocket message. This is done by limiting I/O threads of the app inside Env to 1 and adding a synchronization barrier after closing the ledger.
The invariant check system had grown into a single monolithic file pair containing 24 invariant checker classes. The large `InvariantCheck.cpp` file was a frequent source of merge conflicts and difficult to navigate. This refactoring improves maintainability and readability with zero behavioral changes.
In particular, this change:
- Splits `InvariantCheck.h` and `InvariantCheck.cpp` into 10 focused header/source pairs organized by domain under a new `invariants/` subdirectory.
- Extracts the shared `Privilege` enum and `hasPrivilege()` function into a dedicated `InvariantCheckPrivilege.h` header, so domain-specific files can reference them independently.
This change enables all clang-tidy checks that are already passing. It also modifies the clang-tidy CI job, so it runs against all files if .clang-tidy changed.
This change replaces `void const*` by `uint256 const&` for database fetches.
Object hashes are expressed using the `uint256` data type, and are converted to `void *` when calling the `fetch` or `fetchBatch` functions. However, in these fetch functions they are converted back to `uint256`, making the conversion process unnecessary. In a few cases the underlying pointer is needed, but that can then be easy obtained via `[hash variable].data()`.
Updates XRPLF/actions prepare-runner to version 2cbf48101 which fixes
pip upgrade failures on macOS runners with Homebrew-managed Python.
* This commit was cherry-picked from "release-3.1", but ended up empty
because the changes are already present. It is included only for
accounting - to indicate that all changes/commits from the previous
release will be in the next one.
For coverage builds, we try to link against the `gcov` library (specific to the environment). But as macOS doesn't have this library and thus doesn't have the coverage tools to generate reports, the coverage builds on that platform were failing on linking.
We actually don't need to explicitly force this linking, as the `CodeCoverage` file already has correct detection logic (currently on lines 177-193), which is invoked when the `--coverage` flag is provided:
* AppleClang: Uses `xcrun -f llvm-cov` to set `GCOV_TOOL="llvm-cov gcov"`.
* Clang: Finds `llvm-cov` to set `GCOV_TOOL="llvm-cov gcov"`.
* GCC: Finds `gcov` to set `GCOV_TOOL="gcov"`.
The `GCOV_TOOL` is then passed to `gcovr` on line 416, so the correct tool is used for processing coverage data.
This change therefore removes the `gcov` suffix from lines 473 and 475 in the `CodeCoverage.cmake` file.
The rdb module was not properly designed, which is fixed in this change. The module had three classes:
1) The abstract class `RelationalDB`.
2) The abstract class `SQLiteDatabase`, which inherited from `RelationalDB` and added some pure virtual methods.
3) The concrete class `SQLiteDatabaseImp`, which inherited from `SQLiteDatabase` and implemented all methods.
The updated code simplifies this as follows:
* The `SQLiteDatabaseImp` has become `SQLiteDatabase`, and
* The former `SQLiteDatabase `has merged with `RelationalDatabase`.
This change modularizes the `WalletDB` and `Manifest`. Note that the wallet db has nothing to do with account wallets and it stores node configuration, which is why it depends on the manifest code.
This change removes the cache in `DatabaseNodeImp` and simplifies the caching logic in `SHAMapStoreImp`. As NuDB and RocksDB internally already use caches, additional caches in the code are not very valuable or may even be unnecessary, as also confirmed during preliminary performance analyses.
In certain cases, such as when modifying headers used by many compilation units, performing a unity build is slower than when performing a regular build with `ccache` enabled. There is also a benefit to a unity build in that it can detect things such as macro redefinitions within the group of files that are compiled together as a unit. This change therefore restores the ability to perform unity builds. However, instead of running every configuration with and without unity enabled, it is now only enabled for a single configuration to maintain lower computational use.
As part of restoring the code, it became clear that currently two configurations have coverage enabled, since the check doesn't focus specifically on Debian Bookworm so it also applies to Debian Trixie. This has been fixed too in this change.
The `ManifestCache::applyManifest` function was returning early without incrementing `seq_`. `OverlayImpl `uses this sequence to identify/invalidate a cached `TMManifests` message, which is exchanged with peers on connection. Depending on network size, startup sequencing, and topology, this can cause syncing issues. This change therefore increments `seq_` when a new manifest is accepted.
Unity builds were intended to speed up builds, by bundling multiple files into compilation units. However, now that ccache is available on all platforms, there is no need for unity builds anymore, as ccache stores compiled individual build objects for reuse. This change therefore removes the ability to make unity builds.
Currently we're passing the `Application` object around, whereby the `Application` class acts more like a service registry that gives other classes access to other services. In order to allow modularization, we should replace `Application` with a service registry class so that modules depending on `Application` for other services can be moved easily. This change adds the `ServiceRegistry` class.
This change introduces the `fixExpiredNFTokenOfferRemoval` amendment that allows expired offers to pass through `preclaim()` and be deleted in `doApply()`, following the same pattern used for expired credentials.
This change adds the project configuration directory to `.gitignore` for the `zed` editor.
As per the [documentation](https://zed.dev/docs/remote-development?highlight=.zed#zed-settings), the project configuration files are stored in the `.zed` directory at the project root dir.
This change cleans up the `API-CHANGELOG.md` file. It moves the version-specific documentation to other files and fleshes out the changelog with all the API-related changes in each version.
When support was added for `xrpld.cfg` in addition to `rippled.cfg` in https://github.com/XRPLF/rippled/pull/6098, as part of an effort to rename occurrences of ripple(d) to xrpl(d), the clearing and creation of the data directory were modified for what, at the time, seemed to result in an equivalent code flow. This has turned out to not be true, which is why this change restores two modifications to `Config.cpp` that currently break running the binary in standalone mode.
This change adds `cmake-format` as. a pre-commit hook. The style file closely matches that in Clio, and they will be made to be equivalent over time. For now, some files have been excluded, as those need some manual adjustments, which will be done in future changes.
This change makes the `read` function call in `handleConnection` async, adds a new class `TestSink` to help debugging, and adds a new target `xrpl.tests.helpers` to put the helper class in.
To allow developers to consume the latest unstable and (near-)stable versions of our `xrpl` Conan recipe, we should export and upload it whenever a push occurs to the corresponding branch or a release tag has been created. This way, developers do not have to figure out themselves what the most recent shortened commit hash was to determine the latest unstable recipe version (e.g. `3.2.0-b0+a1b2c3d`) or what the most recent release (candidate) was to determine the latest (near-)stable recipe version (e.g. `3.1.0-rc2`).
Now, pushes to the `develop` branch will produce the `develop` recipe version, pushes to the `release` branch will produce the `rc` recipe version, and creation of versioned tags will produce the `release` recipe version.
This change replaces the mutex `stoppingMutex_`, the `atomic_bool` variable `isTimeToStop`, and the conditional variable `stoppingCondition_` with an `atomic_flag` variable.
When `xrpld` is running the embedded tests as a child process, it has a control thread (the app bundle thread) that starts the application, and an application thread (the thread that executes `app_->run()`). Due to the relaxed memory ordering on ARM, it's not guaranteed that the application thread can see the change of the value resulting from the `isTimeToStop.exchange(true)` call before it is notified by `stoppingCondition_.notify_all()`, even though they do happen in the right order in the app bundle thread in `ApplicationImp::signalStop`. We therefore often get into the situation where `isTimeToStop` is `true`, but the application thread is waiting for `stoppingCondition_` to notify, because the app bundle thread may have already notified before the application thread actually starts waiting.
Switching to a single `atomic_flag` variable makes sure that there's only one synchronisation object and then the memory order guarantee provided by c++ can make sure that `notify_all` gets synchronised after `test_and_set` does.
Fixing this issue will stop the unit tests hanging forever and then we should see less (or hopefully no) time out errors in daily github action runs
Since the minimum Clang version we support is 16, the checks for version < 15 are no longer necessary. This change therefore removes the macros checking if the clang version is < 15 and simplifies uses of `std::source_location`.
The `upload-conan-deps` workflow that's triggered on push is supposed to upload the Conan dependencies to our remote, so future PR commits can pull those dependencies from the remote. However, as the `sanitize` argument is missing, it was building different dependencies than what the PRs are building for the asan/tsan/ubsan job, so the latter would not find anything in the remote that they could use. This change sets the missing `sanitizers` input variable when running the `build-deps` action.
Separately, the `setup-conan` action showed the default profile, while we are using the `ci` profile. To ensure the profile is correctly printed when sanitizers are enabled, the environment variable the profile uses is set before calling the action.
The export and upload steps were initially in a separate action, where GitHub Actions does not support the `secrets` keyword, but only `inputs` for the credentials. After they were moved to a reusable workflow, only part of the references to the credentials were updated. This change correctly references to the Conan credentials via `secrets` instead of `inputs`.
By default the Conan recipe extracts the version from `BuildInfo.cpp`, but in some of the cases we want to upload a recipe with a suffix derived from the commit hash. This currently then results in the uploading to fail, since there is a version mismatch.
Here we explicitly set the version, and then simplify the steps in the upload workflow since we now need the recipe name (embedded within the conanfile.py but also needed when uploading), the recipe version, and the recipe ref (name/version).
Conan recipes use semantic versioning, and since our version already contains a hyphen the second hyphen causes Conan to ignore it. The plus sign is a valid separator we can use instead, so this change uses a `+` to separate a version suffix (commit hash) instead of a `-`.
There were a few uninitialized variables in CMake files. This change will make sure we always check if a variable has been initialized before using them, or in come cases initialize them by default. This change will raise an error on CI if a developer introduced an uninitialized variable in CMake files.
This change continues the thread naming work from #5691 and #5758, which enables more useful lock contention profiling by ensuring threads/jobs have short, stable, human-readable names (rather than being truncated/failing due to OS limits). This changes diagnostic naming only (thread names and job/load-event labels), not behavior.
Specific modifications are:
* Shortens all thread/job names used with `beast::setCurrentThreadName`, so the effective Linux thread name stays within the 15-character limit.
* Removes per-ledger sequence numbers from job/thread names to avoid long labels. This improves aggregation in lock contention profiling for short-lived job executions.
The Ripple Bug Bounty program recently changed the public keys that security researchers can use to encrypt vulnerabilities and messages for submission to the program. This information was updated on https://ripple.com/legal/bug-bounty/ and this PR updates the `SECURITY.md` to align.
During several iterations of development of https://github.com/XRPLF/rippled/pull/6235, the commit hash was supposed to be moved into the `run:` statement, but it slipped through the cracks and did not get added. This change adds the commit hash as suffix to the Conan recipe version.
The `Number.h` header file now has `std::reference_wrapper` from `<functional>`, but the include is missing, causing downstream build problems. This change adds the header.
This change uploads the `libxrpl` library as a Conan recipe to our remote when (i) merging into the `develop` branch, (ii) committing to a PR that targets a `release*` branch, and (iii) a versioned tag is applied. Clio is only notified in the second case. The user and channel are no longer used when uploading the recipe.
Specific changes are:
* A `generate-version` action is added, which extracts the build version from `BuildInfo.cpp` and appends the short 7-character commit hash to it for merges into the `develop` branch and for commits to a PR that targets a `release*` branch. When a tag is applied, however, the tag itself is used as the version. This functionality has been turned into a separate action as we will use the same versioning logic for creating .rpm and .deb packages, as well as Docker images.
* An `upload-recipe` action is added, which calls the `generate-version` action and further handles the uploading of the recipe to Conan.
* This action is called by both the `on-pr` and `on-trigger` workflows, and a new `on-tag` workflow.
The reason for this change is that we have downstream uses for the `libxrpl` library, but currently only upload the recipe to check for compatibility with Clio when making commits to a PR that targets the release branch.
`PeerImp` processes `TMGetObjectByHash` queries with an unbounded per-request loop, which performs a `NodeStore` fetch and then appends retrieved data to the reply for each queried object without a local count cap or reply-byte budget. However, the `Nodestore` fetches are expensive when high in numbers, which might slow down the process overall. Hence this code change adds an upper cap on the response size.
This change removes the `master` branch as a trigger for the CI pipelines, and updates comments accordingly. It also fixes the pre-commit workflow, so it will run on all release branches.
These "fixed location" objects can be found in multiple ways:
1. The lookup parameters use the same format as other ledger objects, but the only valid value is true or the valid index of the object:
- Amendments: "amendments" : true
- FeeSettings: "fee" : true
- NegativeUNL: "nunl" : true
- LedgerHashes: "hashes" : true (For the "short" list. See below.)
2. With RPC API >= 3, using special case values to "index", such as "index" : "amendments". Uses the same names as above. Note that for "hashes", this option will only return the recent ledger hashes / "short" skip list.
3. LedgerHashes has two types: "short", which stores recent ledger hashes, and "long", which stores the flag ledger hashes for a particular ledger range.
- To find a "long" LedgerHashes object, request '"hashes" : <ledger sequence>'. <ledger sequence> must be a number that evaluates to an unsigned integer.
- To find the "short" LedgerHashes object, request "hashes": true as with the other fixed objects.
The following queries are all functionally equivalent:
- "amendments" : true
- "index" : "amendments" (API >=3 only)
- "amendments" : "7DB0788C020F02780A673DC74757F23823FA3014C1866E72CC4CD8B226CD6EF4"
- "index" : "7DB0788C020F02780A673DC74757F23823FA3014C1866E72CC4CD8B226CD6EF4"
Finally, whether the object is found or not, if a valid index is computed, that index will be returned. This can be used to confirm the query was valid, or to save the index for future use.
This change adds support for sanitizer build options in CI builds workflow. Currently `asan+ubsan` is enabled, while `tsan+ubsan` is left disabled as more changes are required.
The word `failed` in the test case makes it hard to search through the test logs when an actual test failure occurs, so this change renames the word to just `fail` instead.
- Refactor Number internals away from int64 to uint64 & a sign flag
- ctors and accessors use `rep`. Very few things expose
`internalrep`.
- An exception is "unchecked" and the new "normalized", which explicitly
take an internalrep. But with those special control flags, it's easier
to distinguish and control when they are used.
- For now, skip the larger mantissas in AMM transactions and tests
- Remove trailing zeros from scientific notation Number strings
- Update tests. This has the happy side effect of making some of the string
representations _more_ consistent between the small and large
mantissa ranges.
- Add semi-automatic rounding of STNumbers based on Asset types
- Create a new SField metadata enum, sMD_NeedsAsset, which indicates
the field should be associated with an Asset so it can be rounded.
- Add a new STTakesAsset intermediate class to handle the Asset
association to a derived ST class. Currently only used in STNumber,
but could be used by other types in the future.
- Add "associateAsset" which takes an SLE and an Asset, finds the
sMD_NeedsAsset fields, and associates the Asset to them. In the case
of STNumber, that both stores the Asset, and rounds the value
immediately.
- Transactors only need to add a call to associateAsset _after_ all of
the STNumbers have been set. Unfortunately, the inner workings of
STObject do not do the association correctly with uninitialized
fields.
- When serializing an STNumber that has an Asset, round it before
serializing.
- Add an override of roundToAsset, which rounds a Number value in place
to an Asset, but without any additional scale.
- Update and fix a bunch of Loan-related tests to accommodate the
expanded Number class.
---------
Co-authored-by: Vito <5780819+Tapanito@users.noreply.github.com>
- Spec: XLS-66
Fix overpayment asserts (#6084)
MPTTester::operator() parameter should be std::int64_t
- Originally defined as uint64_t, but the testIssuerLoan() test called
it with a negative number, causing an overflow to a very large number
that in some circumstances could be silently cast back to an int64_t,
but might not be. I believe this is UB, and we don't want to rely on
that.
Review feedback from @Tapanito: overpayment value change
- In overpayment results, the management fee was being calculated twice:
once as part of the value change, and as part of the fees paid.
Exclude it from the value change.
Fix Overpayment Calculation (#6087)
- Adds additional unit tests to cover math calculations.
- Removes unused methods.
Review feedback from @shawnxie999: even more rounding
- Round the initial total value computation upward, unless there is
0-interest.
- Rename getVaultScale to getAssetsTotalScale, and convert one incorrect
computation to use it.
- Use adjustImpreciseNumber for LossUnrealized.
- Add some logging to computeLoanProperties.
Fix LoanBrokerSet debtMaximum limits (#6116)
Fix some minor bugs in Lending Protocol (#6101)
- add nodiscard to unimpairLoan, and check result in LoanPay
- add a check to verify that issuer exists
- improve LoanManage error code for dust amounts
Check permissions in LoanSet and LoanPay (#6108)
Disallow pseudo accounts to be Destination for LoanBrokerCoverWithdraw (#6106)
Ensure vault asset cap is not exceeded (#6124)
Fix Overpayment ValueChange calculation in Lending Protocol (#6114)
- Adds loan state to LoanProperties.
- Cleans up computeLoanProperties.
- Fixes missing management fee from overpayment.
fix: Enable LP Deposits when the broker is the asset issuer (#6119)
* Replace accountHolds with accountSpendable when checking
for account funds in VaultDeposit and LoanBrokerCoverDeposit
Add a few minor changes (#6158)
- Updates or fixes a couple of things I noticed while reviewing changes
to the spec.
- Rename sfPreviousPaymentDate to sfPreviousPaymentDueDate.
- Make the vault asset cap check added in #6124 a little more robust:
1. Check in preflight if the vault is _already_ over the limit.
2. Prevent overflow when checking with the loan value. (Subtract
instead of adding, in case the values are near maxint. Both return
the same result. Also add a unit test so each case is covered.
Add minimum grace period validation (#6133)
Fix bugs: frozen pseudo-account, and FLC cutoff (#6170)
refactor: Rename raw state to theoretical state (#6187)
Check if a withdrawal amount exceeds any applicable receiving limit. (#6117)
Fix overpayment result calculation (#6195)
Address review feedback from Lending Protocol re-review (#6161)
---------
Co-authored-by: Gregory Tsipenyuk <gregtatcam@users.noreply.github.com>
Co-authored-by: Bronek Kozicki <brok@incorrekt.com>
Co-authored-by: Vito Tumas <5780819+Tapanito@users.noreply.github.com>
Co-authored-by: Shawn Xie <35279399+shawnxie999@users.noreply.github.com>
Co-authored-by: Jingchen <a1q123456@users.noreply.github.com>
This change removes unnecessary version numbers in the OpenSSL and Boost `find_package` CMake statements. An unnecessary OpenSSL definition is removed, while Conan options for SSL are updated to disable insecure ciphers. Moreover, the statements are now ordered alphabetically and more logically.
- Introduces amendment `fixBatchInnerSigs`
- Update Batch unit tests
- Fix all the Env instantiations to _use_ the "features" parameter.
- testInnerSubmitRPC runs with Batch enabled and disabled.
- Add a test to testInnerSubmitRPC for a correctly signed tx incorrectly
using the tfInnerBatchTxn flag.
- Generalize the submitAndValidate lambda in testInnerSubmitRPC.
- With the fix amendment, a transaction never reaches the transaction
engine (Transactor and derived classes.)
- Test submitting a pseudo-transaction. Stopped before reaching the
transaction engine, but with different errors.
- The tests verify that without the amendment, a transaction with
tfInnerBatchTxn is immediately rejected. Without the amendment, things
are safe. The amendment just makes things safer and more future-proof.
- Adds a mechanism for the vault owner to burn user shares when the vault is stuck. If the Vault has 0 AssetsAvailable and Total, the owner may submit a VaultClawback to reclaim the worthless fees, and thus allow the Vault to be deleted. The Amount must be left off (unless the owner is the asset issuer), specified as 0 Shares, or specified as the number of Shares held.
This change:
* Truncates thread names if more than 15 chars with `snprintf`.
* Adds warnings for truncated thread names if `-DTRUNCATED_THREAD_NAME_LOGS=ON`.
* Add a static assert for string literals to stop compiling if > 15 chars.
* Shortens `Resource::Manager` to `Resource::Mngr` to fix the static assert failure.
* Updates `CurrentThreadName_test` unit test specifically for Linux to verify truncation.
This change updates the XRPLF pre-commit workflow and prepare-runner action to their latest versions. For naming consistency the prepare-runner action changed the disable_ccache variable into enable_ccache, which matches our naming.
This change fixes the last of the spelling issues, and enables the pre-commit (and CI) check for spelling. There are no functionality changes, but it does rename some enum values.
Right now, each pipeline invocation builds the source code from scratch. Although compiled Conan dependencies are cached in a remote server, the source build objects are not. We are able to further speed up our builds by leveraging `ccache`. This change enables caching of build objects using `ccache` on Linux, macOS, and Windows.
This change adds some basic tests for all the `ledger_entry` helper functions, so each ledger entry type is covered. There are further some minor refactors in `parseAMM` to provide better error messages. Finally, to improve readability, alphabetization was applied in the helper functions.
This change renames all occurrences of `rippled.cfg` to `xrpld.cfg`. It also provides a script to allow developers to replicate the changes in their local branch or fork to avoid conflicts. For the time being it maintains support for `rippled.cfg` as config file, if `xrpld.cfg` does not exist.
This change modifies the build directory structure from `build/build/xxx` or `.build/build/xxx` to just `build/xxx`. Namely, the `conanfile.py` has the CMake generators build directory hardcoded to `build/generators`. We may as well leverage the top-level build directory without introducing another layer of directory nesting.
`Json::Object` and related objects are not used at all, so this change removes `include/xrpl/json/Object.h` and all downstream files. There are a number of minor downstream changes as well.
Full list of deleted classes and functions:
* `Json::Collections`
* `Json::Object`
* `Json::Array`
* `Json::WriterObject`
* `Json::setArray`
* `Json::addObject`
* `Json::appendArray`
* `Json::appendObject`
The last helper function, `copyFrom`, seemed a bit more complex and was actually used in a few places, so it was moved to `LedgerToJson.h` instead of deleting it.
The latest update to `cleanup-workspace`, `get-nproc`, and `prepare-runner` moved the action to the repository root directory, and also includes some ccache changes. In response, this change updates the various shared actions to the latest commit hash.
This change renames all occurrences of `namespace ripple` and `ripple::` to `namespace xrpl` and `xrpl::`, respectively, as well as the names of test suites. It also provides a script to allow developers to replicate the changes in their local branch or fork to avoid conflicts.
Per [XLS-0095](https://xls.xrpl.org/xls/XLS-0095-rename-rippled-to-xrpld.html), we are taking steps to rename ripple(d) to xrpl(d).
This change modifies the binary name from `rippled` to `xrpld`, and creates a symlink named `rippled` that points to the `xrpld` binary.
Note that https://github.com/XRPLF/rippled/pull/5975 renamed any references to `rippled` in the CMake files and their contents, but explicitly maintained the `rippled` binary name by adding an exception. This change now undoes this exception and adds an explicit symlink instead.
This change renames all the `info()` functions to `header()`, since they return `LedgerHeader` structs. It also renames the underlying variables from `info_` to `header_`.
This PR renames `LedgerInfo` to `LedgerHeader`. Namely, `LedgerInfo` was already an alias for `LedgerHeader`, and the comments next to the alias suggested that it would make sense to rename it, since that makes it clearer what it is.
This PR cleans up `RPCHelpers.h` and `RPCHelpers.cpp`. It splits out all the fetch-ledger functions to a new set of files, `RPCLedgerHelpers.h`/`RPCLedgerHelpers.cpp`, and moves the general-API functions to `ApiVersion.h`. There is no functionality change.
The .gitignore and .gitattributes files contain references to files and directories that the current build no longer produces, so this change removes obsolete entries in these files, and does some general reorganizing of the remaining entries.
This change updates the secp256k1 recipe that defines the SECP256K1_STATIC, so it no longer needs to be defined in the code here. Running the Conan update script also updated two other recipes in the lock file.
This PR updates protobuf and grpc to their latest versions. The latest protobuf version no longer requires patches, so we can use it directly from the official Conan Center Index, while the latest grpc still needed a patch, which was added to our own Conan Center Index fork in XRPLF/conan-center-index#8.
This change substitutes the secp256k1 source code copy by the Conan recipe added in XRPLF/conan-center-index#24, which updates the version of the library to 0.7.0.
- Spec: XLS-66
- Introduces amendment "LendingProtocol", but leaves it UNSUPPORTED to
allow for standalone testing, future development work, and potential
bug fixes.
- AccountInfo RPC will indicate the type of pseudo-account when
appropriate.
- Refactors and improves several existing classes and functional areas,
including Number, STAmount, STObject, json_value, Asset, directory
handling, View helper functions, and unit test helpers.
This change moves the lockfile instructions into a script, and instead of removing packages it sets the Conan home directory to a temporary directory.
There are several advantages, such as:
* Not affecting the user's Conan home directory, so there is no need to remove packages.
* Only the script needs to be run, rather than several commands.
This change updates the instructions for how to generate a Conan lockfile that is compatible with Linux, macOS, and Windows.
Whenever Conan dependencies change, the Conan lock file needs to be regenerated. However, different OSes have slightly different requirements, and thus require slightly different dependencies. Luckily a single `conan.lock` file can be used, as long as the union of dependencies is included.
The current instructions in the `BUILD.md` file are insufficient to regenerate the lock file such that it is compatible with all OSes, which this change addresses. The three profiles contain the bare minimum needed to generate the lockfile; it does not particularly matter whether the build type is Debug or Release, for instance.
This change triggers the Clio pipeline on PRs that target any of the `release*` branches (in addition to the `master` branch), as opposed to only the `release` branch.
If the config disables SQL db usage, such as a validator:
```
[ledger_tx_tables]
use_tx_tables = 0
```
then the pointer to DB engine is null, but it was still resolved during startup. Although it didn't crash in Release mode, possibly due to the compiler optimizing it away, it did crash in Debug mode. This change explicitly checks for the validity of the pointer and generates a runtime error if not set.
The `ledger_entry` and `deposit_preauth` requests require an array of credentials. However, the array size is not checked before is gets processing. This fix adds checks and return errors in case array size is too big.
This PR splits `RPCHelpers.h` into two files, by moving out all the ledger-fetching-related functions into a separate file, `RPCLedgerHelpers.h`. It also moves `getAccountObjects` to `AccountObjects.h`, since it is only used in that one place.
Technically b0 is not a release, so no "release" prefix here. It marks the point at which we moved the preceding release (3.0.0 in this case) from Beta to Release Candidate.
This change fixes floating point errors in conversion of shares to assets and other way, used in `VaultDeposit`, `VaultWithdraw` and `VaultClawback`. In the floating point calculations the division introduces a larger error than multiplication. If we do division first, then the error introduced will be increased by the multiplication that follows, which is therefore the wrong order to perform these two operations. This change flips the order of arithmetic operations, which minimizes the error.
Rather than having a single `XRPL_RETIRE` macro that applies to both feature and fix amendments, this change replaces it by new `XRPL_RETIRE_FIX` and `XRPL_RETIRE_FEATURE` macros that avoids confusion between whether to prefix the amendment name with `feature` or `fix`.
This updates the CI image hashes after following change: https://github.com/XRPLF/ci/pull/81. And, since we use latest Conan, we can have `conan.lock` with a newline at the end, and we don't need to exclude it from `pre-commit` hooks any longer.
This change fixes JSON parsing of negative `int` input in `STNumber` and `STAmount`. The conversion of JSON to `STNumber` or `STAmount` may trigger a condition where we negate smallest possible `int` value, which is undefined behaviour. We use a temporary storage as `int64_t` to avoid this bug. Note that this only affects RPC, because we do not parse JSON in the protocol layer, and hence no amendment is needed.
This change removes unused definitions from the CMake files, moves variable definitions from `XrplSanity` to `XrplSettings` where they better belong, and updates the minimum GCC and Clang versions to match what we actually minimally support.
This change unifies the build and test jobs into a single job, and adds `ctest` to coverage reporting.
The mechanics of coverage reporting is slightly complex and most of it is encapsulated in the `coverage` target. The status quo way of preparing coverage reports involves running a single target `cmake --build . --target coverage`, which does three things:
* Build the `rippled` binary (via target dependency)
* Prepare coverage reports:
* Run `./rippled -u` unit tests.
* Gather test output and build reports.
This makes it awkward to add an additional `ctest` step between build and coverage reporting steps. The better solution is to split `coverage` target into separate build, followed by `ctest`, followed by test generation. Luckily, the `coverage` target has been designed specifically to support such case; it does not need to build `rippled`, it's just a dependency. Similarly it allows additional tests to be run before gathering test outputs; in principle we could even strip it from running tests and run them separately instead. This means we can keep build, `ctest` and generation of coverage reports as separate steps, as long as the state of build directory is fully (including file timestamps, additional coverage files etc.) preserved between the steps. This means that in order to run `ctest` for coverage reporting we need to integrate build and test into a single job, which this change does.
As part of renaming ripple(d) to xrpl(d), the xrpld symlink was made to point to itself instead of to the rippled binary. This change fixes the symlink.
Per XLS-0095, we are taking steps to rename ripple(d) to xrpl(d).
This change updates the CMake files and definitions therein, plus a handful of related modifications. Specifically, the compiler files are renamed from `RippleXXX.cmake` or `RippledXXX.cmake` to `XrplXXX.cmake`, and any references to `ripple` and `rippled` (with or without capital letters) are renamed to `xrpl` and `xrpld`, respectively. The name of the binary, currently `rippled`, remains unchanged and will be updated in a separate PR. This change is purely cosmetic and does not affect the functioning of the binary.
Per XLS-0095, we are taking steps to rename ripple(d) to xrpl(d).
This change specifically removes all copyright notices referencing Ripple, XRPLF, and certain affiliated contributors upon mutual agreement, so the notice in the LICENSE.md file applies throughout. Copyright notices referencing external contributions remain as-is. Duplicate verbiage is also removed.
Per XLS-0095, we are taking steps to rename ripple(d) to xrpl(d).
C++ include guards are used to prevent the contents of a header file from being included multiple times in a single compilation unit. This change renames all `RIPPLE_` and `RIPPLED_` definitions, primarily include guards, to `XRPL_`. It also provides a script to allow developers to replicate the changes in their local branch or fork to avoid conflicts.
Amendments activated for more than 2 years can be retired, and obsolete retirements that were never activated can also be removed after 2 years. This change retires the NonFungibleTokensV1_1, fixNonFungibleTokensV1_2, and fixNFTokenRemint amendments, and removes the NonFungibleTokensV1, fixNFTokenNegOffer, and fixNFTokenDirV1 amendments.
To debug test failures we would like to use `netstat`, but that package wasn't installed yet in the CI images. This change uses the new CI images created by https://github.com/XRPLF/ci/pull/79.
This change:
* Simplifies the `TxMeta` constructors - both were setting the same set of fields, and to make it harder for future bugs to arise and keep the code DRY, we can combine those into one helper function.
* Removes an unused constructor.
* Renames the variables to avoid Hungarian naming.
* Removes a bunch of now-unnecessary helper functions.
This change introduces the `featurePermissionDelegationV1_1` amendment, which is designed to supersede both `featurePermissionDelegation` and `fixDelegateV1_1 amendments, which should be considered deprecated. The `checkPermission` function will now return `terNO_DELEGATE_PERMISSION` when a delegate transaction lacks the necessary permissions.
This change introduces the `fixDirectoryLimit` amendment to remove the directory pages limit. We found that the directory size limit is easier to hit than originally assumed, and there is no good reason to keep this limit, since the object reserve provides the necessary incentive to avoid creating unnecessary objects on the ledger.
Field `sfSubjectNode` is not populated by `CredentialCreate` in self-issued credentials. Rather than fixup the Credentials already on the ledger, we can in this case safely change the object template for this field from `soeREQUIRED` to `soeOPTIONAL`.
- Restructures `STTx` signature checking code to be able to handle
a `sigObject`, which may be the full transaction, or may be an object
field containing a separate signature. Either way, the `sigObject` can
be a single- or multi-sign signature.
- This is distinct from 550f90a75e (#5594), which changed the check in
Transactor, which validates whether a given account is allowed to sign
for the given transaction. This cryptographically checks the signature
validity.
This change adds an extra step to the CI test job that outputs network info, which may allow us to confirm whether random test failures are caused by port exhaustion.
We are on an amendment retiring spree, but each change results in conflicts in `features.macro` because currently they all add the retired amendment to the end of the list. By sorting the list the number of conflicts should be reduced, making it easier to merge them.
Amendments activated for more than 2 years can be retired. This change retires the fix1571 amendment.
Co-authored-by: Bart Thomee <11445373+bthomee@users.noreply.github.com>
To protect the identity of UNL validators, the IP addresses are redacted from the log messages sent to the common Grafana instance. However, without such identifying information it is challenging to debug issues. This change adds a node's public key to logs to improve our ability to debug issues.
Co-authored-by: Bart Thomee <11445373+bthomee@users.noreply.github.com>
* Retired fix1781 amendment
Signed-off-by: Pratik Mankawde <pmankawde@ripple.com>
* refactor: Retire fix1781 amendment
Amendments activated for more than 2 years can be retired. This change retires the fix1781 amendment.
---------
Signed-off-by: Pratik Mankawde <pmankawde@ripple.com>
Co-authored-by: Bart Thomee <11445373+bthomee@users.noreply.github.com>
This change reduces the number of cores used to build and test, as using all cores may be contributing to occasional build and test failures.
Co-authored-by: Bart Thomee <11445373+bthomee@users.noreply.github.com>
This change changes the CI concurrency group for pushes to the `develop` branch to use the commit hash instead of the target branch.
Co-authored-by: Bart Thomee <11445373+bthomee@users.noreply.github.com>
There are separate steps for logging into Conan and uploading packages. However, at the moment sometimes the login step is executed even though no packages will be uploaded. The condition for performing both steps should be the same.
Co-authored-by: Bart Thomee <11445373+bthomee@users.noreply.github.com>
This changes fixes an invariant error where the amount withdrawn is equal to the transaction fee.
Co-authored-by: Bart Thomee <11445373+bthomee@users.noreply.github.com>
This change fixes the `account_tx` RPC method to properly validate malformed limit parameter values. Previously, invalid values like `0`, `1.2`, `"10"`, `true`, `false`, `-1`, `[]`, `{}`, etc. were either accepted without errors or caused internal errors. Now all malformed values correctly return the `invalidParams` error.
Co-authored-by: Bart Thomee <11445373+bthomee@users.noreply.github.com>
Amendments activated for more than 2 years can be retired. This change retires the fix1543 amendment.
Co-authored-by: Bart Thomee <11445373+bthomee@users.noreply.github.com>
The `-Wno-missing-template-arg-list-after-template-kw` flag is only needed for the grpc library. Use `+=` for the default build flags to make it easier to extend in the future.
Co-authored-by: Bart Thomee <11445373+bthomee@users.noreply.github.com>
Amendments activated for more than 2 years can be retired. This change retires the fix1515 amendment.
Co-authored-by: Bart Thomee <11445373+bthomee@users.noreply.github.com>
As XRPL network demand grows and ledger sizes increase, the default 4K NuDB block size becomes a performance bottleneck, especially on high-performance storage systems. Modern SSDs and enterprise storage often perform better with larger block sizes, but rippled previously had no way to configure this parameter. This change therefore implements configurable NuDB block size support, allowing operators to optimize storage performance based on their hardware configuration. The feature adds a new `nudb_block_size` configuration parameter that enables block sizes from 4K to 32K bytes, with comprehensive validation and backward compatibility.
Specific changes are:
- Implements `parseBlockSize()` function with validation.
- Adds `nudb_block_size` configuration parameter.
- Supports block sizes from 4K to 32K (power of 2).
- Adds comprehensive logging and error handling.
- Maintains backward compatibility with 4K default.
- Adds unit tests for block size validation.
- Updates configuration documentation with performance guidance.
- Marks feature as experimental.
- Applies code formatting fixes.
Co-authored-by: Bart Thomee <11445373+bthomee@users.noreply.github.com>
Similarly to other transaction typed that can create a trust line or MPToken for the transaction submitter (e.g. CashCheck #5285, EscrowFinish #5185 ), VaultWithdraw should enforce reserve before creating a new object. Additionally, the lsfRequireDestTag account flag should be enforced for the transaction submitter.
Co-authored-by: Bart Thomee <11445373+bthomee@users.noreply.github.com>
The default job timeout is 5 hours, while build times are anywhere between 4-20 mins and test times between 2-10. As a runner occasionally gets stuck, we should fail much quicker.
Co-authored-by: Bart Thomee <11445373+bthomee@users.noreply.github.com>
This PR sets the fail-fast strategy option to false (it defaults to true), unless it is run by a merge group.
Co-authored-by: Bart Thomee <11445373+bthomee@users.noreply.github.com>
This change sanitizes inputs by setting them as environment variables, and adjusts the number of CPUs used for building. Namely, GitHub inputs should be sanitized, per recommendation by Semgrep, as using them directly poses a security risk. A recent change further overrode the global configuration by having builds use all cores, but as we have noticed an increased number of job cancelation this change updates it to use all cores less one.
Co-authored-by: Bart Thomee <11445373+bthomee@users.noreply.github.com>
There are some tests that write out JSONs as a string instead of using the Json::Value library, which are cleaned up by this change.
Co-authored-by: Bart Thomee <11445373+bthomee@users.noreply.github.com>
A regression was introduced in #5669 which would cause rippled to potentially dereference a disengaged std::optional when connecting to a peer. This would cause UB in release build and crash in debug.
Co-authored-by: Bart Thomee <11445373+bthomee@users.noreply.github.com>
This change replaces boost::lexical_cast<std::string> with to_string in some of the tests to make them more readable.
Co-authored-by: Bart Thomee <11445373+bthomee@users.noreply.github.com>
This change replaces instances of JSON LastLedgerSequence with last_ledger_seq, which makes the tests a bit simpler and easier to read.
Co-authored-by: Bart Thomee <11445373+bthomee@users.noreply.github.com>
Windows is extremely chatty and generates tons of logs when building, making it practically impossible to use the build logs to debug issues. This change sets the verbosity to 'quiet' on Windows.
Co-authored-by: Bart Thomee <11445373+bthomee@users.noreply.github.com>
This PR changes fee().accountReserve(0) to fee().reserve, as the current network reserve amount should be used instead of the account reserve.
Co-authored-by: Bart Thomee <11445373+bthomee@users.noreply.github.com>
This change updates the Docker image hashes of the tools-rippled images to fix a missing dependency.
Co-authored-by: Bart Thomee <11445373+bthomee@users.noreply.github.com>
This change adds a paychan namespace to the TestHelpers and implementation files, improving organization and clarity. Additionally, it updates the AMM test to use the new `paychan::create` function for payment channel creation.
Co-authored-by: Bart Thomee <11445373+bthomee@users.noreply.github.com>
This change excludes from Codecov unreachable/difficult-to-test transaction code (such as `tecINTERNAL`) and old code (from amendments that have been enabled for a long time that are only around for ledger replay reasons). This removes about 200 lines of misses and increases the Codecov coverage by 0.3% (79.2% to 79.5%).
This change adds a wildcard to the release branch in the CI pipeline spec. Namely, after adopting an improved release process, with release branches that now look like release-X.Y, the trigger pipeline was no longer running as it only searched for an exact match to release.
This change downgrades OpenSSL 3.6.0 to 3.5.4. To avoid potential zero-day issues in a new major version of OpenSSL, 3.6.0, it is safer to stick with 3.5.4. While 3.6.0 has some nice new features, such as improved SHA512 hashing, it also introduces new features that could contain bugs. In contrast, 3.5.4 has seen quite a few bug fixes over 3.5.0 and has been used in the wild for a while now.
This change fixes a release build error with GCC 15.2.
The `fields` variable is only used in `XRPL_ASSERT`, which evaluates to nothing in a Release build, leaving the variable unused. This change silences the build warning.
Co-authored-by: Bart Thomee <11445373+bthomee@users.noreply.github.com>
This change uses the new RHEL 9 and 10 images to build and test the binary, and adds support for having different Docker image SHAs per distro-compiler combination.
Instead of supporting RHEL each minor version, we are simplifying our pipelines by only supporting RHEL major versions. Our CI Docker images have already been updated accordingly, and we recently added support for RHEL 10 as well. Up until now, the CI Docker images had all been rebuilt at the same time, but that is not necessarily true as the most recent push to the CI repo has shown where the RHEL images now have a different SHA than the Debian and Ubuntu ones.
Co-authored-by: Bart Thomee <11445373+bthomee@users.noreply.github.com>
* Restructures Transactor signature checking code to be able to handle a `sigObject`, which may be the full transaction, or may be an object field containing a separate signature. Either way, the `sigObject` can be a single- or multi-sign signature.
* Restructures `Transactor::preflight` to create several functions that will remove the need for error-prone boilerplate code in derived classes' implementations of `preflight`.
This change adds `STInt32` as a new `SType` under the `STInteger` umbrella, with `SType` value `12`. This is the first and only `STInteger` type that supports negative values.
This change adds more comprehensive tests for the `FeeVote` module, which previously only checked the basics, and not the more comprehensive flows in that class.
This change makes the regex in `HttpClient.cpp` that matches the content-length http header case insensitive to improve compatibility, as http headers are case insensitive.
This change raises logging severity from `INFO` to `WARN` when handling UNL manifest signed with an unexpected / invalid key. It also changes the internal error code for an invalid format of UNL manifest to `invalid` (from `untrusted`).
This is a follow up to problems experienced by an UNL node due to old manifest key configured in `validators.txt`, which would be easier to diagnose with improved logging.
It also replaces a log line with `UNREACHABLE` for an impossible situation when we match UNL manifest key against a configured key which has an invalid type (we cannot configure such a key because of checks when loading configured keys).
This adds a comment to avoid using `std::counting_semaphore` until the minimum compiler versions of GCC and Clang have been updated to no longer contain the bug that is present in older compilers.
This change reverts #5617, because it will require extensive testing that will take up more time than we have before the next scheduled release.
Reverting this change does not mean we are abandoning it. We aim to pick it back up once there's a sufficient time window to allow for testing on multiple distros running a mixture of OpenSSL 1.x and 3.x.
This change is to improve code coverage (and to simplify #5720 and #5725); there is otherwise no change in functionality. The change adds basic tests for `STInteger` and `STParsedJSON`, so it becomes easier to test smaller changes to the types, as well as removes `STParsedJSONArray`, since it is not used anywhere (including in Clio).
- Added a new Invariant: `ValidPseudoAccounts` which checks that all pseudo-accounts behave consistently through creation and updates, and that no "real" accounts look like pseudo-accounts (which means they don't have a 0 sequence).
- `to_short_string(base_uint)`. Like `to_string`, but only returns the first 8 characters. (Similar to how a git commit ID can be abbreviated.) Used as a wrapped sink to prefix most transaction-related messages. More can be added later.
- `XRPL_ASSERT_PARTS`. Convenience wrapper for `XRPL_ASSERT`, which takes the `function` and `description` as separate parameters.
- `SField::sMD_PseudoAccount`. Metadata option for `SField` definitions to indicate that the field, if set in an `AccountRoot` indicates that account is a pseudo-account. Removes the need for hard-coded field lists all over the place. Added the flag to `AMMID` and `VaultID`.
- Added functionality to `SField` ctor to detect both code and name collisions using asserts. And require all SFields to have a name
- Convenience type aliases `STLedgerEntry::const_pointer` and `STLedgerEntry::const_ref`. (`SLE` is an alias to `STLedgerEntry`.)
- Generalized `feeunit.h` (`TaggedFee`) into `unit.h` (`ValueUnit`) and added new "BIPS"-related tags for future use. Also refactored the type restrictions to use Concepts.
- Restructured `transactions.macro` to do two big things
1. Include the `#include` directives for transactor header files directly in the macro file. Removes the need to update `applySteps.cpp` and the resulting conflicts.
2. Added a `privileges` parameter to the `TRANSACTION` macro, which specifies some of the operations a transaction is allowed to do. These `privileges` are enforced by invariant checks. Again, removed the need to update scattered lists of transaction types in various checks.
- Unit tests:
1. Moved more helper functions into `TestHelpers.h` and `.cpp`.
2. Cleaned up the namespaces to prevent / mitigate random collisions and ambiguous symbols, particularly in unity builds.
3. Generalized `Env::balance` to add support for `MPTIssue` and `Asset`.
4. Added a set of helper classes to simplify `Env` transaction parameter classes: `JTxField`, `JTxFieldWrapper`, and a bunch of classes derived or aliased from it. For an example of how awesome it is, check the changes `src/test/jtx/escrow.h` for how much simpler the definitions are for `finish_time`, `cancel_time`, `condition`, and `fulfillment`.
5. Generalized several of the amount-related helper classes to understand `Asset`s.
6. `env.balance` for an MPT issuer will return a negative number (or 0) for consistency with IOUs.
This change re-enables building and testing all configurations, but only for the daily scheduled run. Previously all configurations were run for each merge into the develop branch, but that overwhelmed both the GitHub runners and the Conan remote, and thus they were limited to just a subset of configurations. Now that the number of jobs is limited via `max-parallel: 10`, we should be able to safely enable building all configurations again. However, building them all once a day instead of for each PR merge should be sufficient.
- Ensures the commits don't get orphaned, even though the relevant code
changes are already included.
* tag '2.5.1':
Set version to 2.5.1
Fix: Don't flag consensus as stalled prematurely (#5658)
GitHub runners have a limit on how many concurrent jobs they can actually process (even though they will try to run them all at the same time), and similarly the Conan remote cannot handle hundreds of concurrent requests. Previously, the Conan dependency uploading was already limited to max 10 jobs running in parallel, and this change makes the same change to the build+test workflow.
This change adds a fix amendment (`fixIncludeKeyletFields`) that adds:
* `sfSequence` to `Escrow` and `PayChannel`
* `sfOwner` to `SignerList`
* `sfOracleDocumentID` to `Oracle`
This ensures that all ledger entries hold all the information needed to determine their keylet.
The XRPL establishes connections in three stages: first a TCP connection, then a TLS/SSL handshake to secure the connection, and finally an upgrade to the bespoke XRP Ledger peer-to-peer protocol. During connection termination, xrpld directly closes the TCP connection, bypassing the TLS/SSL shutdown handshake. This makes peer disconnection diagnostics more difficult - abrupt TCP termination appears as if the peer crashed rather than disconnected gracefully.
This change refactors the connection lifecycle with the following changes:
- Enhanced outgoing connection logic with granular timeouts for each connection stage (TCP, TLS, XRPL handshake) to improve diagnostic capabilities
- Updated both PeerImp and ConnectAttempt to use proper asynchronous TLS shutdown procedures for graceful connection termination
* extends the functionality of the MPTokenIssuanceSet transaction, allowing the issuer to update fields or flags that were explicitly marked as mutable during creation.
Clio should only be notified when releases are about to be made, instead of for all PR, so this change only notifies Clio when a PR targets the release or master branch.
This change wraps all GitHub conditionals in `${{ .. }}`, both for consistency and to reduce unexpected failures, because it was previously noticed that not all conditionals work without those curly braces.
- Amendment: fixDelegateV1_1
- In DelegateSet, disallow invalid PermissionValues like 0, and transaction values when the transaction's amendment is not enabled. Acts as if the transaction doesn't exist, which is the same thing older versions without the amendment will do.
- Payment burn/mint should disallow DEX currency exchange.
- Support MPT for Payment burn/mint.
- Don't run upload-conan-deps in PRs, unless the PR changes the workflow file.
- Change cron schedule for uploading Conan dependencies to run after work hours for most dev.
- This should prevent Artifactory from being overloaded by too many requests at a time.
- Uses "max-parallel" to limit the build job to 10 simultaneous instances.
- Only run the minimal matrix on PRs.
For the purposes of being able to merge a PR, Github Actions jobs count as passed if they ran and passed, or were skipped.
With this change, if any of the jobs that "passed" depends on fail or are cancelled, then "passed" will fail. If they all succeed or are skipped, then "passed" is skipped, which does not prevent a merge.
This saves spinning up a runner in the usual case where things work, and will simplify our branch protection rules, so that only "passed" will need to be checked.
* Add and Scale to VaultCreate
* Add round-trip calculation to VaultDeposit VaultWithdraw and VaultClawback
* Implement Number::truncate() for VaultClawback
* Add rounding to DepositWithdraw
* Disallow zero shares withdraw or deposit with tecPRECISION_LOSS
* Return tecPATH_DRY on overflow when converting shares/assets
* Remove empty shares MPToken in clawback or withdraw (except for vault owner)
* Implicitly create shares MPToken for vault owner in VaultCreate
* Review feedback: defensive checks in shares/assets calculations
---------
Co-authored-by: Ed Hennis <ed@ripple.com>
Fix stalled consensus detection to prevent false positives in situations where there are no disputed transactions.
Stalled consensus detection was added to 2.5.0 in response to a network consensus halt that caused a round to run for over an hour. However, it has a flaw that makes it very easy to have false positives. Those false positives are usually mitigated by other checks that prevent them from having an effect, but there have been several instances of validators "running ahead" because there are circumstances where the other checks are "successful", allowing the stall state to be checked.
* chore: Use conan lockfile
* Add windows-specific dependencies as well
* Add more info about lockfiles
* Update lockfile to latest version
* Update BUILD.md with conan install note
This is a major refactor of LedgerEntry.cpp. It adds a number of helper functions to make the code easier to maintain.
It also splits up the ledger and ledger_entry tests into different files, and cleans up the ledger_entry tests to make them easier to write and maintain.
This refactor also caught a few bugs in some of the other RPC processing, so those are fixed along the way.
This is a follow-up to PR #5664 that further improves the specificity of logging for refused peer connections. The previous changes did not account for several key scenarios, leading to potentially misleading log messages.
It addresses the following
- Inbound Disabled: Connections are now explicitly logged as rejected when the server is not configured to accept inbound peers. Previously, this was logged as the server being "full," which was technically correct but lacked diagnostic clarity.
- Duplicate Connections: The logging now distinguishes between two types of duplicate connection refusals:
- When a peer with the same node public key is already connected (duplicate connection).
- When a connection is rejected because the limit for connections from a single IP address has been reached.
These changes provide more accurate and actionable diagnostic information when analyzing peer connection behavior.
Test jobs will run if
* Either the PR is non-draft or has the "DraftRunCI" label set *AND*
* One of the following:
* Certain files were changed *OR*
* The PR is non-draft and has the "Ready to merge" flag *OR*
* The workflow is being run from the merge queue.
Additionally, a meta "passed" job was added that is dependent on all the other test jobs, so the required jobs list under branch protection rules only needs to specify "passed" to ensure that *either* all the test jobs pass *or* all the test jobs are skipped because they don't need to be run.
This allows PRs that don't affect the build or binary to be merged without overriding.
This updates Boost to 1.88, which is needed because Clio wants to move to 1.88 as that fixes several ASAN false positives around coroutine usage. In order for Clio to move to newer boost, libXRPL needs to move too. Hence the changes in this PR. A lot has changed between 1.83 and 1.88 so there are lots of changes in the diff, especially in regards to Boost.Asio and coroutines in particular.
This change removes `labeled` and `unlabeled` as pipeline trigger actions, and instead adds `reopened` and `ready_for_review`. The logic whether to run the pipeline jobs is then simplified, although to get a draft PR with the `DraftCIRun` label to run it can be necessary to close and reopen a PR.
The change updates how clang-format is called in CI and locally, and adds prettier to the pre-commit hook. Proto files are now also formatted, while external files are excluded.
This change will skip running the notify-clio job when a PR is created from a fork, and reorders the strategy matrix configuration fields so GitHub will more clearly show which configuration is running.
This change reverts the formatting applied to external files and adds formatting of proto files.
As clang-format will complain if a proto file is modified or moved, since the .clang-format file does not explicitly contain a section for proto files, the change has been included in this PR as well.
This change updates OpenSSL from 1.1.1w to 3.5.2. The code works as-is, but many functions have been marked as deprecated and thus will need to be rewritten. For now we explicitly add the `-DOPENSSL_SUPPRESS_DEPRECATED` to give us time to do so, while providing us with the benefits of the updated version.
This change modifies the `build_only` check used to determine whether to run tests. For easier debugging in the future it also prints out the contents of the strategy matrix.
Reduce log noise by changing two log statements from error/warn level to debug level. These logs occur during normal operation when AMM offers are not available or when IOU authorization checks fail, which are expected scenarios that don't require an elevated log level.
Currently, all peer connection rejections are logged with the reason "slots full". This is inaccurate, as the PeerFinder can also reject connections if they are a duplicate. This change updates the logging logic to correctly report the specific reason (full or duplicate) for a rejected peer connection, providing more accurate diagnostic information.
This change fixes the suite names all around the test files, to make them match to the folder name in which this test files are located. Also, the RCL test files are relocated to the consensus folder, because they are testing consensus functionality.
This change replaces the configuration variable with the hardcoded `https://conan.ripplex.io`, making it possible for PRs from forks to use our Conan remote containing workarounds.
We're currently calling `XXH3_createState` and `XXH3_freeState` when hashing an object. However, it may be slow because they call `malloc` and `free`, which may affect the performance. This change avoids the use of the streaming API as much as possible by using an internal buffer.
Fix stalled consensus detection to prevent false positives in situations where there are no disputed transactions.
Stalled consensus detection was added to 2.5.0 in response to a network consensus halt that caused a round to run for over an hour. However, it has a flaw that makes it very easy to have false positives. Those false positives are usually mitigated by other checks that prevent them from having an effect, but there have been several instances of validators "running ahead" because there are circumstances where the other checks are "successful", allowing the stall state to be checked.
This change updates some incorrect Conan commands for Conan 2. As some flags do not exist in Conan 2, such as --settings build_type=[configuration], the commands have been adjusted accordingly. This change further uses the org-level variables and secrets rather than the repo-level ones.
This change introduces two key optimizations:
* Mutex scope reduction: Limits the lock to individual partitions within `TaggedCache`, reducing contention.
* Decoupling: Removes the tight coupling between `LedgerHistory` and `TaggedCache`, improving modularity and testability.
Lock contention analysis based on eBPF showed significant improvements as a result of this change.
This change uploads built Conan dependencies to the Conan remote upon merge into the develop branch.
At the moment, whenever Conan dependencies change, we need to remember to manually push them to our Conan remote, so they are cached for future reuse. If we forget to do so, these changed dependencies need to be rebuilt over and over again, which can take a long time.
This change:
* Removes the patched Conan recipes from the `external/` directory.
* Adds instructions for contributors how to obtain our patched recipes.
* Updates the Conan remote name and remote URL (the underlying package repository isn't changed).
* If the remote already exists, updates the URL instead of removing and re-adding.
* This is not done for the libXRPL job as it still uses Conan 1. This job will be switched to Conan 2 soon.
* Removes duplicate Conan remote CI pipeline steps.
* Overwrites the existing global.conf on MacOS and Windows machines, as those do not run CI pipelines in isolation but all share the same Conan installation; appending the same config over and over bloats the file.
This change updates BUILD.md for Conan 2, add fixes/workarounds for Apple Clang 17, Clang 20 and CMake 4. This also removes (from BUILD.md only) workarounds for compiler versions which we no longer support e.g. Clang 15 and adds compilation flag -Wno-deprecated-declarations to enable building with Clang 20 on Linux.
This change fixes an issue where the order of `PriceDataSeries` was out of sync between when `PriceOracle` was created and when it was updated. Although they are registered in the canonical order when updated, they are created using the order specified in the transaction; this change ensures that they are also registered in the canonical order when created.
This change decouples `ledger` from `xrpld/app`, and therefore fully clears the path to the modularisation of the ledger component. Before this change, `View.cpp` relied on `MPTokenAuthorize::authorize; this change moves `MPTokenAuthorize::authorize` to `View.cpp` to invert the dependency, making ledger a standalone module.
The Payment transaction metadata is missing the `DeliveredAmount` field that displays the actual amount delivered to the destination excluding transfer fees. This amendment fixes this problem.
#5224 added (among other things) a `VaultWithdraw` transaction that allows setting the recipient of the withdrawn funds in the `Destination` transaction field. This technically turns this transaction into a payment, and in some respect the implementation does follow payment rules, e.g. enforcement of `lsfRequireDestTag` or `lsfDepositAuth`, or that MPT transfer has destination `MPToken`. However for IOUs, it missed verification that the destination account has a trust line to the asset issuer. Since the default behavior of `accountSendIOU` is to create this trust line (if missing), this is what `VaultWithdraw` currently does. This is incorrect, since the `Destination` might not be interested in holding the asset in question; this basically enables spammy transfers. This change, therefore, removes automatic creation of a trust line to the `Destination` account in `VaultWithdraw`.
This change updates RocksDB to its latest version. RocksDB is backward-compatible, so even though this is a major version bump, databases created with previous versions will continue to function.
The external RocksDB folder is removed, as the latest version available via Conan Center no longer needs custom patches.
Before `XRPLF/ci` images, we did not have a `dependencies:` job for clang-16, so `instrumentation:` had to build its own dependencies. Now we have clang-16 Conan dependencies built in a separate job that can be used.
This change includes `network_id` data in the validations and ledger subscription stream responses, as well as unit tests to validate the response fields. Fixes#4783
This change adds support for `DomainID` to existing transactions `MPTokenIssuanceCreate` and `MPTokenIssuanceSet`.
In #5224 `DomainID` was added as an access control mechanism for `SingleAssetVault`. The actual implementation of this feature lies in `MPToken` and `MPTokenIssuance`, hence it makes sense to enable the use of `DomainID` also in `MPTokenIssuanceCreate` and `MPTokenIssuanceSet`, following same rules as in Vault:
* `MPTokenIssuanceCreate` and `MPTokenIssuanceSet` can only set `DomainID` if flag `MPTRequireAuth` is set.
* `MPTokenIssuanceCreate` requires that `DomainID` be a non-zero, uint256 number.
* `MPTokenIssuanceSet` allows `DomainID` to be zero (or empty) in which case it will remove `DomainID` from the `MPTokenIssuance` object.
The change is amendment-gated by `SingleAssetVault`. This is a non-breaking change because `SingleAssetVault` amendment is `Supported::no`, i.e. at this moment considered a work in progress, which cannot be enabled on the network.
For jobs running in containers, $GITHUB_WORKSPACE and ${{ github.workspace }} might not be the same directory. The actions/checkout step is supposed to checkout into `$GITHUB_WORKSPACE` and then add it to safe.directory (see instructions at https://github.com/actions/checkout), but that's apparently not happening for some container images. We can't be sure what is actually happening, so we preemptively add both directories to `safe.directory`. See also the GitHub issue opened in 2022 that still has not been resolved https://github.com/actions/runner/issues/2058.
The current implementation of rngfill is prone to false warnings from GCC about array bounds violations. Looking at the code, the implementation naively manipulates both the bytes count and the buffer pointer directly to ensure the trailing memcpy doesn't overrun the buffer. As expressed, there is a data dependency on both fields between loop iterations.
Now, ideally, an optimizing compiler would realize that these dependencies were unnecessary and end up restructuring its intermediate representation into a functionally equivalent form with them absent. However, the point at which this occurs may be disjoint from when warning analyses are performed, potentially rendering them more difficult to
determine precisely.
In addition, it may also consume a portion of the budget the optimizer has allocated to attempting to improve a translation unit's performance. Given this is a function template which requires context-sensitive instantiation, this code would be more prone than most to being inlined, with a decrease in optimization budget corresponding to the effort the optimizer has already expended, having already optimized one or more calling functions. Thus, the scope for impacting the the ultimate quality of the code generated is elevated.
For this change, we rearrange things so that the location and contents of each memcpy can be computed independently, relying on a simple loop iteration counter as the only changing input between iterations.
Remove `include(default)` from `conan/profiles/libxrpl`. This means that we will now rely on compiler workarounds stored elsewhere e.g. in global.conf.
This change reverts the usage of boost::shared_mutex back to std::shared_mutex. The change was originally introduced as a workaround for a bug in glibc 2.28 and older versions, which could cause threads using std::shared_mutex to stall. This issue primarily affected Ubuntu 18.04 and earlier distributions, which we no longer support.
This change fixes the MacOS pipeline issue by limiting GitHub to choose the existing runners, ensuring the new experimental runners are excluded until they are ready.
This issue was reported on the Javascript client library: XRPLF/xrpl.js#2611
The type filter (Note: as of the latest version of rippled, type parameter is deprecated) does not work as expected. This PR removes the type filter from the ledger command.
This PR updates several dependencies to their latest versions. Not all dependencies have been updated, as some need to be patched and some require additional code changes due to backward incompatibilities introduced by the version bump.
Due to rounding, the LPTokenBalance of the last LP might not match the LP's trustline balance. This was fixed for `AMMWithdraw` in `fixAMMv1_1` by adjusting the LPTokenBalance to be the same as the trustline balance. Since `AMMClawback` is also performing a withdrawal, we need to adjust LPTokenBalance as well in `AMMClawback.`
This change includes:
1. Refactored `verifyAndAdjustLPTokenBalance` function in `AMMUtils`, which both`AMMWithdraw` and `AMMClawback` call to adjust LPTokenBalance.
2. Added the unit test `testLastHolderLPTokenBalance` to test the scenario.
3. Modify the existing unit tests for `fixAMMClawbackRounding`.
Currently there is no easy way to track MPT related transactions for the issuer. This change allows MPT transactions to show up on issuer's AccountTx RPC (to align with how IOUs work).
* Update the `account_info` API so that the `allowTrustLineLocking` flag is included in the response.
* The proposed `TokenEscrow` amendment added an `allowTrustLineLocking` flag in the `AccountRoot` object.
* In the API response, under `account_flags`, there is now an `allowTrustLineLocking` field with a boolean (`true` or `false`) value.
* For reference, the XLS-85 Token-Enabled Escrows implementation can be found in https://github.com/XRPLF/rippled/pull/5185
The current version was copied from `antithesis-sdk-cpp` but there is no logical reason to require this specific version of CMake. This change downgrades the version to make the project build with older CMake versions.
Having `boost::boost` in `self.requires` makes clio link with all boost libraries. There are additionally several Boost stacktrace backends that are both linked with, which violate ODR.
This change fixes the problem.
This PR refactors `CredentialHelpers` and removes some unnecessary dependencies as a step of modularization.
The ledger component is almost independent except that it references `MPTokenAuthorize` and `CredentialHelpers.h`, and the latter further references `Transactor.h`. This PR partially clears the path to modularizing the ledger component and decouples `CredentialHelpers` from xrpld.
This PR fixes a crash in tests when the test `Env is run at trace/debug log level.
This issue only affects tests, and only if logging at trace/debug level, so really only relevant during rippled development, and does not affect production servers.
The tests that ensure `tfInnerBatchTxn` won't block delegated transactions silently fail in `Delegate_test.cpp`. This change removes these cases from that file and adds them to `Batch_test.cpp` instead where they do not silently fail, because there the batch delegate results are explicitly checked. Moving them to that file further avoids refactoring many helper functions.
This change allows users to submit simulate requests from a multi-sign account without needing to specify the accounts that are doing the multi-signing, and fixes an error with simulate that allowed double-"signed" (both single-sign and multi-sign public keys are provided) transactions.
Multi-line log messages are hard to work with. Writing these handful of related messages as one message should make the log a tiny bit easier to manage.
The CMake statements that make it seem as if the number of cores used to build external project dependencies is halved don't actually do anything. This change removes these statements.
* Adds `tecNO_DELEGATE_PERMISSION` for unauthorized transactions sent by a delegated account.
* Returns `tecNO_TARGET` instead of `terNO_ACCOUNT` for the `DelegateSet` transaction if the delegated account does not exist.
* Fixes `tfFullyCanonicalSig` and `tfInnerBatchTxn` blocking transactions issue by adding `tfUniversal` in the permission related masks in `txFlags.h`
The change increases the default network I/O worker thread pool size from 2 to 6. This will improve stability, as worker thread saturation correlates to desyncs, particularly on high-traffic peers, such as hubs.
To be able to consume `rippled` in Conan 2, the recipe should specify transitive_headers for external libraries that are present in the exported header files. This change remains compatibility with Conan 1, where this flag was not present.
- Specification: https://github.com/XRPLF/XRPL-Standards/pull/272
- Amendment: `TokenEscrow`
- Enables escrowing of IOU and MPT tokens in addition to native XRP.
- Allows accounts to lock issued tokens (IOU/MPT) in escrow objects, with support for freeze, authorization, and transfer rates.
- Adds new ledger fields (`sfLockedAmount`, `sfIssuerNode`, etc.) to track locked balances for IOU and MPT escrows.
- Updates EscrowCreate, EscrowFinish, and EscrowCancel transaction logic to support IOU and MPT assets, including proper handling of trustlines and MPT authorization, transfer rates, and locked balances.
- Enforces invariant checks for escrowed IOU/MPT amounts.
- Extends GatewayBalances RPC to report locked (escrowed) balances.
The changes are focused on fixing NFT transactions bypassing the trustline authorization requirement and potential invariant violation when interacting with deep frozen trustlines.
* Add AMM bid/create/deposit/swap/withdraw/vote invariants:
- Deposit, Withdrawal invariants: `sqrt(asset1Balance * asset2Balance) >= LPTokens`.
- Bid: `sqrt(asset1Balance * asset2Balance) > LPTokens` and the pool balances don't change.
- Create: `sqrt(asset1Balance * assetBalance2) == LPTokens`.
- Swap: `asset1BalanceAfter * asset2BalanceAfter >= asset1BalanceBefore * asset2BalanceBefore`
and `LPTokens` don't change.
- Vote: `LPTokens` and pool balances don't change.
- All AMM and swap transactions: amounts and tokens are greater than zero, except on withdrawal if all tokens
are withdrawn.
* Add AMM deposit and withdraw rounding to ensure AMM invariant:
- On deposit, tokens out are rounded downward and deposit amount is rounded upward.
- On withdrawal, tokens in are rounded upward and withdrawal amount is rounded downward.
* Add Order Book Offer invariant to verify consumed amounts. Consumed amounts are less than the offer.
* Fix Bid validation. `AuthAccount` can't have duplicate accounts or the submitter account.
This commit changes the ledger close in env.meta to be conditional on if it hasn't already been closed (i.e. the current ledger doesn't have any transactions in it). This change will make it a bit easier to use, as it will still work if you close the ledger outside of this usage. Previously, if you accidentally closed the ledger outside of the meta function, it would segfault and it was incredibly difficult to debug.
This commit introduces the following changes:
* Renames `vp_enable config` option to `vp_base_squelch_enable` to enable squelching for validators.
* Removes `vp_squelch` config option which was used to configure whether to send squelch messages to peers or not. With this flag removed, if squelching is enabled, squelch messages will be sent. This was an option used for debugging.
* Introduces a temporary `vp_base_squelch_max_trusted_peers` config option to change the max number of peers who are selected as validator message sources. This is a temporary option, which will be removed once a good value is found.
* Adds a traffic counter to count the number of times peers ignored squelch messages and kept sending messages for squelched validators.
* Moves the decision whether squelching is enabled and ready into Slot.h.
- Specification: [XRPLF/XRPL-Standards 56](https://github.com/XRPLF/XRPL-Standards/blob/master/XLS-0056d-batch/README.md)
- Amendment: `Batch`
- Implements execution of multiple transactions within a single batch transaction with four execution modes: `tfAllOrNothing`, `tfOnlyOne`, `tfUntilFailure`, and `tfIndependent`.
- Enables atomic multi-party transactions where multiple accounts can participate in a single batch, with up to 8 inner transactions and 8 batch signers per batch transaction.
- Inner transactions use `tfInnerBatchTxn` flag with zero fees, no signature, and empty signing public key.
- Inner transactions are applied after the outer batch succeeds via the `applyBatchTransactions` function in apply.cpp.
- Network layer prevents relay of transactions with `tfInnerBatchTxn` flag - each peer applies inner transactions locally from the batch.
- Batch transactions are excluded from AccountDelegate permissions but inner transactions retain full delegation support.
- Metadata includes `ParentBatchID` linking inner transactions to their containing batch for traceability and auditing.
- Extended STTx with batch-specific signature verification methods and added protocol structures (`sfRawTransactions`, `sfBatchSigners`).
Before #5224, the pseudoaccount ID was calculated using prefix expressed in `std::uint16_t`. The refactoring to move the pseudoaccount ID calculation to View.cpp had accidentally changed the prefix type to `int` (derived from `auto i = 0`) which in turn changed the length of the input to `sha512Half` from 2 bytes to 4, altering the result.
This resulted in a different ID of the pseudoaccount calculated from the function after the refactoring, breaking the ledger. This impacts AMMCreate, even when the `SingleAssetVault` amendment is not active. This change restores the prefix type to `std::uint16_t`.
- Specification: XRPLF/XRPL-Standards#239
- Amendment: `SingleAssetVault`
- Implements a vault feature used to store a fungible asset (XRP, IOU, or MPT, but not NFT) and to receive shares in the vault (an MPT) in exchange.
- A vault can be private or public.
- A private vault can use permissioned domains, subject to the `PermissionedDomains` amendment.
- Shares can be exchanged back into asset with `VaultWithdraw`.
- Permissions on the asset in the vault are transitively applied on shares in the vault.
- Issuer of the asset in the vault can clawback with `VaultClawback`.
- Extended `MPTokenIssuance` with `DomainID`, used by the permissioned domain on the vault shares.
Co-authored-by: John Freeman <jfreeman08@gmail.com>
Using std::barrier performs extremely poorly (~1 hour vs ~1 minute to run the test suite) in certain macOS environments.
To unblock our macOS CI pipeline, std::barrier has been replaced with a custom mutex-based barrier (Barrier) that significantly improves performance without compromising correctness.
Running unit tests in parallel and multiple threads can write into one file can corrupt output files, and then gcovr won't be able to parse the corrupted file. This change adds -fprofile-update=atomic as instructed by https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68080.
This change implements the account permission delegation described in XLS-75d, see https://github.com/XRPLF/XRPL-Standards/pull/257.
* Introduces transaction-level and granular permissions that can be delegated to other accounts.
* Adds `DelegateSet` transaction to grant specified permissions to another account.
* Adds `ltDelegate` ledger object to maintain the permission list for delegating/delegated account pair.
* Adds an optional `Delegate` field in common fields, allowing a delegated account to send transactions on behalf of the delegating account within the granted permission scope. The `Account` field remains the delegating account; the `Delegate` field specifies the delegated account. The transaction is signed by the delegated account.
This change updates the squelching logic to accept squelch messages for untrusted validators. As a result, servers will also squelch untrusted validator messages reducing duplicate traffic they generate.
In particular:
* Updates squelch message handling logic to squelch messages for all validators, not only trusted ones.
* Updates the logic to send squelch messages to peers that don't squelch themselves
* Increases the threshold for the number of messages that a peer has to deliver to consider it as a candidate for validator messages.
Combines four related changes:
1. "Decrease `shouldRelay` limit to 30s." Pretty self-explanatory. Currently, the limit is 5 minutes, by which point the `HashRouter` entry could have expired, making this transaction look brand new (and thus causing it to be relayed back to peers which have sent it to us recently).
2. "Give a transaction more chances to be retried." Will put a transaction into `LedgerMaster`'s held transactions if the transaction gets a `ter`, `tel`, or `tef` result. Old behavior was just `ter`.
* Additionally, to prevent a transaction from being repeatedly held indefinitely, it must meet some extra conditions. (Documented in a comment in the code.)
3. "Pop all transactions with sequential sequences, or tickets." When a transaction is processed successfully, currently, one held transaction for the same account (if any) will be popped out of the held transactions list, and queued up for the next transaction batch. This change pops all transactions for the account, but only if they have sequential sequences (for non-ticket transactions) or use a ticket. This issue was identified from interactions with @mtrippled's #4504, which was merged, but unfortunately reverted later by #4852. When the batches were spaced out, it could potentially take a very long time for a large number of held transactions for an account to get processed through. However, whether batched or not, this change will help get held transactions cleared out, particularly if a missing earlier transaction is what held them up.
4. "Process held transactions through existing NetworkOPs batching." In the current processing, at the end of each consensus round, all held transactions are directly applied to the open ledger, then the held list is reset. This bypasses all of the logic in `NetworkOPs::apply` which, among other things, broadcasts successful transactions to peers. This means that the transaction may not get broadcast to peers for a really long time (5 minutes in the current implementation, or 30 seconds with this first commit). If the node is a bottleneck (either due to network configuration, or because the transaction was submitted locally), the transaction may not be seen by any other nodes or validators before it expires or causes other problems.
This change addresses an issue where `rippled` attempts to connect to an IPv6 address, even when the local network lacks IPv6 support, resulting in a "Network is unreachable" error.
The fix replaces the custom endpoint selection logic with `boost::async_connect`, which sequentially attempts to connect to available endpoints until one succeeds or all fail.
This PR replaces the word `failed` with `failure` in any test names and renames some test files to fix MSVC warnings, so that it is easier to search through the test output to find tests that failed.
This change fixes a number of issues involved with CTID:
* CTID is not present on all RPC tx transactions.
* rpcWRONG_NETWORK is missing in the ErrorCodes.cpp
When using subscribe at admin RPC port to send webhooks for the transaction stream to a backend, on large(r) ledgers the endpoint receives fewer HTTP POSTs with TX information than the amount of transactions in a ledger. This change removes the hardcoded queue length to avoid dropping TX notifications for the admin-only command. In addition, the per-request TTL for outgoing RPC HTTP calls has been reduced from 10 minutes to 30 seconds.
This change introduces a new fix amendment (`fixPayChanV1`) that prevents the creation of new `PaymentChannelCreate` transaction with a `CancelAfter` time less than the current ledger time. It piggy backs off of fix1571.
Once the amendment is activated, creating a new `PaymentChannel` will require that if you specify the `CancelAfter` time/value, that value must be greater than or equal to the current ledger time.
Currently users can create a payment channel where the `CancelAfter` time is before the current ledger time. This results in the payment channel being immediately closed on the next PaymentChannel transaction.
This PR splits out `ledger_entry` tests into its own file (`LedgerEntry_test.cpp`) and alphabetizes the helper functions in `LedgerEntry.cpp`. These commits were split out of #5237 to make that PR a little more manageable, since these basic trivial changes are most of the diff. There is no code change, just moving code around.
Adds metric counters for the following P2P message types:
* Untrusted proposal and validation messages
* Duplicate proposal, validation and transaction messages
It’s possible for this to happen legitimately if a set of peers, including a validator, are connected in a cycle, and the latency and message processing time between those peers is significantly less than the latency between the validator and the last peer. It’s unlikely in the real world, but obviously easy to simulate with Antithesis.
The ci pipelines are constantly hitting Docker Hub's public rate limiting since increasing the number of jobs we're running. This change switches over to images hosted in GitHub's registry.
As part of import optimization, a transitive include had been removed that defined `BOOST_COMP_MSVC` on Windows. In unity builds, this definition was pulled in, but in non-unity builds it was not - causing a compilation error. An inspection of the Boost code revealed that we can just gate the statements by `_MS_VER` instead. A `#pragma message` is added to verify that the statement is only printed on Windows builds.
The main goal of this optimisation is memory reduction in SHAMapTreeNodes by introducing intrusive pointers instead of standard std::shared_ptr and std::weak_ptr.
In preparation for a potential reference fee change we would like to verify that fee change works as expected. The first step is to fix all unit tests to be able to work with different reference fee values.
- Detects if the consensus process is "stalled". If it is, then we can declare a
consensus and end successfully even if we do not have 80% agreement on
our proposal.
- "Stalled" is defined as:
- We have a close time consensus
- Each disputed transaction is individually stalled:
- It has been in the final "stuck" 95% requirement for at least 2
(avMIN_ROUNDS) "inner rounds" of phaseEstablish,
- and either all of the other trusted proposers or this validator, if proposing,
have had the same vote(s) for at least 4 (avSTALLED_ROUNDS) "inner
rounds", and at least 80% of the validators (including this one, if
appropriate) agree about the vote (whether yes or no).
- If we have been in the establish phase for more than 10x the previous
consensus establish phase's time, then consensus is considered "expired",
and we will leave the round, which sends a partial validation (indicating
that the node is moving on without validating). Two restrictions avoid
prematurely exiting, or having an extended exit in extreme situations.
- The 10x time is clamped to be within a range of 15s
(ledgerMAX_CONSENSUS) to 120s (ledgerABANDON_CONSENSUS).
- If consensus has not had an opportunity to walk through all avalanche
states (defined as not going through 8 "inner rounds" of phaseEstablish),
then ConsensusState::Expired is treated as ConsensusState::No.
- When enough nodes leave the round, any remaining nodes will see they've
fallen behind, and move on, too, generally before hitting the timeout. Any
validations or partial validations sent during this time will help the
consensus process bring the nodes back together.
This change removes the existing undefined behavior from `LogicError`, so we can be certain that there will be always a stacktrace.
De-referencing a null pointer is an old trick to generate `SIGSEGV`, which would typically also create a stacktrace. However it is also an undefined behaviour and compilers can do something else. A more robust way to create a stacktrace while crashing the program is to use `std::abort`, which we have also used in this location for a long time. If we combine the two, we might not get the expected behaviour - namely, the nullpointer deref followed by `std::abort`, as handled in certain compiler versions may not immediately cause a crash. We have observed stacktrace being wiped instead, and thread put in indeterminate state, then stacktrace created without any useful information.
The Trustline RPC `no_ripple` flag gets set depending on `lsfDefaultRipple` flag, which is not a flag of a trustline but of the account root. The `lsfDefaultRipple` flag does not provide any insight if this particular trust line has `lsfLowNoRipple` or `lsfHighNoRipple` flag set, so it should not be used here at all. This change simplifies the logic.
The `end_marker` is used to limit the range of ledger entries to fetch. If `end_marker` is less than `marker`, a crash can occur. This change adds an additional check.
Changes the error to `malformedAddress` for `permissioned_domain` in the `ledger_entry` rpc, when the account is not a string. This change makes it more clear to a user what is wrong with their request.
What the LoadManager class does is stall detection, which is not the same as deadlock detection. In the condition of severe CPU starvation, LoadManager will currently intentionally crash rippled reporting `LogicError: Deadlock detected`. This error message is misleading as the condition being detected is not a deadlock. This change fixes and refactors the code in response.
Requiring manual updates of numFeatures is an annoying manual process that is easily forgotten, and leads to frequent merge conflicts. This change takes advantage of the `XRPL_FEATURE` and `XRPL_FIX` macros, and adds a new `XRPL_RETIRE` macro to automatically set `numFeatures`.
The codebase is filled with includes that are unused, and which thus can be removed. At the same time, the files often do not include all headers that contain the definitions used in those files. This change uses clang-format and clang-tidy to clean up the includes, with minor manual intervention to ensure the code compiles on all platforms.
- PR #5228 added assert=TRUE and werr=TRUE CMake flags to the
build/action.yml script which is used by all CI jobs to build rippled,
ensuring those flags were always set. The assumption was that only the
CI jobs used that script, so any extra time cost was offset by the
benefit of the extra checks. That assumption was incorrect. That
script is used by other downstream projects. Therefore, those flags
have been moved into the individual CI jobs' "cmake-args" parameter
passed to build/action.yml. This will have the same effect for CI jobs
without any side effects.
Combine multiple related debug log data points into a single
message. Allows quick correlation of events that
previously were either not logged or, if logged, strewn
across multiple lines, making correlation difficult.
The Heartbeat Timer and consensus ledger accept processing
each have this capability.
Also guarantees that log entries will be written if the
node is a validator, regardless of log severity level.
Otherwise, the level of these messages is at INFO severity.
* Add logging for amendment voting decision process
* When counting "received validations" to determine quorum, count the number of validators actually voting, not the total number of possible votes.
The current comment in the example cfg file incorrectly mentions both "may" and "must". This change fixes this comment to clarify that the default port of hosts is 2459 and that specifying it is therefore optional. It further sets the default port to 2459 instead of the legacy 51235.
- Drop duplicate outgoing TMGetLedger messages per peer
- Allow a retry after 30s in case of peer or network congestion.
- Addresses RIPD-1870
- (Changes levelization. That is not desirable, and will need to be fixed.)
- Drop duplicate incoming TMGetLedger messages per peer
- Allow a retry after 15s in case of peer or network congestion.
- The requestCookie is ignored when computing the hash, thus increasing
the chances of detecting duplicate messages.
- With duplicate messages, keep track of the different requestCookies
(or lack of cookie). When work is finally done for a given request,
send the response to all the peers that are waiting on the request,
sending one message per peer, including all the cookies and
a "directResponse" flag indicating the data is intended for the
sender, too.
- Addresses RIPD-1871
- Drop duplicate incoming TMLedgerData messages
- Addresses RIPD-1869
- Improve logging related to ledger acquisition
- Class "CanProcess" to keep track of processing of distinct items
---------
Co-authored-by: Valentin Balaschenko <13349202+vlntb@users.noreply.github.com>
This change enhances the filtering in the ledger, ledger_data, and account_objects methods by also supporting filtering by the canonical name of the LedgerEntryType using case-insensitive matching.
Rewrites the code so that the lock is not held during the callback. Instead it locks twice, once before, and once after. This is safe due to the structure of the code, but is checked after the second lock. This allows mutex_ to be changed back to a regular mutex.
In PeerImpl.cpp, if the function is a message handler (onMessage) or called directly from a message handler, then it should use fee_, since when the handler returns (OnMessageEnd) then the charge function is called. If the function is not a message handler, such as a job queue item, it should remain charge.
- Rename the job in missing-commits.yml from "check" to "up_to_date",
because other jobs named "check" prevent merges, but this one should
not prevent merges. How else are branches going to get caught up?
- Move the job in instrumentation.yml to nix.yml, but keep it entirely
independent.
If the permissioned domains amendment XLS-80 is enabled before credentials XLS-70, then the permissioned domain users will not be able to match any credentials. The changes here prevent the creation of any permissioned domain objects if credentials are not enabled.
Make `simulate` RPC easier to use:
* Prevent the use of `seed`, `secret`, `seed_hex`, and `passphrase` fields (to avoid confusing with the signing methods).
* Add autofilling of the `NetworkID` field.
- Also get the branch name.
- Use rev-parse instead of describe to get a clean hash.
- Return the git hash and branch name in server_info for admin
connections.
- Include git hash and branch name on separate lines in --version.
- spec: XRPLF/XRPL-Standards#220
- amendment: "DeepFreeze"
- implemented deep freeze spec to allow token issuers to prevent currency holders from being able to acquire more of these tokens.
- in combination with normal freeze, deep freeze effectively prevents any balance trust line balance change of a currency holder (except direct issuer <-> holder payments).
- added 2 new invariant checks to verify that deep freeze cannot be enacted without normal freeze and transfer is not frozen.
- made some fixes to existing freeze handling.
Co-authored-by: Ed Hennis <ed@ripple.com>
Co-authored-by: Howard Hinnant <howard.hinnant@gmail.com>
- Fix an erroneous high fee penalty that peers could incur for sending
older transactions.
- Update to the fees charged for imposing a load on the server.
- Prevent the relaying of internal pseudo-transactions.
- Before: Pseudo-transactions received from a peer will fail the signature
check, even if they were requested (using TMGetObjectByHash), because
they have no signature. This causes the peer to be charge for an
invalid signature.
- After: Pseudo-transactions, are put into the global cache
(TransactionMaster) only. If the transaction is not part of
a TMTransactions batch, the peer is charged an unwanted data fee.
These fees will not be a problem in the normal course of operations,
but should dissuade peers from behaving badly by sending a bunch of
junk.
- Improve logging: include the reason for fees charged to a peer.
Co-authored-by: Ed Hennis <ed@ripple.com>
* Has more steps, but allows merges to develop to continue when a
beta / RC is pending, increasing developer velocity.
* Add a CI job to check that no reverse merges have been missed.
* Add some useful scripts in bin/git:
* Set up upstreams as expected for safer pushes
* Squash a bunch of branches
* Set the version number
* Resolves an issue introduced in #5111, which inadvertently removed the
-Wno-maybe-uninitialized compiler option from some xrpl.libxrpl
modules. This resulted in new "may be used uninitialized" build
warnings, first noticed in the "protocol" module. When compiling with
derr=TRUE, those warnings became errors, which made the build fail.
* Github CI actions will build with the assert and werr options turned
on. This will cause CI jobs to fail if a developer introduces a new
compiler warning, or causes an assert to fail in release builds.
* Includes the OS and compiler version in the linux dependencies jobs in
the "check environment" step.
* Translates the `unity` build option into `CMAKE_UNITY_BUILD` setting.
The LEDGER_ENTRY macro now takes an additional parameter, which makes it easier to avoid missing including the new field in jss.h and to the list of account_objects/ledger_data filters.
Replace Issue in STIssue with Asset. STIssue with MPTIssue is only used in MPT tests.
Will be used in Vault and in transactions with STIssue fields once MPT is integrated into DEX.
* Rename ASSERT to XRPL_ASSERT
* Upgrade to Anthithesis SDK 0.4.4, and use new 0.4.4 features
* automatic cast to bool, like assert
* Add instrumentation workflow to verify build with instrumentation enabled
Adds two CMake functions:
* add_module(library subdirectory): Declares an OBJECT "library" (a CMake abstraction for a collection of object files) with sources from the given subdirectory of the given library, representing a module. Isolates the module's headers by creating a subdirectory in the build directory, e.g. .build/tmp123, that contains just a symlink, e.g. .build/tmp123/basics, to the module's header directory, e.g. include/xrpl/basics, in the source directory, and putting .build/tmp123 (but not include/xrpl) on the include path of the module sources. This prevents the module sources from including headers not explicitly linked to the module in CMake with target_link_libraries.
* target_link_modules(library scope modules...): Links the library target to each of the module targets, and removes their sources from its source list (so they are not compiled and linked twice).
Uses these functions to separate and explicitly link modules in libxrpl:
Level 01: beast
Level 02: basics
Level 03: json, crypto
Level 04: protocol
Level 05: resource, server
* Copy Antithesis SDK version 0.4.0 to directory external/
* Add build option `voidstar` to enable instrumentation with Antithesis SDK
* Define instrumentation macros ASSERT and UNREACHABLE in terms of regular C assert
* Replace asserts with named ASSERT or UNREACHABLE
* Add UNREACHABLE to LogicError
* Document instrumentation macros in CONTRIBUTING.md
`STNumber` lets objects and transactions contain multiple fields for
quantities of XRP, IOU, or MPT without duplicating information about the
"issue" (represented by `STIssue`). It is a straightforward serialization of
the `Number` type that uniformly represents those quantities.
---------
Co-authored-by: John Freeman <jfreeman08@gmail.com>
Co-authored-by: Howard Hinnant <howard.hinnant@gmail.com>
* 2.2.2 changed functions acquireAsync and NetworkOPsImp::recvValidation to add an item to a collection under lock, unlock, do some work, then lock again to do remove the item. It will deadlock if an exception is thrown while adding the item - before unlocking.
* Replace ScopedUnlock with scope_unlock.
Move the newest information to the top, i.e., use reverse chronological order within each of the two sections ("API Versions" and "XRP Ledger server versions")
The page_size will soon be made configurable with #5135, making this
re-ordering necessary.
When opening SQLite connection, there are specific pragmas set with
commonPragmas.
In particular, PRAGMA journal_mode creates journal file and locks the
page_size; as of this commit, this sets the page size to the default
value of 4096. Coincidentally, the hardcoded page_size was also 4096, so
no issue was noticed.
Update book_changes RPC to reduce latency, add "validated" field, and accept shortcut strings (current, closed, validated) for ledger_index.
`"validated": true` indicates that the transaction has been included in a validated ledger so the result of the transaction is immutable.
Fix#5033Fix#5034Fix#5035Fix#5036
---------
Co-authored-by: Bronek Kozicki <brok@incorrekt.com>
When rippled initiates a connection to SQLite3, rippled sends a "PRAGMA"
statement defining the maximum number of pages allowed in the database.
Update the max_page_count so it is consistent with the default for newer
versions of SQLite3. Increasing max_page_count is critical for keeping
full history servers online.
Fix#5102
* Retry some failed RPC connections / commands in unit tests
* Remove orphaned `getAccounts` function
Co-authored-by: John Freeman <jfreeman08@gmail.com>
* upstream/master:
Set version to 2.2.2
Allow only 1 job queue slot for each validation ledger check
Allow only 1 job queue slot for acquiring inbound ledger.
Track latencies of certain code blocks, and log if they take too long
* refactor filtering of validations to specifically avoid
concurrent checkAccept() calls for the same validation ledger hash.
* Log when duplicate concurrent validation requests are filtered.
* RAII for containers that track concurrent validation requests.
* Log when duplicate concurrent inbound ledger are filtered.
* RAII for containers that track concurrent inbound ledger.
* Comment on when to asynchronously acquire inbound ledgers, which
is possible to be always OK, but should have further review.
* Other small logging changes
Co-authored-by: Ed Hennis <ed@ripple.com>
Implements a CI workflow that detects when a new version of libxrpl is
proposed, uploads it to artifactory under the `clio` channel and
notifies Clio's CI to check this newly proposed version.
* Add fixNFTokenPageLinks amendment:
It was discovered that under rare circumstances the links between
NFTokenPages could be removed. If this happens, then the
account_objects and account_nfts RPC commands under-report the
NFTokens owned by an account.
The fixNFTokenPageLinks amendment does the following to address
the problem:
- It fixes the underlying problem so no further broken links
should be created.
- It adds Invariants so, if such damage were introduced in the
future, an invariant would stop it.
- It adds a new FixLedgerState transaction that repairs
directories that were damaged in this fashion.
- It adds unit tests for all of it.
* `account_objects` returns an invalid field error if `type` is not supported.
This includes objects an account can't own, or which are unsupported by `account_objects`
* Includes:
* Amendments
* Directory Node
* Fee Settings
* Ledger Hashes
* Negative UNL
The names of the files should reflect the name of the Dir class.
Co-authored-by: Zack Brunson <Zshooter@gmail.com>
Co-authored-by: Ed Hennis <ed@ripple.com>
* fix CTID in tx command returns invalidParams on lowercase hex
* test mixed case and change auto to explicit type
* add header cctype because std::tolower is called
* remove unused local variable
* change test case comment from 'lowercase' to 'mixed case'
---------
Co-authored-by: Zack Brunson <Zshooter@gmail.com>
* Add feature / amendment "InvariantsV1_1"
* Adds invariant AccountRootsDeletedClean:
* Checks that a deleted account doesn't leave any directly
accessible artifacts behind.
* Always tests, but only changes the transaction result if
featureInvariantsV1_1 is enabled.
* Unit tests.
* Resolves#4638
* [FOLD] Review feedback from @gregtatcam:
* Fix unused variable warning
* Improve Invariant test const correctness
* [FOLD] Review feedback from @mvadari:
* Centralize the account keylet function list, and some optimization
* [FOLD] Some structured binding doesn't work in clang
* [FOLD] Review feedback 2 from @mvadari:
* Clean up and clarify some comments.
* [FOLD] Change InvariantsV1_1 to unsupported
* Will allow multiple PRs to be merged over time using the same amendment.
* fixup! [FOLD] Change InvariantsV1_1 to unsupported
* [FOLD] Update and clarify some comments. No code changes.
* Move CMake directory
* Rearrange sources
* Rewrite includes
* Recompute loops
* Fix merge issue and formatting
---------
Co-authored-by: Pretty Printer <cpp@ripple.com>
* fix "account_nfts" with unassociated marker returning issue
* create unit test for fixing nft page invalid marker not returning error
add more test
change test name
create unit test
* fix "account_nfts" with unassociated marker returning issue
* fix "account_nfts" with unassociated marker returning issue
* fix "account_nfts" with unassociated marker returning issue
* fix "account_nfts" with unassociated marker returning issue
* fix "account_nfts" with unassociated marker returning issue
* fix "account_nfts" with unassociated marker returning issue
* fix "account_nfts" with unassociated marker returning issue
* fix "account_nfts" with unassociated marker returning issue
* [FOLD] accumulated review suggestions
* move BEAST check out of lambda function
---------
Authored-by: Scott Schurr <scott@ripple.com>
* fixInnerObjTemplate2 amendment:
Apply inner object templates to all remaining (non-AMM)
inner objects.
Adds a unit test for applying the template to sfMajorities.
Other remaining inner objects showed no problems having
templates applied.
* Move CMake directory
* Rearrange sources
* Rewrite includes
* Recompute loops
---------
Co-authored-by: Pretty Printer <cpp@ripple.com>
Fix interactions between NFTokenOffers and trust lines.
Since the NFTokenAcceptOffer does not check the trust line that
the issuer receives as a transfer fee in the NFTokenAcceptOffer,
if the issuer deletes the trust line after NFTokenCreateOffer,
the trust line is created for the issuer by the
NFTokenAcceptOffer. That's fixed.
Resolves#4925.
Fixes issue #4937.
The fixReducedOffersV1 amendment fixed certain forms of offer
modification that could lead to blocked order books. Reduced
offers can block order books if the effective quality of the
reduced offer is worse than the quality of the original offer
(from the perspective of the taker). It turns out that, for
small values, the quality of the reduced offer can be
significantly affected by the rounding mode used during
scaling computations.
Issue #4937 identified an additional code path that modified
offers in a way that could lead to blocked order books. This
commit changes the rounding in that newly located code path so
the quality of the modified offer is never worse than the
quality of the offer as it was originally placed.
It is possible that additional ways of producing blocking
offers will come to light. Therefore there may be a future
need for a V3 amendment.
* Add trap_tx_hash command line option
This new option can be used only if replay is also enabled. It takes a transaction hash from the ledger loaded for replay, and will cause a specific line to be hit in Transactor.cpp, right before the selected transaction is applied.
Price Oracle data-series logic uses `unordered_map` to update the Oracle object.
This results in different servers disagreeing on the order of that hash table.
Consequently, the generated ledgers will have different hashes.
The fix uses `map` instead to guarantee the order of the token pairs
in the data-series.
Due to the rounding, LPTokenBalance of the last
Liquidity Provider (LP), might not match this LP's
trustline balance. This fix sets LPTokenBalance on
last LP withdrawal to this LP's LPToken trustline
balance.
Single path AMM offer has to factor in the transfer in rate
when calculating the upper bound quality and the quality function
because single path AMM's offer quality is not constant.
This fix factors in the transfer fee in
BookStep::adjustQualityWithFees().
* Fix AMM offer rounding and low quality LOB offer blocking AMM:
A single-path AMM offer with account offer on DEX, is always generated
starting with the takerPays first, which is rounded up, and then
the takerGets, which is rounded down. This rounding ensures that the pool's
product invariant is maintained. However, when one of the offer's side
is XRP, this rounding can result in the AMM offer having a lower
quality, potentially causing offer generation to fail if the quality
is lower than the account's offer quality.
To address this issue, the proposed fix adjusts the offer generation process
to start with the XRP side first and always rounds it down. This results
in a smaller offer size, improving the offer's quality. Regardless if the offer
has XRP or not, the rounding is done so that the offer size is minimized.
This change still ensures the product invariant, as the other generated
side is the exact result of the swap-in or swap-out equations.
If a liquidity can be provided by both AMM and LOB offer on offer crossing
then AMM offer is generated so that it matches LOB offer quality. If LOB
offer quality is less than limit quality then generated AMM offer quality
is also less than limit quality and the offer doesn't cross. To address
this issue, if LOB quality is better than limit quality then use LOB
quality to generate AMM offer. Otherwise, don't use the quality to generate
AMM offer. In this case, limitOut() function in StrandFlow limits
the out amount to match strand's quality to limit quality and consume
maximum AMM liquidity.
Rounding in the payment engine is causing an assert to sometimes fire
with "dust" amounts. This is causing issues when running debug builds of
rippled. This issue will be addressed, but the assert is no longer
serving its purpose.
I am resigning from my role as maintainer of the `rippled` codebase.
Please update repository permissions accordingly, prior to merging this pull request.
Thanks to everyone who has contributed, especially those whom I had the opportunity to closely collaborate with.
The AMM has an invariant for swaps where:
new_balance_1*new_balance_2 >= old_balance_1*old_balance_2
Due to rounding, this invariant could sometimes be violated (although by
very small amounts).
This patch introduces an amendment `fixAMMRounding` that changes the
rounding to always favor the AMM. Doing this should maintain the
invariant.
Co-authored-by: Bronek Kozicki
Co-authored-by: thejohnfreeman
It can be difficult to make transaction breaking changes to low level
code because the low level code does not have access to a ledger and the
current activated amendments in that ledger (the "rules"). This patch
adds global access to the current ledger rules as a `std::optional`. If
the optional is not seated, then there is no active transaction.
The `rotateWithLock` function holds a lock while it calls a callback
function that's passed in by the caller. This is a problematic design
that needs to be used very carefully. In this case, at least one caller
passed in a callback that eventually relocks the mutex on the same
thread, causing UB (a deadlock was observed). The caller was from
SHAMapStoreImpl, and it called `clearCaches`. This `clearCaches` can
potentially call `fetchNodeObject`, which tried to relock the mutex.
This patch resolves the issue by changing the mutex type to a
`recursive_mutex`. Ideally, the code should be rewritten so it doesn't
hold the mutex during the callback and the mutex should be changed back
to a regular mutex.
Co-authored-by: Ed Hennis <ed@ripple.com>
* Amend `.codecov.yml` to disable coverage reporting of test sources
and explicitly set most parameters
* Increase codecov upload retry time to 210s (from 35s)
* Upgrade gcovr adding support for more coverage formats (lcov, clover, jacoco)
* Upgrade github actions in coverage workflow
* Explicitly disable codecov plugins (also removing `gcov` coverage, which is not
correctly handled by codecov https://github.com/codecov/feedback/issues/334)
This amendment, `fixPreviousTxnID`, adds `PreviousTxnID` and
`PreviousTxnLgrSequence` as fields to all ledger objects that did
not already have them included (`DirectoryNode`, `Amendments`,
`FeeSettings`, `NegativeUNL`, and `AMM`). This makes it much easier
to go through the history of these ledger objects.
Github Actions for the build/test jobs (nix.yml, mac.yml, windows.yml) will only run on branches that build packages (develop, release, master), and branches with names starting with "ci/". This is intended as a compromise between disabling CI jobs on personal forks entirely, and having the jobs run as a free-for-all. Note that it will not affect PR jobs at all.
A large synthetic offer was not handled correctly in the payment engine.
This patch fixes that issue and introduces a new invariant check while
processing synthetic offers.
When calculating reward shares, the amount should always be rounded
down. If the `fixUniversalNumber` amendment is not active, this works
correctly. If it is not active, then the amount is incorrectly rounded
up. This patch introduces an amendment so it will be rounded down.
This fixes a case where a peer can desync under a certain timing
circumstance--if it reaches a certain point in consensus before it receives
proposals.
This was noticed under high transaction volumes. Namely, when we arrive at the
point of deciding whether consensus is reached after minimum establish phase
duration but before having received any proposals. This could be caused by
finishing the previous round slightly faster and/or having some delay in
receiving proposals. Existing behavior arrives at consensus immediately after
the minimum establish duration with no proposals. This causes us to desync
because we then close a non-validated ledger. The change in this PR causes us to
wait for a configured threshold before making the decision to arrive at
consensus with no proposals. This allows validators to catch up and for brief
delays in receiving proposals to be absorbed. There should be no drawback since,
with no proposals coming in, we needn't be in a huge rush to jump ahead.
Remove the zaphod.alloy.ee hubs from the bootstrap and default configuration after 5 years. It has been an honor to run these servers, but it is now time for another entity to step into this role.
The zaphod servers will be taken offline in a phased manner keeping all those who have peering arrangements informed.
These would be the preferred attributes of a boostrap set of hubs:
1. Commitment to run the hubs for a minimum of 2 years
2. Highly available
3. Geographically dispersed
4. Secure and up to date
5. Committed to ensure that peering information is kept private
We do not currently enforce that incoming peer connection does not have
remote_endpoint which is already used (either by incoming or outgoing
connection), hence already stored in slots_. If we happen to receive a
connection from such a duplicate remote_endpoint, it will eventually result in a
crash (when disconnecting) or weird behavior (when updating slot state), as a
result of an apparently matching remote_endpoint in slots_ being used by a
different connection.
This amendment fixes an edge case where an empty DID object can be
created. It adds an additional check to ensure that DIDs are
non-empty when created, and returns a `tecEMPTY_DID` error if the DID
would be empty.
The witness server makes heavily use of the `account_tx` RPC command. Perf
testing showed that the SQL query used by `account_tx` became unacceptably slow
when the DB was large and there was a `marker` parameter. The plan for the query
showed only indexed reads. This appears to be an issue with the internal SQLite
optimizer. This patch rewrote the query to use `UNION` instead of `OR` and
significantly improves performance. See RXI-896 and RIPD-1847 for more details.
- Update container for Doxygen workflow. Matches Linux workflow, with newer GLIBC version required by newer actions.
- Fixes macOS workflow to install and configure Conan correctly. Still fails on tests, but that does not seem attributable to the workflow.
* telENV_RPC_FAILED is a new code, reserved exclusively
for unit tests when RPC fails. This will
make those types of errors distinct and easier to test
for when expected and/or diagnose when not.
* Output RPC command result when result is not expected.
We are currently using old version 0.6.2 of `xxhash`, as a verbatim copy and paste of its header file `xxhash.h`. Switch to the more recent version 0.8.2. Since this version is in Conan Center (and properly protects its ABI by keeping the state object incomplete), add it as a Conan requirement. Switch to the SIMD instructions (in the new `XXH3` family) supported by the new version.
This algorithm is about an order of magnitude faster than the existing
algorithm (about 10x faster for encoding and about 15x faster for
decoding - including the double hash for the checksum). The algorithms
use gcc's int128 (fast MS version will have to wait, in the meantime MS
falls back to the slow code).
* It is now an invariant that all constructed Public Keys are valid,
non-empty and contain 33 bytes of data.
* Additionally, the memory footprint of the PublicKey class is reduced.
The size_ data member is declared as static.
* Distinguish and identify the PublisherList retrieved from the local
config file, versus the ones obtained from other validators.
* Fixes#2942
The compilation fails due to an issue in the initializer list
of an optional argument, which holds a vector of pairs.
The code compiles correctly on earlier gcc versions, but fails on gcc 13.
Implement native support for Price Oracles.
A Price Oracle is used to bring real-world data, such as market prices,
onto the blockchain, enabling dApps to access and utilize information
that resides outside the blockchain.
Add Price Oracle functionality:
- OracleSet: create or update the Oracle object
- OracleDelete: delete the Oracle object
To support this functionality add:
- New RPC method, `get_aggregate_price`, to calculate aggregate price for a token pair of the specified oracles
- `ltOracle` object
The `ltOracle` object maintains:
- Oracle Owner's account
- Oracle's metadata
- Up to ten token pairs with the scaled price
- The last update time the token pairs were updated
Add Oracle unit-tests
* Commit 01c37fe introduced a change to the STTx unit test where a local
"defaultRules" object was created with a temporary inline "presets"
value provided to the ctor. Rules::Impl stores a const ref to the
presets provided to the ctor. This particular call provided an inline
temp variable, which goes out of scope as soon as the object is
created. On Windows, attempting to use the presets (e.g. via the
enabled() function) causes an access violation, which crashes the test
run.
* An audit of the code indicates that all other instances of Rules use
the Application's config.features list, which will have a sufficient
lifetime.
Add `STObject` constructor to explicitly set the inner object template.
This allows certain AMM transactions to apply in the same ledger:
There is no issue if the trading fee is greater than or equal to 0.01%.
If the trading fee is less than 0.01%, then:
- After AMM create, AMM transactions must wait for one ledger to close
(3-5 seconds).
- After one ledger is validated, all AMM transactions succeed, as
appropriate, except for AMMVote.
- The first AMMVote which votes for a 0 trading fee in a ledger will
succeed. Subsequent AMMVote transactions which vote for a 0 trading
fee will wait for the next ledger (3-5 seconds). This behavior repeats
for each ledger.
This has no effect on the ultimate correctness of AMM. This amendment
will allow the transactions described above to succeed as expected, even
if the trading fee is 0 and the transactions are applied within one
ledger (block).
Prior to this commit, `port_grpc` could not be added to the [server]
stanza. Instead of validating gRPC IP/Port/Protocol information in
ServerHandler, validate grpc port info in GRPCServer constructor. This
should not break backwards compatibility.
gRPC-related config info must be in a section (stanza) called
[port_gprc].
* Close#4015 - That was an alternate solution. It was decided that with
relaxed validation, it is not necessary to rename port_grpc.
* Fix#4557
These headers are required in the xrpl Conan package in order for
xbridge witness server (xbwd) to build. This change to libxrpl may help
any dependents of libxrpl. This addition does not change any C++ code.
Without this amendment, an NFTokenAcceptOffer transaction can succeed
even when the NFToken recipient does not have sufficient reserves for
the new NFTokenPage. This allowed accounts to accept NFT sell offers
without having a sufficient reserve. (However, there was no issue in
brokered mode or when a buy offer is involved.)
Instead, the transaction should fail with `tecINSUFFICIENT_RESERVE` as
appropriate. The `fixNFTokenReserve` amendment adds checks in the
NFTokenAcceptOffer transactor to check if the OwnerCount changed. If it
did, then it checks the new reserve requirement.
Fix#4679
Use consistent platform-agnostic library names on all platforms.
Fix an issue that prevents dependents like validator-keys-tool from
linking to libxrpl on Windows.
It is bad practice to change the binary base name depending on the
platform. CMake already manipulates the base name into a final name that
fits the conventions of the platform. Linkers accept base names on the
command line and then look for conventional names on disk.
* Disable the Windows CI unit tests "allowed to fail" workaround which
was previously introduced in #4596.
* The runner hardware was upgraded, and the unit tests have been passing
since then.
Update to #4849, using a workaround for spurious codecov upload errors.
Spurious codecov upload errors are expected in public repos which rely
on PRs via forks. Retrying uploads is a decent and easy workaround.
Clients subscribed to `transactions` over WebSocket are being
disconnected because the traffic exceeds the default `send_queue_limit`
of 100.
This commit changes the default configuration, not the default in code.
Fix#4866
Resolves a warning that was emitted from the clang compiler. Switches
usage of the sprintf function to the recommended snprintf function.
Warning was observed in Apple clang version 15.0.0 (clang-1500.0.40.1).
Fix#4569
* Add logging for Application.cpp sweep()
* Improve lifetime management of ledger objects (`SLE`s)
* Only store SLE digest in CachedView; get SLEs from CachedSLEs
* Also force release of last ledger used for path finding if there are
no path finding requests to process
* Count more ST objects (derive from `CountedObject`)
* Track CachedView stats in CountedObjects
* Rename the CachedView counters
* Fix the scope of the digest lookup lock
Before this patch, if you asked "is it caching?" It was always caching.
Prevent WebSocket connections from trying to close twice.
The issue only occurs in debug builds (assertions are disabled in
release builds, including published packages), and when the WebSocket
connections are unprivileged. The assert (and WRN log) occurs when a
client drives up the resource balance enough to be forcibly disconnected
while there are still messages pending to be sent.
Thanks to @lathanbritz for discovering this issue in #4822.
This reverts commit 002893f280.
There were two files with conflicts in the automated revert:
- src/ripple/rpc/impl/RPCHelpers.h and
- src/test/rpc/JSONRPC_test.cpp
Those files were manually resolved.
Workaround for compilation errors with gcc-13 and other compilers
relying on `libstdc++` version 13. This is temporary until actual fix is
available for us to use: https://github.com/boostorg/beast/pull/2682
Some boost.beast files (which we do use) rely on an old gcc-12 behaviour
where `#include <cstdint>` was not needed even though types from this
header were used. This was broken by a change in libstdc++ version 13:
https://gcc.gnu.org/gcc-13/porting_to.html#header-dep-changes
The necessary fix was implemented in boost.beast, however it is not yet
available. Until it is available, we can use this workaround to enable
compilation of `rippled` with gcc-13, clang-16, etc.
* Revert "Optimize calculation of close time to avoid impasse and minimize gratuitous proposal changes (#4760)"
This reverts commit 8ce85a9750.
* Revert "Several changes to improve Consensus stability: (#4505)"
This reverts commit f259cc1ab6.
* Add missing include
---------
Co-authored-by: seelabs <scott.determan@yahoo.com>
# This file is sorted in reverse chronological order, with the most recent commits at the top.
# The commits listed here are ignored by git blame, which is useful for formatting-only commits that would otherwise obscure the history of changes to a file.
If relevant, use this section for an English description of the change at a technical level.
If this change affects an API, examples should be included here.
For performance-impacting changes, please provide these details:
1. Is this a new feature, bug fix, or improvement to existing functionality?
2. What behavior/functionality does the change impact?
3. In what processing can the impact be measured? Be as specific as possible - e.g. RPC client call, payment transaction that involves LOB, AMM, caching, DB operations, etc.
4. Does this change affect concurrent processing - e.g. does it involve acquiring locks, multi-threaded processing, or async processing?
# Restore copyright notices that were removed from specific files, without
# restoring the verbiage that is already present in LICENSE.md. Ensure that if
# the script is run multiple times, duplicate notices are not added.
if ! grep -q 'Raw Material Software' include/xrpl/beast/core/CurrentThreadName.h;then
echo -e "// Portions of this file are from JUCE (http://www.juce.com).\n// Copyright (c) 2013 - Raw Material Software Ltd.\n// Please visit http://www.juce.com\n\n$(cat include/xrpl/beast/core/CurrentThreadName.h)" >include/xrpl/beast/core/CurrentThreadName.h
fi
if ! grep -q 'Dev Null' src/test/app/NetworkID_test.cpp;then
# Add ASAN and UBSAN configurations for both gcc-15 and clang-22
configurations.append(
{
"config_name":config_name+"-asan",
"cmake_args":cmake_args,
"cmake_target":cmake_target,
"build_only":build_only,
"build_type":build_type,
"os":os,
"architecture":architecture,
"sanitizers":"address",
}
)
configurations.append(
{
"config_name":config_name+"-ubsan",
"cmake_args":cmake_args,
"cmake_target":cmake_target,
"build_only":build_only,
"build_type":build_type,
"os":os,
"architecture":architecture,
"sanitizers":"undefinedbehavior",
}
)
# TSAN is deactivated due to seg faults with latest compilers.
activate_tsan=False
ifactivate_tsan:
configurations.append(
{
"config_name":config_name+"-tsan-ubsan",
"cmake_args":cmake_args,
"cmake_target":cmake_target,
"build_only":build_only,
"build_type":build_type,
"os":os,
"architecture":architecture,
"sanitizers":"thread,undefinedbehavior",
}
)
else:
configurations.append(
{
"config_name":config_name,
"cmake_args":cmake_args,
"cmake_target":cmake_target,
"build_only":build_only,
"build_type":build_type,
"os":os,
"architecture":architecture,
"sanitizers":"",
}
)
returnconfigurations
defread_config(file:Path)->Config:
config=json.loads(file.read_text())
if(
config["architecture"]isNone
orconfig["os"]isNone
orconfig["build_type"]isNone
orconfig["cmake_args"]isNone
):
raiseException("Invalid configuration file.")
returnConfig(**config)
if__name__=="__main__":
parser=argparse.ArgumentParser()
parser.add_argument(
"-a",
"--all",
help="Set to generate all configurations (generally used when merging a PR) or leave unset to generate a subset of configurations (generally used when committing to a PR).",
action="store_true",
)
parser.add_argument(
"-c",
"--config",
help="Path to the JSON file containing the strategy matrix configurations.",
required=False,
type=Path,
)
parser.add_argument(
"-p",
"--packaging",
help="Emit the packaging matrix (derived from the 'package' field on os entries) instead of the build/test matrix.",
@@ -4,51 +4,186 @@ This changelog is intended to list all updates to the [public API methods](https
For info about how [API versioning](https://xrpl.org/request-formatting.html#api-versioning) works, including examples, please view the [XLS-22d spec](https://github.com/XRPLF/XRPL-Standards/discussions/54). For details about the implementation of API versioning, view the [implementation PR](https://github.com/XRPLF/rippled/pull/3155). API versioning ensures existing integrations and users continue to receive existing behavior, while those that request a higher API version will experience new behavior.
The API version controls the API behavior you see. This includes what properties you see in responses, what parameters you're permitted to send in requests, and so on. You specify the API version in each of your requests. When a breaking change is introduced to the `rippled` API, a new version is released. To avoid breaking your code, you should set (or increase) your version when you're ready to upgrade.
The API version controls the API behavior you see. This includes what properties you see in responses, what parameters you're permitted to send in requests, and so on. You specify the API version in each of your requests. When a breaking change is introduced to the `xrpld` API, a new version is released. To avoid breaking your code, you should set (or increase) your version when you're ready to upgrade.
For a log of breaking changes, see the **API Version [number]** headings. In general, breaking changes are associated with a particular API Version number. For non-breaking changes, scroll to the **XRP Ledger version [x.y.z]** headings. Non-breaking changes are associated with a particular XRP Ledger (`rippled`) release.
The [commandline](https://xrpl.org/docs/references/http-websocket-apis/api-conventions/request-formatting/#commandline-format) always uses the latest API version. The command line is intended for ad-hoc usage by humans, not programs or automated scripts. The command line is not meant for use in production code.
For a log of breaking changes, see the **API Version [number]** headings. In general, breaking changes are associated with a particular API Version number. For non-breaking changes, scroll to the **XRP Ledger version [x.y.z]** headings. Non-breaking changes are associated with a particular XRP Ledger (`xrpld`) release.
## API Version 3 (Beta)
API version 3 is currently a beta API. It requires enabling `[beta_rpc_api]` in the xrpld configuration to use. See [API-VERSION-3.md](API-VERSION-3.md) for the full list of changes in API version 3.
## API Version 2
API version 2 is available in `xrpld` version 2.0.0 and later. See [API-VERSION-2.md](API-VERSION-2.md) for the full list of changes in API version 2.
## API Version 1
This version is supported by all `rippled` versions. At time of writing, it is the default API version, used when no `api_version` is specified. When a new API version is introduced, the command line interface will default to the latest API version. The command line is intended for ad-hoc usage by humans, not programs or automated scripts. The command line is not meant for use in production code.
This version is supported by all `xrpld` versions. For WebSocket and HTTP JSON-RPC requests, it is currently the default API version used when no `api_version` is specified.
### Idiosyncrasies
## Unreleased
#### V1 account_info response
In [the response to the `account_info` command](https://xrpl.org/account_info.html#response-format), there is `account_data` - which is supposed to be an `AccountRoot` object - and `signer_lists` is returned in this object. However, the docs say that `signer_lists` should be at the root level of the reponse.
It makes sense for `signer_lists` to be at the root level because signer lists are not part of the AccountRoot object. (First reported in [xrpl-dev-portal#938](https://github.com/XRPLF/xrpl-dev-portal/issues/938).)
In `api_version: 2`, the `signer_lists` field [will be moved](#modifications-to-account_info-response-in-v2) to the root level of the account_info response. (https://github.com/XRPLF/rippled/pull/3770)
#### server_info - network_id
The `network_id` field was added in the `server_info` response in version 1.5.0 (2019), but it was not returned in [reporting mode](https://xrpl.org/rippled-server-modes.html#reporting-mode).
## Unreleased (expected in XRP Ledger version 2.0)
This section contains changes targeting a future version.
### Additions
Additions are intended to be non-breaking (because they are purely additive).
-`ledger_entry`, `account_objects`: The `Delegate` ledger entry now includes an optional `DestinationNode` field, which stores the index into the authorized account's owner directory. This field is present on entries created after bidirectional directory tracking was introduced and may appear in RPC responses for those entries. ([#6681](https://github.com/XRPLF/rippled/pull/6681))
-`server_definitions`: Added the following new sections to the response ([#6321](https://github.com/XRPLF/rippled/pull/6321)):
-`TRANSACTION_FORMATS`: Describes the fields and their optionality for each transaction type, including common fields shared across all transactions.
-`LEDGER_ENTRY_FORMATS`: Describes the fields and their optionality for each ledger entry type, including common fields shared across all ledger entries.
-`TRANSACTION_FLAGS`: Maps transaction type names to their supported flags and flag values.
-`LEDGER_ENTRY_FLAGS`: Maps ledger entry type names to their flags and flag values.
-`ACCOUNT_SET_FLAGS`: Maps AccountSet flag names (asf flags) to their numeric values.
### Bugfixes
- Peer Crawler: The `port` field in `overlay.active[]` now consistently returns an integer instead of a string for outbound peers. [#6318](https://github.com/XRPLF/rippled/pull/6318)
-`ping`: The `ip` field is no longer returned as an empty string for proxied connections without a forwarded-for header. It is now omitted, consistent with the behavior for identified connections. [#6730](https://github.com/XRPLF/rippled/pull/6730)
- gRPC `GetLedgerDiff`: Fixed error message that incorrectly said "base ledger not validated" when the desired ledger was not validated. [#6730](https://github.com/XRPLF/rippled/pull/6730)
-`account_channels`: The `destination_account` field now returns an error if the value is not a string. [#6529](https://github.com/XRPLF/rippled/pull/6529)
-`subscribe`: The `taker` field in the `books` array now returns an error if the value is not a string. [#6529](https://github.com/XRPLF/rippled/pull/6529)
-`account_info`: The `urlgravatar` field now uses HTTPS instead of HTTP. [#6529](https://github.com/XRPLF/rippled/pull/6529)
-`ledger`: The `full`, `accounts`, `transactions`, `expand`, `binary`, `owner_funds`, and `queue` fields now return an error if the value is not a boolean. [#6529](https://github.com/XRPLF/rippled/pull/6529)
-`ledger_data`: The `binary` field now returns an error if the value is not a boolean. [#6529](https://github.com/XRPLF/rippled/pull/6529)
-`submit`: The `fail_hard` field now returns an error if the value is not a boolean. [#6529](https://github.com/XRPLF/rippled/pull/6529)
-`subscribe`: The `taker` field in the `books` array now returns `actMalformed` instead of `badIssuer` if the value is not a valid account. [#6529](https://github.com/XRPLF/rippled/pull/6529)
- Fixed a bug in `Forwarded` HTTP header parsing where the extracted IP address could be incorrect when no comma or semicolon delimiter follows the address. This could cause the server to misidentify a client's IP address when operating behind a reverse proxy. [#6529](https://github.com/XRPLF/rippled/pull/6529)
## XRP Ledger server version 3.1.0
[Version 3.1.0](https://github.com/XRPLF/rippled/releases/tag/3.1.0) was released on Jan 27, 2026.
### Additions in 3.1.0
-`vault_info`: New RPC method to retrieve information about a specific vault (part of XLS-66 Lending Protocol). ([#6156](https://github.com/XRPLF/rippled/pull/6156))
## XRP Ledger server version 3.0.0
[Version 3.0.0](https://github.com/XRPLF/rippled/releases/tag/3.0.0) was released on Dec 9, 2025.
### Additions in 3.0.0
-`ledger_entry`: Supports all ledger entry types with dedicated parsers. ([#5237](https://github.com/XRPLF/rippled/pull/5237))
-`ledger_entry`: New error codes `entryNotFound` and `unexpectedLedgerType` for more specific error handling. ([#5237](https://github.com/XRPLF/rippled/pull/5237))
-`ledger_entry`: Improved error messages with more context (e.g., specifying which field is invalid or missing). ([#5237](https://github.com/XRPLF/rippled/pull/5237))
-`ledger_entry`: Assorted bug fixes in RPC processing. ([#5237](https://github.com/XRPLF/rippled/pull/5237))
-`simulate`: Supports additional metadata in the response. ([#5754](https://github.com/XRPLF/rippled/pull/5754))
## XRP Ledger server version 2.6.2
[Version 2.6.2](https://github.com/XRPLF/rippled/releases/tag/2.6.2) was released on Nov 19, 2025.
This release contains bug fixes only and no API changes.
## XRP Ledger server version 2.6.1
[Version 2.6.1](https://github.com/XRPLF/rippled/releases/tag/2.6.1) was released on Sep 30, 2025.
This release contains bug fixes only and no API changes.
## XRP Ledger server version 2.6.0
[Version 2.6.0](https://github.com/XRPLF/rippled/releases/tag/2.6.0) was released on Aug 27, 2025.
### Additions in 2.6.0
-`account_info`: Added `allowTrustLineLocking` flag in response. ([#5525](https://github.com/XRPLF/rippled/pull/5525))
-`ledger`: Removed the type filter from the RPC command. ([#4934](https://github.com/XRPLF/rippled/pull/4934))
-`subscribe` (`validations` stream): `network_id` is now included. ([#5579](https://github.com/XRPLF/rippled/pull/5579))
-`subscribe` (`transactions` stream): `nftoken_id`, `nftoken_ids`, and `offer_id` are now included in transaction metadata. ([#5230](https://github.com/XRPLF/rippled/pull/5230))
## XRP Ledger server version 2.5.1
[Version 2.5.1](https://github.com/XRPLF/rippled/releases/tag/2.5.1) was released on Sep 17, 2025.
This release contains bug fixes only and no API changes.
## XRP Ledger server version 2.5.0
[Version 2.5.0](https://github.com/XRPLF/rippled/releases/tag/2.5.0) was released on Jun 24, 2025.
### Additions and bugfixes in 2.5.0
-`tx`: Added `ctid` field to the response and improved error handling. ([#4738](https://github.com/XRPLF/rippled/pull/4738))
-`ledger_entry`: Improved error messages in `permissioned_domain`. ([#5344](https://github.com/XRPLF/rippled/pull/5344))
-`channel_authorize`: If `signing_support` is not enabled in the config, the RPC is disabled. ([#5385](https://github.com/XRPLF/rippled/pull/5385))
-`subscribe` (admin): Removed webhook queue limit to prevent dropping notifications; reduced HTTP timeout from 10 minutes to 30 seconds. ([#5163](https://github.com/XRPLF/rippled/pull/5163))
-`ledger_data` (gRPC): Fixed crashing issue with some invalid markers. ([#5137](https://github.com/XRPLF/rippled/pull/5137))
-`account_lines`: Fixed error with `no_ripple` and `no_ripple_peer` sometimes showing up incorrectly. ([#5345](https://github.com/XRPLF/rippled/pull/5345))
-`account_tx`: Fixed issue with incorrect CTIDs. ([#5408](https://github.com/XRPLF/rippled/pull/5408))
## XRP Ledger server version 2.4.0
[Version 2.4.0](https://github.com/XRPLF/rippled/releases/tag/2.4.0) was released on March 4, 2025.
### Additions and bugfixes in 2.4.0
-`simulate`: A new RPC that executes a [dry run of a transaction submission](https://github.com/XRPLF/XRPL-Standards/tree/master/XLS-0069d-simulate#2-rpc-simulate). ([#5069](https://github.com/XRPLF/rippled/pull/5069))
- Signing methods (`sign`, `sign_for`, `submit`): Autofill fees better, properly handle transactions without a base fee, and autofill the `NetworkID` field. ([#5069](https://github.com/XRPLF/rippled/pull/5069))
-`ledger_entry`: `state` is added as an alias for `ripple_state`. ([#5199](https://github.com/XRPLF/rippled/pull/5199))
-`ledger`, `ledger_data`, `account_objects`: Support filtering ledger entry types by their canonical names (case-insensitive). ([#5271](https://github.com/XRPLF/rippled/pull/5271))
-`validators`: Added new field `validator_list_threshold` in response. ([#5112](https://github.com/XRPLF/rippled/pull/5112))
-`server_info`: Added git commit hash info on admin connection. ([#5225](https://github.com/XRPLF/rippled/pull/5225))
-`server_definitions`: Changed larger `UInt` serialized types to `Hash`. ([#5231](https://github.com/XRPLF/rippled/pull/5231))
## XRP Ledger server version 2.3.1
[Version 2.3.1](https://github.com/XRPLF/rippled/releases/tag/2.3.1) was released on Jan 29, 2025.
This release contains bug fixes only and no API changes.
## XRP Ledger server version 2.3.0
[Version 2.3.0](https://github.com/XRPLF/rippled/releases/tag/2.3.0) was released on Nov 25, 2024.
### Breaking changes in 2.3.0
-`book_changes`: If the requested ledger version is not available on this node, a `ledgerNotFound` error is returned and the node does not attempt to acquire the ledger from the p2p network (as with other non-admin RPCs). Admins can still attempt to retrieve old ledgers with the `ledger_request` RPC.
### Additions and bugfixes in 2.3.0
-`book_changes`: Returns a `validated` field in its response. ([#5096](https://github.com/XRPLF/rippled/pull/5096))
-`book_changes`: Accepts shortcut strings (`current`, `closed`, `validated`) for the `ledger_index` parameter. ([#5096](https://github.com/XRPLF/rippled/pull/5096))
-`server_definitions`: Include `index` in response. ([#5190](https://github.com/XRPLF/rippled/pull/5190))
-`account_nfts`: Fix issue where unassociated marker would return incorrect results. ([#5045](https://github.com/XRPLF/rippled/pull/5045))
-`account_objects`: Fix issue where invalid marker would not return an error. ([#5046](https://github.com/XRPLF/rippled/pull/5046))
-`account_objects`: Disallow filtering by ledger entry types that an account cannot hold. ([#5056](https://github.com/XRPLF/rippled/pull/5056))
-`feature`: Better error handling for invalid values of `feature`. ([#5063](https://github.com/XRPLF/rippled/pull/5063))
## XRP Ledger server version 2.2.0
[Version 2.2.0](https://github.com/XRPLF/rippled/releases/tag/2.2.0) was released on Jun 5, 2024. The following additions are non-breaking (because they are purely additive):
-`feature`: Add a non-admin mode for users. (It was previously only available to admin connections.) The method returns an updated list of amendments, including their names and other information. ([#4781](https://github.com/XRPLF/rippled/pull/4781))
## XRP Ledger server version 2.0.1
[Version 2.0.1](https://github.com/XRPLF/rippled/releases/tag/2.0.1) was released on Jan 29, 2024. The following additions are non-breaking:
[Version 2.0.0](https://github.com/XRPLF/rippled/releases/tag/2.0.0) was released on Jan 9, 2024. The following additions are non-breaking (because they are purely additive):
-`server_definitions`: A new RPC that generates a `definitions.json`-like output that can be used in XRPL libraries.
- In `Payment` transactions, `DeliverMax` has been added. This is a replacement for the `Amount` field, which should be no longer used - instead, use `delivered_amount` in transaction metadata. To ease the transition, `DeliverMax` is present regardless of API version, since adding a field is non-breaking. The field `Amount` is no longer present in `Payment`s in API version 2.
- In `Payment` transactions, `DeliverMax` has been added. This is a replacement for the `Amount` field, which should not be used. Typically, the `delivered_amount`(in transaction metadata) should be used. To ease the transition, `DeliverMax` is present regardless of API version, since adding a field is non-breaking.
- API version 2 has been moved from beta to supported, meaning that it is generally available (regardless of the `beta_rpc_api` setting). The full list of changes is in [API-VERSION-2.md](API-VERSION-2.md).
## XRP Ledger version 1.12.0
## XRP Ledger server version 1.12.0
[Version 1.12.0](https://github.com/XRPLF/rippled/releases/tag/1.12.0) was released on Sep 6, 2023.
[Version 1.12.0](https://github.com/XRPLF/rippled/releases/tag/1.12.0) was released on Sep 6, 2023. The following additions are non-breaking (because they are purely additive):
### Additions in 1.12
Additions are intended to be non-breaking (because they are purely additive).
-`server_info`: Added `ports`, an array which advertises the RPC and WebSocket ports. This information is also included in the `/crawl` endpoint (which calls `server_info` internally). `grpc` and `peer` ports are also included. (https://github.com/XRPLF/rippled/pull/4427)
-`server_info`: Added `ports`, an array which advertises the RPC and WebSocket ports. This information is also included in the `/crawl` endpoint (which calls `server_info` internally). `grpc` and `peer` ports are also included. ([#4427](https://github.com/XRPLF/rippled/pull/4427))
-`ports` contains objects, each containing a `port` for the listening port (a number string), and a `protocol` array listing the supported protocols on that port.
- This allows crawlers to build a more detailed topology without needing to port-scan nodes.
- (For peers and other non-admin clients, the info about admin ports is excluded.)
- Clawback: The following additions are gated by the Clawback amendment (`featureClawback`). (https://github.com/XRPLF/rippled/pull/4553)
- Adds an [AccountRoot flag](https://xrpl.org/accountroot.html#accountroot-flags) called `lsfAllowTrustLineClawback` (https://github.com/XRPLF/rippled/pull/4617)
- Clawback: The following additions are gated by the Clawback amendment (`featureClawback`). ([#4553](https://github.com/XRPLF/rippled/pull/4553))
- Adds an [AccountRoot flag](https://xrpl.org/accountroot.html#accountroot-flags) called `lsfAllowTrustLineClawback`. ([#4617](https://github.com/XRPLF/rippled/pull/4617))
- Adds the corresponding `asfAllowTrustLineClawback` [AccountSet Flag](https://xrpl.org/accountset.html#accountset-flags) as well.
- Clawback is disabled by default, so if an issuer desires the ability to claw back funds, they must use an `AccountSet` transaction to set the AllowTrustLineClawback flag. They must do this before creating any trust lines, offers, escrows, payment channels, or checks.
- Adds the [Clawback transaction type](https://github.com/XRPLF/XRPL-Standards/blob/master/XLS-39d-clawback/README.md#331-clawback-transaction), containing these fields:
@@ -72,27 +207,27 @@ Additions are intended to be non-breaking (because they are purely additive).
- tecAMM_BALANCE: AMM has invalid balance. Calculated balances greater than the current pool balances.
- tecAMM_FAILED: AMM transaction failed. Fails due to a processing failure.
- tecAMM_INVALID_TOKENS: AMM invalid LP tokens. Invalid input values, format, or calculated values.
- tecAMM_EMPTY: AMM is in empty state. Transaction expects AMM in non-empty state (LP tokens > 0).
- tecAMM_NOT_EMPTY: AMM is not in empty state. Transaction expects AMM in empty state (LP tokens == 0).
- tecAMM_EMPTY: AMM is in empty state. Transaction requires AMM in non-empty state (LP tokens > 0).
- tecAMM_NOT_EMPTY: AMM is not in empty state. Transaction requires AMM in empty state (LP tokens == 0).
- tecAMM_ACCOUNT: AMM account. Clawback of AMM account.
- tecINCOMPLETE: Some work was completed, but more submissions required to finish. AMMDelete partially deletes the trustlines.
## XRP Ledger version 1.11.0
## XRP Ledger server version 1.11.0
[Version 1.11.0](https://github.com/XRPLF/rippled/releases/tag/1.11.0) was released on Jun 20, 2023.
### Breaking changes in 1.11
- Added the ability to mark amendments as obsolete. For the `feature` admin API, there is a new possible value for the `vetoed` field. (https://github.com/XRPLF/rippled/pull/4291)
- Added the ability to mark amendments as obsolete. For the `feature` admin API, there is a new possible value for the `vetoed` field. ([#4291](https://github.com/XRPLF/rippled/pull/4291))
- The value of `vetoed` can now be `true`, `false`, or `"Obsolete"`.
- Removed the acceptance of seeds or public keys in place of account addresses. (https://github.com/XRPLF/rippled/pull/4404)
- Removed the acceptance of seeds or public keys in place of account addresses. ([#4404](https://github.com/XRPLF/rippled/pull/4404))
- This simplifies the API and encourages better security practices (i.e. seeds should never be sent over the network).
- For the `ledger_data` method, when all entries are filtered out, the `state` field of the response is now an empty list (in other words, an empty array, `[]`). (Previously, it would return `null`.) While this is technically a breaking change, the new behavior is consistent with the documentation, so this is considered only a bug fix. (https://github.com/XRPLF/rippled/pull/4398)
- For the `ledger_data` method, when all entries are filtered out, the `state` field of the response is now an empty list (in other words, an empty array, `[]`). (Previously, it would return `null`.) While this is technically a breaking change, the new behavior is consistent with the documentation, so this is considered only a bug fix. ([#4398](https://github.com/XRPLF/rippled/pull/4398))
- If and when the `fixNFTokenRemint` amendment activates, there will be a new AccountRoot field, `FirstNFTSequence`. This field is set to the current account sequence when the account issues their first NFT. If an account has not issued any NFTs, then the field is not set. ([#4406](https://github.com/XRPLF/rippled/pull/4406))
- There is a new account deletion restriction: an account can only be deleted if `FirstNFTSequence` + `MintedNFTokens` + `256` is less than the current ledger sequence.
- This is potentially a breaking change if clients have logic for determining whether an account can be deleted.
- NetworkID
- For sidechains and networks with a network ID greater than 1024, there is a new [transaction common field](https://xrpl.org/transaction-common-fields.html), `NetworkID`. (https://github.com/XRPLF/rippled/pull/4370)
- For sidechains and networks with a network ID greater than 1024, there is a new [transaction common field](https://xrpl.org/transaction-common-fields.html), `NetworkID`. ([#4370](https://github.com/XRPLF/rippled/pull/4370))
- This field helps to prevent replay attacks and is now required for chains whose network ID is 1025 or higher.
- The field must be omitted for Mainnet, so there is no change for Mainnet users.
- There are three new local error codes:
@@ -102,12 +237,12 @@ Additions are intended to be non-breaking (because they are purely additive).
### Additions and bug fixes in 1.11
- Added `nftoken_id`, `nftoken_ids` and `offer_id` meta fields into NFT `tx` and `account_tx` responses. (https://github.com/XRPLF/rippled/pull/4447)
- Added an `account_flags` object to the `account_info` method response. (https://github.com/XRPLF/rippled/pull/4459)
- Added `NFTokenPages` to the `account_objects` RPC. (https://github.com/XRPLF/rippled/pull/4352)
- Fixed: `marker` returned from the `account_lines` command would not work on subsequent commands. (https://github.com/XRPLF/rippled/pull/4361)
- Added `nftoken_id`, `nftoken_ids` and `offer_id` meta fields into NFT `tx` and `account_tx` responses. ([#4447](https://github.com/XRPLF/rippled/pull/4447))
- Added an `account_flags` object to the `account_info` method response. ([#4459](https://github.com/XRPLF/rippled/pull/4459))
- Added `NFTokenPages` to the `account_objects` RPC. ([#4352](https://github.com/XRPLF/rippled/pull/4352))
- Fixed: `marker` returned from the `account_lines` command would not work on subsequent commands. ([#4361](https://github.com/XRPLF/rippled/pull/4361))
@@ -118,70 +253,6 @@ was released on Mar 14, 2023.
removed from the [ledger subscription stream](https://xrpl.org/subscribe.html#ledger-stream), because it will no longer
have any meaning.
## API Version 2
API version 2 is introduced in `rippled` version 2.0. Users can request it explicitly by specifying `"api_version" : 2`.
#### Removed methods
In API version 2, the following methods are no longer available: (https://github.com/XRPLF/rippled/pull/4759)
-`tx_history` - Instead, use other methods such as `account_tx` or `ledger` with the `transactions` field set to `true`.
-`ledger_header` - Instead, use the `ledger` method.
#### Modifications to JSON transaction element in V2
In API version 2, JSON elements for transaction output have been changed and made consistent for all methods which output transactions: (https://github.com/XRPLF/rippled/pull/4775)
- JSON transaction element is named `tx_json`
- Binary transaction element is named `tx_blob`
- JSON transaction metadata element is named `meta`
- Binary transaction metadata element is named `meta_blob`
Additionally, these elements are now consistently available next to `tx_json` (i.e. sibling elements), where possible:
-`hash` - Transaction ID. This data was stored inside transaction output in API version 1, but in API version 2 is a sibling element.
-`ledger_index` - Ledger index (only set on validated ledgers)
-`ledger_hash` - Ledger hash (only set on closed or validated ledgers)
-`close_time_iso` - Ledger close time expressed in ISO 8601 time format (only set on validated ledgers)
-`validated` - Bool element set to `true` if the transaction is in a validated ledger, otherwise `false`
This change affects the following methods:
-`tx` - Transaction data moved into element `tx_json` (was inline inside `result`) or, if binary output was requested, moved from `tx` to `tx_blob`. Renamed binary transaction metadata element (if it was requested) from `meta` to `meta_blob`. Changed location of `hash` and added new elements
-`account_tx` - Renamed transaction element from `tx` to `tx_json`. Renamed binary transaction metadata element (if it was requested) from `meta` to `meta_blob`. Changed location of `hash` and added new elements
-`transaction_entry` - Renamed transaction metadata element from `metadata` to `meta`. Changed location of `hash` and added new elements
-`subscribe` - Renamed transaction element from `transaction` to `tx_json`. Changed location of `hash` and added new elements
-`sign`, `sign_for`, `submit` and `submit_multisigned` - Changed location of `hash` element.
#### Modification to `Payment` transaction JSON schema
- In `Payment` transaction type, JSON RPC field `Amount` is renamed to `DeliverMax`. To enable smooth client transition, `Amount` is still handled, as described below: (https://github.com/XRPLF/rippled/pull/4733)
- On JSON RPC input (e.g. `submit_multisigned` etc. methods), `Amount` is recognized as an alias to `DeliverMax` for both API version 1 and version 2 clients.
- On JSON RPC input, submitting both `Amount` and `DeliverMax` fields is allowed _only_ if they are identical; otherwise such input is rejected with `rpcINVALID_PARAMS` error.
- On JSON RPC output (e.g. `subscribe`, `account_tx` etc. methods), `DeliverMax` is present in both API version 1 and version 2.
- On JSON RPC output, `Amount` is only present in API version 1 and _not_ in version 2.
#### Modifications to account_info response
-`signer_lists` is returned in the root of the response. In API version 1, it was nested under `account_data`. (https://github.com/XRPLF/rippled/pull/3770)
- When using an invalid `signer_lists` value, the API now returns an "invalidParams" error. (https://github.com/XRPLF/rippled/pull/4585)
- (`signer_lists` must be a boolean. In API version 1, strings are accepted and may return a normal response - as if `signer_lists` were `true`.)
#### Modifications to [account_tx](https://xrpl.org/account_tx.html#account_tx) response
- Using `ledger_index_min`, `ledger_index_max`, and `ledger_index` returns `invalidParams` because if you use `ledger_index_min` or `ledger_index_max`, then it does not make sense to also specify `ledger_index`. In API version 1, no error was returned. (https://github.com/XRPLF/rippled/pull/4571)
- The same applies for `ledger_index_min`, `ledger_index_max`, and `ledger_hash`. (https://github.com/XRPLF/rippled/issues/4545#issuecomment-1565065579)
- Using a `ledger_index_min` or `ledger_index_max` beyond the range of ledgers that the server has:
- returns `lgrIdxMalformed` in API version 2. (https://github.com/XRPLF/rippled/issues/4288)
- In API version 1, no error was returned.
- Attempting to use a non-boolean value (such as a string) for the `binary` or `forward` parameters returns `invalidParams` (`rpcINVALID_PARAMS`). In API version 1, no error was returned. (<https://github.com/XRPLF/rippled/pull/4620>)
#### Modifications to [noripple_check](https://xrpl.org/noripple_check.html#noripple_check) response
- Attempting to use a non-boolean value (such as a string) for the `transactions` parameter returns `invalidParams` (`rpcINVALID_PARAMS`). In API version 1, no error was returned. (<https://github.com/XRPLF/rippled/pull/4620>)
# Unit tests for API changes
The following information is useful to developers contributing to this project:
API version 2 is available in `xrpld` version 2.0.0 and later. To use this API, clients specify `"api_version" : 2` in each request.
For info about how [API versioning](https://xrpl.org/request-formatting.html#api-versioning) works, including examples, please view the [XLS-22d spec](https://github.com/XRPLF/XRPL-Standards/discussions/54). For details about the implementation of API versioning, view the [implementation PR](https://github.com/XRPLF/rippled/pull/3155). API versioning ensures existing integrations and users continue to receive existing behavior, while those that request a higher API version will experience new behavior.
## Removed methods
In API version 2, the following deprecated methods are no longer available: ([#4759](https://github.com/XRPLF/rippled/pull/4759))
-`tx_history` - Instead, use other methods such as `account_tx` or `ledger` with the `transactions` field set to `true`.
-`ledger_header` - Instead, use the `ledger` method.
## Modifications to JSON transaction element in API version 2
In API version 2, JSON elements for transaction output have been changed and made consistent for all methods which output transactions. ([#4775](https://github.com/XRPLF/rippled/pull/4775))
This helps to unify the JSON serialization format of transactions. ([clio#722](https://github.com/XRPLF/clio/issues/722), [#4727](https://github.com/XRPLF/rippled/issues/4727))
- JSON transaction element is named `tx_json`
- Binary transaction element is named `tx_blob`
- JSON transaction metadata element is named `meta`
- Binary transaction metadata element is named `meta_blob`
Additionally, these elements are now consistently available next to `tx_json` (i.e. sibling elements), where possible:
-`hash` - Transaction ID. This data was stored inside transaction output in API version 1, but in API version 2 is a sibling element.
-`ledger_index` - Ledger index (only set on validated ledgers)
-`ledger_hash` - Ledger hash (only set on closed or validated ledgers)
-`close_time_iso` - Ledger close time expressed in ISO 8601 time format (only set on validated ledgers)
-`validated` - Bool element set to `true` if the transaction is in a validated ledger, otherwise `false`
This change affects the following methods:
-`tx` - Transaction data moved into element `tx_json` (was inline inside `result`) or, if binary output was requested, moved from `tx` to `tx_blob`. Renamed binary transaction metadata element (if it was requested) from `meta` to `meta_blob`. Changed location of `hash` and added new elements
-`account_tx` - Renamed transaction element from `tx` to `tx_json`. Renamed binary transaction metadata element (if it was requested) from `meta` to `meta_blob`. Changed location of `hash` and added new elements
-`transaction_entry` - Renamed transaction metadata element from `metadata` to `meta`. Changed location of `hash` and added new elements
-`subscribe` - Renamed transaction element from `transaction` to `tx_json`. Changed location of `hash` and added new elements
-`sign`, `sign_for`, `submit` and `submit_multisigned` - Changed location of `hash` element.
## Modifications to `Payment` transaction JSON schema
When reading Payments, the `Amount` field should generally **not** be used. Instead, use [delivered_amount](https://xrpl.org/partial-payments.html#the-delivered_amount-field) to see the amount that the Payment delivered. To clarify its meaning, the `Amount` field is being renamed to `DeliverMax`. ([#4733](https://github.com/XRPLF/rippled/pull/4733))
- In `Payment` transaction type, JSON RPC field `Amount` is renamed to `DeliverMax`. To enable smooth client transition, `Amount` is still handled, as described below: ([#4733](https://github.com/XRPLF/rippled/pull/4733))
- On JSON RPC input (e.g. `submit_multisigned` etc. methods), `Amount` is recognized as an alias to `DeliverMax` for both API version 1 and version 2 clients.
- On JSON RPC input, submitting both `Amount` and `DeliverMax` fields is allowed _only_ if they are identical; otherwise such input is rejected with `rpcINVALID_PARAMS` error.
- On JSON RPC output (e.g. `subscribe`, `account_tx` etc. methods), `DeliverMax` is present in both API version 1 and version 2.
- On JSON RPC output, `Amount` is only present in API version 1 and _not_ in version 2.
## Modifications to account_info response
-`signer_lists` is returned in the root of the response. (In API version 1, it was nested under `account_data`.) ([#3770](https://github.com/XRPLF/rippled/pull/3770))
- When using an invalid `signer_lists` value, the API now returns an "invalidParams" error. ([#4585](https://github.com/XRPLF/rippled/pull/4585))
- (`signer_lists` must be a boolean. In API version 1, strings were accepted and may return a normal response - i.e. as if `signer_lists` were `true`.)
## Modifications to [account_tx](https://xrpl.org/account_tx.html#account_tx) response
- Using `ledger_index_min`, `ledger_index_max`, and `ledger_index` returns `invalidParams` because if you use `ledger_index_min` or `ledger_index_max`, then it does not make sense to also specify `ledger_index`. In API version 1, no error was returned. ([#4571](https://github.com/XRPLF/rippled/pull/4571))
- The same applies for `ledger_index_min`, `ledger_index_max`, and `ledger_hash`. ([#4545](https://github.com/XRPLF/rippled/issues/4545#issuecomment-1565065579))
- Using a `ledger_index_min` or `ledger_index_max` beyond the range of ledgers that the server has:
- returns `lgrIdxMalformed` in API version 2. Previously, in API version 1, no error was returned. ([#4288](https://github.com/XRPLF/rippled/issues/4288))
- Attempting to use a non-boolean value (such as a string) for the `binary` or `forward` parameters returns `invalidParams` (`rpcINVALID_PARAMS`). Previously, in API version 1, no error was returned. ([#4620](https://github.com/XRPLF/rippled/pull/4620))
## Modifications to [noripple_check](https://xrpl.org/noripple_check.html#noripple_check) response
- Attempting to use a non-boolean value (such as a string) for the `transactions` parameter returns `invalidParams` (`rpcINVALID_PARAMS`). Previously, in API version 1, no error was returned. ([#4620](https://github.com/XRPLF/rippled/pull/4620))
API version 3 is currently a **beta API**. It requires enabling `[beta_rpc_api]` in the xrpld configuration to use. To use this API, clients specify `"api_version" : 3` in each request.
For info about how [API versioning](https://xrpl.org/request-formatting.html#api-versioning) works, including examples, please view the [XLS-22d spec](https://github.com/XRPLF/XRPL-Standards/discussions/54). For details about the implementation of API versioning, view the [implementation PR](https://github.com/XRPLF/rippled/pull/3155). API versioning ensures existing integrations and users continue to receive existing behavior, while those that request a higher API version will experience new behavior.
## Breaking Changes
### Modifications to `amm_info`
The order of error checks has been changed to provide more specific error messages. ([#4924](https://github.com/XRPLF/rippled/pull/4924))
- **Before (API v2)**: When sending an invalid account or asset to `amm_info` while other parameters are not set as expected, the method returns a generic `rpcINVALID_PARAMS` error.
- **After (API v3)**: The same scenario returns a more specific error: `rpcISSUE_MALFORMED` for malformed assets or `rpcACT_MALFORMED` for malformed accounts.
### Modifications to `ledger_entry`
Added support for string shortcuts to look up fixed-location ledger entries using the `"index"` parameter. ([#5644](https://github.com/XRPLF/rippled/pull/5644))
In API version 3, the following string values can be used with the `"index"` parameter:
-`"index": "amendments"` - Returns the `Amendments` ledger entry
-`"index": "fee"` - Returns the `FeeSettings` ledger entry
-`"index": "nunl"` - Returns the `NegativeUNL` ledger entry
| These instructions assume you have a C++ development environment ready with Git, Python, Conan, CMake, and a C++ compiler. For help setting one up on Linux, macOS, or Windows, [see this guide](./docs/build/environment.md). |
> These instructions also assume a basic familiarity with Conan and CMake.
> If you are unfamiliar with Conan,
> you can read our [crash course](./docs/build/conan.md)
> or the official [Getting Started][3] walkthrough.
> If you are unfamiliar with Conan, you can read our
> [crash course](./docs/build/conan.md) or the official [Getting Started][3]
> walkthrough.
## Branches
For a stable release, choose the `master` branch or one of the [tagged
For the latest release candidate, choose the `release` branch.
```
```bash
git checkout release
```
For the latest set of untested features, or to contribute, choose the `develop`
branch.
```
```bash
git checkout develop
```
@@ -33,126 +33,329 @@ git checkout develop
See [System Requirements](https://xrpl.org/system-requirements.html).
Building rippled generally requires git, Python, Conan, CMake, and a C++ compiler. Some guidance on setting up such a [C++ development environment can be found here](./docs/build/environment.md).
Building xrpld generally requires git, Python, Conan, CMake, and a C++
compiler. Some guidance on setting up such a [C++ development environment can be
found here](./docs/build/environment.md).
- [Python 3.7](https://www.python.org/downloads/)
- [Conan 1.55](https://conan.io/downloads.html)
- [CMake 3.16](https://cmake.org/download/)
- [Python 3.11](https://www.python.org/downloads/), or higher
- [Conan 2.17](https://conan.io/downloads.html)[^1], or higher
- [CMake 3.22](https://cmake.org/download/), or higher
`rippled` is written in the C++20 dialect and includes the `<concepts>` header.
[^1]:
It is possible to build with Conan 1.60+, but the instructions are
significantly different, which is why we are not recommending it.
`xrpld` is written in the C++20 dialect and includes the `<concepts>` header.
The [minimum compiler versions][2] required are:
| Compiler | Version |
|-------------|---------|
| GCC | 11 |
| Clang | 13 |
| Apple Clang | 13.1.6 |
| MSVC | 19.23 |
| Compiler | Version |
|----------- | ---------|
| GCC | 12 |
| Clang | 16 |
| Apple Clang | 16 |
| MSVC | 19.44[^3] |
### Linux
The Ubuntu operating system has received the highest level of
quality assurance, testing, and support.
The Ubuntu Linux distribution has received the highest level of quality
assurance, testing, and support. We also support Red Hat and use Debian
internally.
Here are [sample instructions for setting up a C++ development environment on Linux](./docs/build/environment.md#linux).
Here are [sample instructions for setting up a C++ development environment on
Linux](./docs/build/environment.md#linux).
### Mac
Many rippled engineers use macOS for development.
Many xrpld engineers use macOS for development.
Here are [sample instructions for setting up a C++ development environment on macOS](./docs/build/environment.md#macos).
Here are [sample instructions for setting up a C++ development environment on
macOS](./docs/build/environment.md#macos).
### Windows
Windows is not recommended for production use at this time.
Windows is used by some engineers for development only.
- Additionally, 32-bit Windows development is not supported.
- Visual Studio 2022 is not yet supported.
- rippled generally requires [Boost][] 1.77, which Conan cannot build with VS 2022.
- Until rippled is updated for compatibility with later versions of Boost, Windows developers may need to use Visual Studio 2019.
[Boost]: https://www.boost.org/
[^3]: Windows is not recommended for production use.
## Steps
### Set Up Conan
After you have a [C++ development environment](./docs/build/environment.md) ready with Git, Python, Conan, CMake, and a C++ compiler, you may need to set up your Conan profile.
After you have a [C++ development environment](./docs/build/environment.md) ready with Git, Python,
Conan, CMake, and a C++ compiler, you may need to set up your Conan profile.
These instructions assume a basic familiarity with Conan and CMake.
These instructions assume a basic familiarity with Conan and CMake. If you are
unfamiliar with Conan, then please read [this crash course](./docs/build/conan.md) or the official
[Getting Started][3] walkthrough.
If you are unfamiliar with Conan, then please read [this crash course](./docs/build/conan.md) or the official [Getting Started][3] walkthrough.
#### Conan lockfile
You'll need at least one Conan profile:
To achieve reproducible dependencies, we use a [Conan lockfile](https://docs.conan.io/2/tutorial/versioning/lockfiles.html),
which has to be updated every time dependencies change.
```
conan profile new default --detect
```
Please see the [instructions on how to regenerate the lockfile](conan/lockfile/README.md).
You can manage multiple Conan profiles in the directory
`$(conan config home)/profiles`, for example renaming `default` to a different
name and then creating a new `default` profile for a different compiler.
#### Select language
The default profile created by Conan will typically select different C++ dialect
than C++20 used by this project. You should set `20` in the profile line
starting with `compiler.cppstd=`. For example:
```bash
sed -i.bak -e 's|^compiler\.cppstd=.*$|compiler.cppstd=20|'$(conan config home)/profiles/default
```
#### Select standard library in Linux
**Linux** developers will commonly have a default Conan [profile][] that
compiles with GCC and links with libstdc++. If you are linking with libstdc++
(see profile setting `compiler.libcxx`), then you will need to choose the
`libstdc++11` ABI:
```bash
sed -i.bak -e 's|^compiler\.libcxx=.*$|compiler.libcxx=libstdc++11|'$(conan config home)/profiles/default
```
#### Select architecture and runtime in Windows
**Windows** developers may need to use the x64 native build tools. An easy way
to do that is to run the shortcut "x64 Native Tools Command Prompt" for the
version of Visual Studio that you have installed.
Windows developers must also build `xrpld` and its dependencies for the x64
architecture:
```bash
sed -i.bak -e 's|^arch=.*$|arch=x86_64|'$(conan config home)/profiles/default
```
**Windows** developers also must select static runtime:
```bash
sed -i.bak -e 's|^compiler\.runtime=.*$|compiler.runtime=static|'$(conan config home)/profiles/default
```
#### Clang workaround for grpc
If your compiler is clang, version 19 or later, or apple-clang, version 17 or
later, you may encounter a compilation error while building the `grpc`
dependency:
```text
In file included from .../lib/promise/try_seq.h:26:
.../lib/promise/detail/basic_seq.h:499:38: error: a template argument list is expected after a name prefixed by the template keyword [-Wmissing-template-arg-list-after-template-kw]
If your compiler is gcc, version 12, and you have enabled `werr` option, you may
encounter a compilation error such as:
```text
/usr/include/c++/12/bits/char_traits.h:435:56: error: 'void* __builtin_memcpy(void*, const void*, long unsigned int)' accessing 9223372036854775810 or more bytes at offsets [2, 9223372036854775807] and 1 may overlap up to 9223372036854775813 bytes at offset -3 [-Werror=restrict]
You can then build and test as usual, with the generated `xrpld` binary containing the sanitizer instrumentation. When you run it, it will report any sanitizer errors it detects in the console output.
See [Sanitizers docs](./docs/build/sanitizers.md) for more details.
## Options
| Option | Default Value | Description |
| --- | ---| ---|
| `assert` | OFF | Enable assertions.
| `reporting` | OFF | Build the reporting mode feature. |
| `tests` | ON | Build tests. |
| `unity` | ON | Configure a unity build. |
| `san` | N/A | Enable a sanitizer with Clang. Choices are `thread` and `address`. |
[Unity builds][5] may be faster for the first build
(at the cost of much more memory) since they concatenate sources into fewer
translation units. Non-unity builds may be faster for incremental builds,
and can be helpful for detecting `#include` omissions.
| `coverage` | OFF | Prepare the coverage report. |
| `tests` | OFF | Build tests. |
| `unity` | OFF | Configure a unity build. |
| `xrpld` | OFF | Build the xrpld application, and not just the libxrpl library. |
| `werr` | OFF | Treat compilation warnings as errors |
| `wextra` | OFF | Enable additional compilation warnings |
[Unity builds][5] may be faster for the first build (at the cost of much more
memory) since they concatenate sources into fewer translation units. Non-unity
builds may be faster for incremental builds, and can be helpful for detecting
`#include` omissions.
## Troubleshooting
### Conan
After any updates or changes to dependencies, you may need to do the following:
1. Remove your build directory.
2. Remove the Conan cache:
2. Remove individual libraries from the Conan cache, e.g.
```bash
conan remove 'grpc/*'
```
rm -rf ~/.conan/data
**or**
Remove all libraries from Conan cache:
```bash
conan remove '*'
```
4. Re-run [conan install](#build-and-test).
3. Re-run [conan export](#patched-recipes) if needed.
4. [Regenerate lockfile](#conan-lockfile).
5. Re-run [conan install](#build-and-test).
### no std::result_of
#### ERROR: Package not resolved
If your compiler version is recent enough to have removed `std::result_of` as
part of C++20, e.g. Apple Clang 15.0, then you might need to add a preprocessor
definition to your build.
If you're seeing an error like `ERROR: Package 'snappy/1.1.10' not resolved: Unable to find 'snappy/1.1.10#968fef506ff261592ec30c574d4a7809%1756234314.246' in remotes.`,
please add `xrplf` remote or re-run `conan export` for [patched recipes](#patched-recipes).
### `protobuf/port_def.inc` file not found
If `cmake --build .` results in an error due to a missing a protobuf file, then
you might have generated CMake files for a different `build_type` than the
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.