From 308f6c5375353824acba302593a618c27a7b13f2 Mon Sep 17 00:00:00 2001 From: Denis Angell Date: Wed, 13 May 2026 20:57:25 +0200 Subject: [PATCH] regen skills --- .../doc-agent/prompts/document-file.md | 229 +++++++++++- .../scripts/doc-agent/prompts/regen-skill.md | 30 +- .github/scripts/doc-agent/src/regen-skills.ts | 22 +- SCOPE_OF_WORK.md | 331 +++++++++++------- docs/skills/consensus.md | 38 +- docs/skills/cryptography.md | 82 ++--- docs/skills/ledger.md | 198 ++++++++--- docs/skills/nodestore.md | 106 +++--- docs/skills/peering.md | 97 +++-- docs/skills/protocol.md | 133 +++++-- docs/skills/rpc.md | 44 ++- docs/skills/shamap.md | 56 ++- docs/skills/sql.md | 143 +++++--- docs/skills/transactors.md | 189 +++++++--- 14 files changed, 1215 insertions(+), 483 deletions(-) diff --git a/.github/scripts/doc-agent/prompts/document-file.md b/.github/scripts/doc-agent/prompts/document-file.md index ea78c76c87..630bd55d90 100644 --- a/.github/scripts/doc-agent/prompts/document-file.md +++ b/.github/scripts/doc-agent/prompts/document-file.md @@ -65,22 +65,219 @@ classes, and gotchas: ## Process -1. If "Authoritative AI Context" is provided in the user prompt, treat it as - the source of truth for the file's intent and behavior. Your task is to - translate that prose into structured Doxygen comments on the declarations. -2. Read the target file completely -3. Read the corresponding skill file in `docs/skills/` if one applies -4. Identify entities that need documentation (public classes, structs, - public methods, free functions in headers, enums) -5. For each entity: cross-reference the ai.md context, read the implementation - (and tests if helpful), then write a Doxygen comment that captures behavior - and intent -6. Use the Edit tool to add the comments to the file -7. Do NOT modify code logic — only add documentation -8. Do NOT add documentation to entities that don't need it (private members - with obvious purpose, simple getters where the name is self-explanatory) -9. Do NOT read the `.ai.md` file yourself — it is already injected into your - prompt when one exists for the target file +Documenting a declaration is not the same as "writing a doxygen comment +above it". It is producing the **total** set of comments that should +surround the declaration after this pass — which includes the docstring +and any inline comments that remain inside the function body or next to +a data-literal initializer. Existing comments in the file are inputs, +not outputs you are preserving. + +For each entity (class, struct, public method, free function in a header, +enum, public field): + +1. **Read** the declaration, its full implementation, and **every comment + that is currently attached to it** — the Doxygen above it, any `//!` + line, any inline `// ...` annotations next to its initializer or + inside its body. Treat all of these as raw information about intent. +2. **Cross-reference** the ai.md context (already injected in your + prompt) and the module skill file. Also grep for the entity's name + to find callers and tests where the behavioral contract is exercised + — those are often the best source of what to write. +3. **Decide what the reader needs**, in this order: + a. A docstring that captures behavior, contract, invariants, and the + WHY. This is the primary deliverable. + b. Inline comments **only** where they document something the + docstring cannot reasonably hold — typically a non-obvious local + invariant, a workaround for a specific bug, a tricky branch whose + WHY is genuinely local. If the inline comment just narrates what + the next line does, it does not belong. +4. **Produce a single edit** that replaces the entity's full comment + surface with the result of step 3. Concretely: + - If you wrote a docstring whose contents subsume an existing `//!` + or section-header prose comment, **remove** the old comment as part + of the same edit. Do not leave both. + - If you wrote a docstring whose `@note` or body covers the meaning + of an inline annotation on a map row, array literal, or magic + constant inside the entity, **remove** that inline annotation. + Leaving it duplicates what the docstring says. + - If you wrote a docstring on a function whose body has line-by-line + narration of control flow (`// check this`, `// now do that`), + **remove** the narration unless a specific line documents a real, + non-obvious WHY. + - Section banner comments (`// --- Avalanche tuning ---`) may stay as + short visual dividers if they help scanning a long struct, but any + multi-line prose in them that is now in the per-field Doxygen + should be cut. +5. **Do not delete** comments that capture a WHY the docstring does not + cover: a workaround for a real bug, a non-obvious invariant local to + one branch, a reference to a ticket or RFC. If a pre-existing + comment contains information you did not put in the new docstring, + either fold it into the docstring or leave it in place. + +## Worked examples + +These show the exact transformations expected. The "AFTER" column is the +state the file must be in when you finish. If your edit leaves the file +in the "BEFORE" state, the pass has failed. + +### Example 1: section-header prose → short banner + +BEFORE: +```cpp +//------------------------------------------------------------------------- +// Validation and proposal durations are relative to NetClock times, so use +// second resolution + +/** Maximum age of a validation relative to its ledger's close time. + * ... (rest of docstring already explains NetClock semantics) ... + */ +std::chrono::seconds const validationVALID_WALL = std::chrono::minutes{5}; +``` + +AFTER: +```cpp +// --- NetClock-domain parameters --- + +/** Maximum age of a validation relative to its ledger's close time. + * ... (rest of docstring already explains NetClock semantics) ... + */ +std::chrono::seconds const validationVALID_WALL = std::chrono::minutes{5}; +``` + +The multi-line prose was redundant with the new per-field Doxygen and the +file-level `@file` block. Replace with a single-line banner. + +### Example 2: inline annotations on a data literal → removed + +BEFORE: +```cpp +/** Avalanche state machine cutoffs. + * + * | State | Time | Yes-vote | Next | + * |--------|------|----------|--------| + * | Init | 0 | 50 | Mid | + * | Mid | 50 | 65 | Late | + * ... + */ +std::map const avalancheCutoffs{ + // {state, {time, percent, nextState}}, + // Initial state: 50% of nodes must vote yes + {AvalancheState::Init, {.consensusTime = 0, .consensusPct = 50, .next = AvalancheState::Mid}}, + // mid-consensus starts after 50% of the previous round time, and + // requires 65% yes + {AvalancheState::Mid, {.consensusTime = 50, .consensusPct = 65, .next = AvalancheState::Late}}, + // ... +}; +``` + +AFTER: +```cpp +/** Avalanche state machine cutoffs. + * + * | State | Time | Yes-vote | Next | + * |--------|------|----------|--------| + * | Init | 0 | 50 | Mid | + * | Mid | 50 | 65 | Late | + * ... + */ +std::map const avalancheCutoffs{ + {AvalancheState::Init, {.consensusTime = 0, .consensusPct = 50, .next = AvalancheState::Mid}}, + {AvalancheState::Mid, {.consensusTime = 50, .consensusPct = 65, .next = AvalancheState::Late}}, + // ... +}; +``` + +The per-row inline comments restate the table that is now in the +docstring above. They go. The schema comment `// {state, {time, percent, ...}}` +also goes — the designated-initializer field names make the schema obvious. + +### Example 3: body narration in a documented function → removed + +BEFORE: +```cpp +/** Query the avalanche state machine. + * ... + * @note `at()` calls on `avalancheCutoffs` are safe because the map is + * constructed with all four valid keys. + */ +inline std::pair<...> getNeededWeight(...) +{ + // at() can throw, but the map is built by hand to ensure all valid + // values are available. + auto const& currentCutoff = p.avalancheCutoffs.at(currentState); + // Should we consider moving to the next state? + if (currentCutoff.next != currentState && currentRounds >= minimumRounds) + { + // at() can throw, but the map is built by hand to ensure all + // valid values are available. + auto const& nextCutoff = p.avalancheCutoffs.at(currentCutoff.next); + // See if enough time has passed to move on to the next. + XRPL_ASSERT(...); + if (percentTime >= nextCutoff.consensusTime) + { + return {nextCutoff.consensusPct, currentCutoff.next}; + } + } + return {currentCutoff.consensusPct, {}}; +} +``` + +AFTER: +```cpp +/** Query the avalanche state machine. + * ... + * @note `at()` calls on `avalancheCutoffs` are safe because the map is + * constructed with all four valid keys. + */ +inline std::pair<...> getNeededWeight(...) +{ + auto const& currentCutoff = p.avalancheCutoffs.at(currentState); + if (currentCutoff.next != currentState && currentRounds >= minimumRounds) + { + auto const& nextCutoff = p.avalancheCutoffs.at(currentCutoff.next); + XRPL_ASSERT(...); + if (percentTime >= nextCutoff.consensusTime) + { + return {nextCutoff.consensusPct, currentCutoff.next}; + } + } + return {currentCutoff.consensusPct, {}}; +} +``` + +Every removed comment was either restating what the next line does +(`// Should we consider moving to the next state?` on an `if`) or +duplicating the docstring's `@note` (`// at() can throw...`). None of +them documented a non-obvious WHY local to that line. + +### Calibration: when an inline comment STAYS + +If the body contains a comment that documents a real local WHY — +something the function-level docstring cannot reasonably hold — keep it. + +```cpp +// Workaround for boost #12345: pass nullptr instead of the empty buffer. +boost::asio::buffer(nullptr, 0); + +// We deliberately do not lock here: the caller is required to hold +// lock_ across this method and the recursion would deadlock. +internalUpdate(); +``` + +These are non-removable. They are not restating the code; they are +explaining something the reader cannot derive from the line. + +## Rules that apply throughout + +- Do NOT modify code logic — only adjust comments and Doxygen. +- Do NOT document entities that don't need it (private members with + obvious purpose, trivial defaulted constructors, getters whose name is + self-explanatory). +- Do NOT read the `.ai.md` file yourself — it is already in your prompt + if one exists for this file. +- If "Authoritative AI Context" is provided in the user prompt, treat it + as the source of truth for the file's intent and behavior. Your task + is to translate that prose into Doxygen on the actual declarations. When you finish, summarize: - How many entities you documented diff --git a/.github/scripts/doc-agent/prompts/regen-skill.md b/.github/scripts/doc-agent/prompts/regen-skill.md index 7194f181c8..4955407f1b 100644 --- a/.github/scripts/doc-agent/prompts/regen-skill.md +++ b/.github/scripts/doc-agent/prompts/regen-skill.md @@ -39,9 +39,29 @@ ai.md files into the existing skill. Specifically: - **Keep prose grounded.** No marketing language. No "robust, scalable, enterprise-grade" filler. Engineers reading this need facts. -## Output +## Output — Chunked Writing (REQUIRED) -Emit the complete new skill file content as your final assistant message. -Start with the markdown heading. Do not include meta-commentary like "Here -is the updated skill file" — the output is captured verbatim and written -to the skill file path. +You have a per-turn output cap (32K tokens). For larger modules, a +complete skill file will not fit in a single tool call. You MUST write +the file in chunks across multiple tool calls. Do not try to emit the +whole file in one Write — it will be truncated mid-content. + +Process: +1. **First chunk (Write)**: Call the `Write` tool with the start of the + skill: the title heading, the opening overview, and the first 1–2 + major sections. Keep this chunk under ~20K characters of content. +2. **Subsequent chunks (Edit)**: For each remaining section, call the + `Edit` tool with: + - `old_string` = the last line currently at the end of the file (must + be unique enough to match unambiguously — use the full last line) + - `new_string` = that same last line **plus the next 1–2 sections** + appended + Keep each chunk under ~20K characters. +3. **Repeat** until the skill is complete. There is no maximum number + of Edit calls. + +After the file is fully written, respond with a one-line confirmation +listing how many chunks you wrote. + +DO NOT emit the skill content in your text response. The file is the +output; the text response is only for confirmation. diff --git a/.github/scripts/doc-agent/src/regen-skills.ts b/.github/scripts/doc-agent/src/regen-skills.ts index 30e5493098..3310a72180 100644 --- a/.github/scripts/doc-agent/src/regen-skills.ts +++ b/.github/scripts/doc-agent/src/regen-skills.ts @@ -122,22 +122,30 @@ When you have written the file, respond with a brief one-line confirmation.`; model: MODEL, systemPrompt, cwd: XRPLD_ROOT, - allowedTools: ['Write', 'Read', 'Glob', 'Grep'], + allowedTools: ['Write', 'Edit', 'Read', 'Glob', 'Grep'], permissionMode: 'acceptEdits', }, }); - let wroteFile = false; + let writeCount = 0; + let editCount = 0; for await (const message of result) { if (message.type === 'assistant') { const content = message.message?.content; if (Array.isArray(content)) { for (const block of content) { if (block.type === 'tool_use' && block.name === 'Write') { + writeCount++; const input = block.input as { file_path?: string } | undefined; if (input?.file_path !== undefined) { - wroteFile = true; - console.log(` Agent wrote: ${input.file_path}`); + console.log(` Write: ${input.file_path}`); + } + } + if (block.type === 'tool_use' && block.name === 'Edit') { + editCount++; + const input = block.input as { file_path?: string } | undefined; + if (input?.file_path !== undefined) { + console.log(` Edit: ${input.file_path}`); } } } @@ -145,14 +153,14 @@ When you have written the file, respond with a brief one-line confirmation.`; } if (message.type === 'result') { const cost = message.total_cost_usd?.toFixed(4) ?? '?'; - console.log(` [Cost: $${cost}]`); + console.log(` [Cost: $${cost}]`); } } - if (!wroteFile) { + if (writeCount === 0) { console.error(' Agent did not call Write — skill file not updated.'); return; } - console.log(` Wrote: ${skillRelPath}`); + console.log(` Wrote: ${skillRelPath} (${writeCount} Write + ${editCount} Edit calls)`); } diff --git a/SCOPE_OF_WORK.md b/SCOPE_OF_WORK.md index 63212a07f8..d20e3480ef 100644 --- a/SCOPE_OF_WORK.md +++ b/SCOPE_OF_WORK.md @@ -32,113 +32,187 @@ Prefix — has fundamental structural problems: Cleaner code and better inline docs reduce the need for external specification work. -## 2. Proposed Solution +## 2. Solution as Built -Build an automated, in-repo documentation system with four components: +An automated, in-repo documentation system with five components, all living +alongside the code with no external repos and no external vendor dependency: -1. **Initial documentation pass** — Comprehensively document all 1,183 - source files using Claude Code with deep xrpld context -2. **Continuous maintenance** — A GitHub Action on every PR that detects - doc drift and suggests updates, using diff-aware LLM analysis -3. **Coverage enforcement** — CI-enforced documentation coverage thresholds - that ratchet up over time, preventing regression -4. **Developer agents** — Claude Code commands for onboarding, architecture - questions, doc review, and bug pattern detection +1. **Module skills** — Per-module knowledge files in [docs/skills/](docs/skills/) + that capture the "soul" of each subsystem (key files, patterns, pitfalls, + invariants). These are the durable, human-maintained context that the + automated agent and human contributors both consult. +2. **doc-agent (Claude Agent SDK app)** — A TypeScript tool at + [.github/scripts/doc-agent/](.github/scripts/doc-agent/) with three modes: + `document` (write Doxygen comments), `review` (detect drift on a diff), + and `regen-skills` (rebuild a skill file from current code). +3. **Doc-review GitHub Action** — Runs the review mode on every PR; posts + inline comments and a sticky summary. Currently warning-only. +4. **Coverage enforcement** — CI-enforced documentation coverage thresholds + that ratchet up over time, preventing regression. +5. **Developer slash commands** — Claude Code commands in + [.claude/commands/](.claude/commands/) for onboarding, architecture + questions, doc review, and bug pattern detection. -All documentation lives alongside the code. No external repos. No external -dependencies. Documentation accuracy is enforced by CI the same way code -style and test coverage are enforced today. +Documentation accuracy is enforced by CI the same way code style and test +coverage are enforced today. -## 3. Deliverables +## 3. Deliverables — Built -### 3.1 Documentation Standards (docs/DOCUMENTATION_STANDARDS.md) +### 3.1 Documentation Standards -A canonical format guide defining: -- Javadoc-style `/** ... */` Doxygen comments (matches 5,718 existing - instances in the codebase) +[docs/DOCUMENTATION_STANDARDS.md](docs/DOCUMENTATION_STANDARDS.md) — canonical +format guide defining: +- Javadoc-style `/** ... */` Doxygen comments (matches existing convention) - Documentation levels: file, class, public method, free function, enum - Required Doxygen tags: `@param`, `@return`, `@note`, `@invariant` - Quality rules: document behavior and invariants, never paraphrase - signatures, terse style (2-5 lines for classes, 1-3 for functions) + signatures, terse style (2–5 lines for classes, 1–3 for functions) -**Status: Complete.** File created at `docs/DOCUMENTATION_STANDARDS.md`. +### 3.2 Doxygen Configuration Changes -### 3.2 Doxygen Configuration Changes (docs/Doxyfile) +[docs/Doxyfile](docs/Doxyfile): +- `EXTRACT_ALL = NO` (was `YES`) — undocumented entities are flagged rather + than silently extracted +- `GENERATE_XML = YES` (was `NO`) — required for coverxygen to parse and + measure documentation coverage -- `EXTRACT_ALL = NO` (was `YES`) — so undocumented entities are flagged - rather than silently extracted -- `GENERATE_XML = YES` (was `NO`) — required for coverxygen to parse - and measure documentation coverage +### 3.3 Module Skills -**Status: Complete.** Changes applied to `docs/Doxyfile`. +Thirteen module-level skill files in [docs/skills/](docs/skills/), each one +a self-contained guide to a subsystem's responsibilities, key types, control +flow, conventions, and common pitfalls: -### 3.3 Documentation Coverage Pipeline +| Skill | Covers | +|-------|--------| +| [consensus.md](docs/skills/consensus.md) | XRPL consensus algorithm + RCL adapters | +| [cryptography.md](docs/skills/cryptography.md) | CSPRNG, secure erasure, key handling | +| [ledger.md](docs/skills/ledger.md) | ReadView/ApplyView, state tables, sandbox | +| [nodestore.md](docs/skills/nodestore.md) | RocksDB/NuDB/Memory backends | +| [peering.md](docs/skills/peering.md) | Overlay + peerfinder | +| [protocol.md](docs/skills/protocol.md) | STObject, SField, Serializer, TER, Keylets | +| [rpc.md](docs/skills/rpc.md) | RPC handler conventions | +| [shamap.md](docs/skills/shamap.md) | SHA-256 Merkle radix tree | +| [sql.md](docs/skills/sql.md) | SOCI database wrapper, checkpointing | +| [test.md](docs/skills/test.md) | Beast unit test framework conventions | +| [transactors.md](docs/skills/transactors.md) | Full transactor template | +| [websockets.md](docs/skills/websockets.md) | WS subscriptions/streams | +| [index.md](docs/skills/index.md) | Top-level codebase map | -**Components:** +These skills serve a dual purpose: they are reference docs for human +contributors, and they are injected as system-prompt context by the +doc-agent (mapping in [src/config.ts](.github/scripts/doc-agent/src/config.ts)). + +[install-skills.sh](.github/scripts/doc-agent/install-skills.sh) installs +the same files as Claude Code skills under `.claude/skills//SKILL.md`, +so any Claude Code session in the repo picks them up automatically. + +### 3.4 doc-agent (Claude Agent SDK) + +A TypeScript application at [.github/scripts/doc-agent/](.github/scripts/doc-agent/), +built on `@anthropic-ai/claude-agent-sdk`. Three modes: + +| Mode | Purpose | +|------|---------| +| `document` | Add Doxygen comments to a file or directory. Reads sibling `.ai.md` context, the module skill, and the source file; uses `permissionMode: 'acceptEdits'` to write directly. | +| `review` | Given a git range or PR number, detect doc drift. Emits `doc-review-report.md` (sticky comment) and `doc-review-comments.json` (inline comments). | +| `regen-skills` | Rebuild a module's skill file at `docs/skills/.md` from the module's `.ai.md` files plus existing skill content. | + +Layout: + +``` +doc-agent/ +├── package.json # Node >= 20.12, @anthropic-ai/claude-agent-sdk +├── biome.json # lint + format +├── install-skills.sh # copies docs/skills/*.md → .claude/skills/*/SKILL.md +├── prompts/ # System prompts as markdown (editable without code changes) +│ ├── document-file.md +│ ├── review-diff.md +│ └── regen-skill.md +└── src/ + ├── index.ts # CLI entry (document | review | regen-skills) + ├── config.ts # Paths, model, MODULE_SKILL_MAP + ├── prompt-loader.ts # Loads prompts + injects module skill + ├── document.ts + ├── review.ts + ├── regen-skills.ts + └── types.ts +``` + +Notable design decisions: +- **Prompts as markdown, not strings.** Operators tune prompts without + touching TypeScript or redeploying. +- **`.ai.md` sidecar input.** When documenting a file, the agent reads a + sibling `.ai.md` (high-signal prose generated upstream by the + `athenah-ai` pipeline) as the authoritative source of intent. These are + gitignored (`*.ai.md` in [.gitignore](.gitignore)) and discarded once + the initial pass is complete. +- **Model selection via env.** `DOC_AGENT_MODEL` env var; default + `claude-sonnet-4-6`. +- **Repo root override.** `XRPLD_ROOT` env var allows running the agent + against a different checkout (useful in CI and local testing). + +### 3.5 Documentation Coverage Pipeline | File | Purpose | |------|---------| -| `.github/doc-coverage-thresholds.json` | Per-module thresholds + quarterly ratchet schedule | -| `.github/scripts/doc-coverage-check.py` | Parses coverxygen LCOV output, checks thresholds, generates PR report | -| `.github/workflows/doc-coverage.yml` | CI workflow: builds Doxygen XML, runs coverxygen, posts coverage to PR | -| `cmake/XrplDocs.cmake` | New `docs-coverage` CMake target | +| [.github/doc-coverage-thresholds.json](.github/doc-coverage-thresholds.json) | Per-module thresholds + quarterly ratchet schedule | +| [.github/scripts/doc-coverage-check.py](.github/scripts/doc-coverage-check.py) | Parses coverxygen LCOV, checks thresholds, generates PR report | +| [.github/workflows/doc-coverage.yml](.github/workflows/doc-coverage.yml) | CI workflow: builds Doxygen XML, runs coverxygen, posts coverage to PR | +| [cmake/XrplDocs.cmake](cmake/XrplDocs.cmake) | `docs` CMake target wiring | -**How it works:** -1. On every PR touching C++ files, the workflow builds Doxygen XML output - for both the PR branch and the base branch -2. Coverxygen generates LCOV-format coverage reports from the XML -3. The check script compares coverage against per-module thresholds -4. Ratchet mode (`no_decrease`) prevents any PR from reducing doc coverage -5. New files added in a PR require >= 80% doc coverage -6. Results are posted as a sticky PR comment with per-module breakdown +Flow: +1. On every PR touching C++ files, the workflow builds Doxygen XML for + both the PR branch and the base branch (using + `ghcr.io/xrplf/ci/tools-rippled-documentation`). +2. Coverxygen generates LCOV-format coverage from the XML. +3. The check script compares coverage against per-module thresholds. +4. Ratchet mode (`no_decrease`) prevents any PR from reducing coverage. +5. New files added in a PR require ≥ 80% doc coverage. +6. Results are posted as a sticky PR comment with per-module breakdown. -**Status: Complete.** All files created. - -### 3.4 Doc Review GitHub Action - -**Components:** +### 3.6 Doc-Review GitHub Action | File | Purpose | |------|---------| -| `.github/scripts/doc-review.py` | Diff-aware LLM analysis script | -| `.github/workflows/doc-review.yml` | CI workflow: runs on PR, posts review | +| [.github/scripts/doc-review.py](.github/scripts/doc-review.py) | Diff-aware LLM analysis script (Python, uses `anthropic` SDK) | +| [.github/workflows/doc-review.yml](.github/workflows/doc-review.yml) | CI workflow: runs on PR, posts review | -**How it works:** -1. On every PR, determines which C++ files changed -2. For each changed file, extracts the git diff hunks and existing doc - comments -3. Sends both to the Anthropic API with a prompt tuned for xrpld: "Given - this diff, are existing docs still accurate?" -4. Posts results as **inline review comments** on specific lines AND a - **summary comment** on the PR -5. Starts in **warning-only mode** (does not block merge) +Flow: +1. On every PR, determine which C++ files changed. +2. For each changed file, extract git diff hunks and existing doc comments. +3. Send both to the Anthropic API with a prompt tuned for xrpld: + "Given this diff, are existing docs still accurate?" +4. Post results as **inline review comments** on specific lines (via + `actions/github-script`) AND a **sticky summary comment** on the PR. +5. Runs in **warning-only mode** — does not block merge. -**Cost control:** Only processes changed files and changed hunks within -those files. A typical PR touches 3-10 files. Estimated cost: $0.05-0.15 -per PR. +The doc-agent `review` mode is the same logic exposed as a local CLI +(`npm run review develop..HEAD` or `npm run review -- --pr 1234`), useful +for testing prompt changes before they ship to CI. -**Status: Complete.** All files created. +Cost: only changed files and changed hunks within those files are +processed. Estimated ~$0.05–0.15 per PR. -### 3.5 Claude Code Agent Commands +### 3.7 Claude Code Slash Commands -Four developer-facing commands in `.claude/commands/`: +Four developer-facing commands in [.claude/commands/](.claude/commands/): | Command | Purpose | |---------|---------| -| `doc-review` | Review doc accuracy for files changed on current branch | -| `explain-module` | Explain a module's architecture, classes, control flow, and entry points | -| `how-does-x-work` | Trace a feature through the codebase with file/line references | -| `find-bug-patterns` | Scan code for common xrpld bug patterns (unchecked TER, integer overflow, missing amendment gates, etc.) | +| [doc-review](.claude/commands/doc-review.md) | Review doc accuracy for files changed on current branch | +| [explain-module](.claude/commands/explain-module.md) | Explain a module's architecture, classes, control flow, entry points | +| [how-does-x-work](.claude/commands/how-does-x-work.md) | Trace a feature through the codebase with file/line references | +| [find-bug-patterns](.claude/commands/find-bug-patterns.md) | Scan code for common xrpld bug patterns (unchecked TER, integer overflow, missing amendment gates, etc.) | -**Status: Complete.** All files created. - -### 3.6 Full Codebase Documentation +### 3.8 Full Codebase Documentation The initial documentation pass covers 1,183 C++ files organized into 21 -module-level PRs. Each PR is scoped to a single subsystem so one domain -expert can review it. +module-level PRs (see Section 5). The doc-agent `document` mode is the +tool that produces each PR; each file's output is then domain-expert +reviewed before merge. -**Status: Not started.** This is the primary execution phase (see Section 5). +**Status: Pending execution.** Tooling is built; the per-module PRs are +the primary remaining work. ## 4. Resources Required @@ -146,55 +220,57 @@ expert can review it. | Role | Responsibility | Estimated Time | |------|---------------|----------------| -| **Documentation lead** (1 person) | Runs Claude Code for each module, reviews output quality, submits PRs, iterates on prompt quality | 50-60% for 15 weeks | -| **Domain reviewers** (3-5 people, rotating) | Review doc PRs for semantic accuracy in their area of expertise. Each reviewer handles 3-5 PRs. | 2-4 hours per PR | -| **CI/infrastructure** (1 person) | Deploys workflows, monitors costs, tunes false positive rate on doc-review action | 10-15% for 15 weeks | +| **Documentation lead** (1 person) | Runs `doc-agent document` per module, reviews output, submits PRs, iterates on prompts in [prompts/](.github/scripts/doc-agent/prompts/) | 50–60% for 15 weeks | +| **Domain reviewers** (3–5 people, rotating) | Review doc PRs for semantic accuracy in their area of expertise. Each reviewer handles 3–5 PRs. | 2–4 hours per PR | +| **CI/infrastructure** (1 person) | Deploys workflows, monitors costs, tunes false-positive rate on doc-review action | 10–15% for 15 weeks | -**Total estimated effort:** ~1 FTE for 15 weeks + ~80-120 hours of -reviewer time spread across 3-5 engineers. +**Total estimated effort:** ~1 FTE for 15 weeks + ~80–120 hours of +reviewer time spread across 3–5 engineers. ### 4.2 Infrastructure & Tools | Resource | Purpose | Cost | |----------|---------|------| -| **Anthropic API access** | Powers doc-review GitHub Action | ~$50-100/month (20-30 PRs/week, ~2K tokens per file analysis) | -| **Claude Code license** | Initial documentation pass + developer agent commands | Existing license | -| **GitHub Actions minutes** | Doc-coverage workflow (Doxygen XML build + coverxygen) | ~5-10 min per PR on existing `ubuntu-latest` runners | +| **Anthropic API access** | Powers the doc-agent (`document`, `review`, `regen-skills`) and the doc-review GitHub Action | ~$50–150/month steady state | +| **Claude Agent SDK** | `@anthropic-ai/claude-agent-sdk` Node package | Free | +| **Node.js >= 20.12** | Native `--env-file` support; runs the doc-agent | Free | +| **GitHub Actions minutes** | Doc-coverage workflow (Doxygen XML build + coverxygen) and doc-review workflow | ~5–10 min per PR on existing runners | | **Coverxygen** | Python package, open source (MIT) | Free | -| **Doxygen** | Already configured and used — existing `ghcr.io/xrplf/ci/tools-rippled-documentation` container | Free (already in CI) | -| **GitHub Actions secret** | `ANTHROPIC_API_KEY` — needed for doc-review workflow | N/A | - -**Estimated ongoing cost after initial pass:** $50-150/month for API -usage, negligible CI compute on existing runners. +| **Doxygen** | Already configured — uses existing `ghcr.io/xrplf/ci/tools-rippled-documentation` container | Free (already in CI) | +| **GitHub Actions secret** | `ANTHROPIC_API_KEY` — for doc-review workflow | N/A | +| **athenah-ai pipeline output** | Generates `.ai.md` sidecar context files consumed by `doc-agent document` | Upstream — gitignored, removed post-pass | ### 4.3 Access & Permissions - Write access to the `rippled` repository (or a fork for initial PRs) - Ability to add GitHub Actions secrets (`ANTHROPIC_API_KEY`) -- Ability to modify required status checks (when promoting doc-review - from warning to required) +- Ability to modify required status checks (when promoting doc-review from + warning to required) ## 5. Execution Plan -### Phase 0: Infrastructure — Week 1 +### Phase 0: Infrastructure — Complete -Ship the tooling as a single foundational PR: +Tooling shipped as the foundation PR: -- [x] `docs/DOCUMENTATION_STANDARDS.md` -- [x] `docs/Doxyfile` modifications -- [x] `.github/doc-coverage-thresholds.json` -- [x] `.github/scripts/doc-coverage-check.py` -- [x] `.github/workflows/doc-coverage.yml` -- [x] `cmake/XrplDocs.cmake` modifications -- [x] `.github/workflows/doc-review.yml` -- [x] `.github/scripts/doc-review.py` -- [x] `.claude/commands/` (4 agent commands) +- [x] [docs/DOCUMENTATION_STANDARDS.md](docs/DOCUMENTATION_STANDARDS.md) +- [x] [docs/Doxyfile](docs/Doxyfile) modifications +- [x] [docs/skills/](docs/skills/) — 13 module skills + index +- [x] [.github/scripts/doc-agent/](.github/scripts/doc-agent/) — Agent SDK app (document / review / regen-skills) +- [x] [.github/scripts/doc-agent/install-skills.sh](.github/scripts/doc-agent/install-skills.sh) +- [x] [.github/doc-coverage-thresholds.json](.github/doc-coverage-thresholds.json) +- [x] [.github/scripts/doc-coverage-check.py](.github/scripts/doc-coverage-check.py) +- [x] [.github/workflows/doc-coverage.yml](.github/workflows/doc-coverage.yml) +- [x] [cmake/XrplDocs.cmake](cmake/XrplDocs.cmake) +- [x] [.github/workflows/doc-review.yml](.github/workflows/doc-review.yml) +- [x] [.github/scripts/doc-review.py](.github/scripts/doc-review.py) +- [x] [.claude/commands/](.claude/commands/) — 4 developer slash commands -**Exit criteria:** All workflows pass on a test PR. Coverage report +**Exit criteria met:** All workflows pass on a test PR. Coverage report renders correctly. Doc-review action posts comments without false positives on a sample PR. -### Phase 1: Foundation Modules — Weeks 2-4 +### Phase 1: Foundation Modules — Weeks 2–4 Document the lowest-level modules first (everything else depends on these): @@ -206,18 +282,18 @@ Document the lowest-level modules first (everything else depends on these): | 4 | `include/xrpl/beast/` + `src/libxrpl/beast/` | 88 | ~20K | **Process per PR:** -1. Create branch `docs/module-` from `develop` -2. Run Claude Code against each file with full context: the file itself, - its includes, corresponding test files, and the module README -3. Generate `/** */` doc comments following DOCUMENTATION_STANDARDS.md -4. Domain expert reviews for semantic accuracy -5. Run Doxygen build to validate no doc errors -6. Merge, ratchet that module's threshold up to actual coverage level +1. Create branch `docs/module-` from `develop`. +2. Run `npm run document ` from `.github/scripts/doc-agent/`. The + agent reads each file's `.ai.md` sidecar, the matching module skill, + and the file itself, then writes Doxygen comments per the standards. +3. Domain expert reviews for semantic accuracy. +4. Run Doxygen build to validate no doc errors. +5. Merge; ratchet that module's threshold up to actual coverage level. **Exit criteria:** 4 PRs merged. Coverage for these modules at 60%+. Doc-review action running in warning mode on all subsequent PRs. -### Phase 2: Protocol & Transaction Engine — Weeks 4-8 +### Phase 2: Protocol & Transaction Engine — Weeks 4–8 | PR | Module | ~Files | |----|--------|--------| @@ -232,9 +308,9 @@ Doc-review action running in warning mode on all subsequent PRs. | 13 | Pathfinding + invariants | 30 | **Exit criteria:** 9 PRs merged. Global coverage at 40%+. Doc-review -false positive rate tracked and < 10%. +false-positive rate tracked and < 10%. -### Phase 3: Server & Application Layer — Weeks 8-13 +### Phase 3: Server & Application Layer — Weeks 8–13 | PR | Module | ~Files | |----|--------|--------| @@ -250,16 +326,17 @@ false positive rate tracked and < 10%. **Exit criteria:** 8 PRs merged. Global coverage at 60%+. Doc-review action promoted from warning to **required check**. -### Phase 4: Tests & Polish — Weeks 13-15 +### Phase 4: Tests & Polish — Weeks 13–15 - Document test files (brief docs only — test name + what it validates) - Global threshold at 70% - Full coverage trend reporting on GitHub Pages -- Retrospective: review false positive rate, API costs, contributor +- Remove `.ai.md` sidecar files (they were transitional input only) +- Retrospective: review false-positive rate, API costs, contributor feedback **Exit criteria:** 70% global doc coverage. Doc-review required check -with < 5% false positive rate. Coverage trend visible on GitHub Pages. +with < 5% false-positive rate. Coverage trend visible on GitHub Pages. ## 6. Threshold Ratchet Schedule @@ -281,13 +358,14 @@ New files always require 80% coverage regardless of the global threshold. | Risk | Likelihood | Impact | Mitigation | |------|-----------|--------|------------| -| LLM generates plausible but wrong docs | Medium | High | Every doc PR requires human domain expert review. Model output is a draft, not final product. | -| Doc-review action false positives annoy contributors | Medium | Medium | Warning-only mode for 3 months. Promote to required only when FP rate < 5%. | +| LLM generates plausible but wrong docs | Medium | High | Every doc PR requires human domain expert review. `.ai.md` sidecars (athenah-ai) ground the agent in source-derived intent rather than free generation. | +| Doc-review action false positives annoy contributors | Medium | Medium | Warning-only mode for 3 months. Promote to required only when FP rate < 5%. Prompts live in markdown ([prompts/](.github/scripts/doc-agent/prompts/)) and can be tuned without a code release. | | Coverage enforcement blocks unrelated PRs | Low | Medium | Start at 0% threshold with `no_decrease` only. Quarterly increases announced in advance. | -| Reviewer bandwidth bottleneck | Medium | Medium | PRs scoped to single modules. Reviewers rotate. 2-4 hours per PR is manageable. | -| API costs exceed budget | Low | Low | Only processes diff hunks, not full files. ~$0.05-0.15/PR. Monthly budget cap of $200 with alerting. | +| Reviewer bandwidth bottleneck | Medium | Medium | PRs scoped to single modules. Reviewers rotate. 2–4 hours per PR is manageable. | +| API costs exceed budget | Low | Low | Only diff hunks processed. ~$0.05–0.15/PR. Monthly budget cap of $200 with alerting. | | Doxygen XML build adds CI time | Low | Low | Runs in parallel with existing checks. Uses existing documentation container. ~5 min. | -| Doc comments add code noise | Low | Low | Terse style enforced by standards. 2-5 lines per class, 1-3 per function. | +| Doc comments add code noise | Low | Low | Terse style enforced by standards. 2–5 lines per class, 1–3 per function. | +| Skill files drift from code | Medium | Medium | `doc-agent regen-skills ` rebuilds a skill from current `.ai.md` files; intended to be run periodically. | | Initial pass takes longer than 15 weeks | Medium | Low | Modules are independent. Can parallelize with multiple contributors. Lower-priority modules can slip. | ## 8. Success Metrics @@ -311,11 +389,12 @@ the "source of truth" for how xrpld behaves: | Need | Before | After | |------|--------|-------| -| "What does this function do?" | Read the code, guess | Read the inline doc | -| "How does the payment engine work?" | Read Common Prefix spec (maybe stale) | Run `/explain-module` or `/how-does-x-work` | +| "What does this function do?" | Read the code, guess | Read the inline Doxygen doc | +| "How does the payment engine work?" | Read Common Prefix spec (maybe stale) | Read [docs/skills/transactors.md](docs/skills/transactors.md) or run `/explain-module` | | "Did this PR break any documented behavior?" | Manual review, hope someone notices | Doc-review action flags it automatically | | "What's our documentation coverage?" | Unknown | Measured per-module in every PR | | "Is the spec up to date?" | Check manually, probably not | Docs are in-repo, enforced by CI | +| "Where do I start in module X?" | Ask in chat | Read the module skill in [docs/skills/](docs/skills/) | ## 10. Out of Scope @@ -327,19 +406,19 @@ the "source of truth" for how xrpld behaves: optional. Test coverage measurement is handled by existing Codecov integration. - **Architectural decision records.** Module-level READMEs already exist - for key subsystems. This project adds function/class-level docs, not - system-level design documents. + for key subsystems. This project adds function/class-level docs and the + module skills layer, not system-level ADRs. ## 11. Timeline Summary ``` -Week 1 Phase 0: Infrastructure PR (tooling, workflows, standards) +Week 1 Phase 0: Infrastructure (tooling, workflows, standards, skills) — COMPLETE Weeks 2-4 Phase 1: Foundation modules (basics, crypto, json, beast) Weeks 4-8 Phase 2: Protocol & TX engine (protocol, ledger, tx, paths) Weeks 8-13 Phase 3: Server & application (overlay, consensus, rpc, app) -Weeks 13-15 Phase 4: Tests & polish, promote to required check +Weeks 13-15 Phase 4: Tests & polish, promote to required check, drop .ai.md sidecars ``` **Total duration:** 15 weeks -**Total effort:** ~1 FTE + 80-120 hours reviewer time -**Ongoing cost:** ~$50-150/month API + negligible CI compute +**Total effort:** ~1 FTE + 80–120 hours reviewer time +**Ongoing cost:** ~$50–150/month API + negligible CI compute diff --git a/docs/skills/consensus.md b/docs/skills/consensus.md index 29b3c8864c..5abe1ec0c9 100644 --- a/docs/skills/consensus.md +++ b/docs/skills/consensus.md @@ -1,6 +1,6 @@ # Consensus -Template-based state machine in `Consensus.h` parameterized by an `Adaptor` (production: `RCLConsensus`). Three phases: `open -> establish -> accepted`. Four modes: `proposing`, `observing`, `wrongLedger`, `switchedLedger`. Header-only because of templating; policy decisions (`shouldCloseLedger`, `checkConsensus`) live as free functions in `Consensus.cpp` for independent testability. +Template-based state machine in `Consensus.h` parameterized by an `Adaptor` (production: `RCLConsensus`). Three phases: `open → establish → accepted`. Four modes: `proposing`, `observing`, `wrongLedger`, `switchedLedger`. Header-only because of templating; policy decisions (`shouldCloseLedger`, `checkConsensus`, `checkConsensusReached`) live as free functions in `Consensus.cpp` for independent testability. ## Architecture @@ -13,11 +13,12 @@ The engine itself has no thread or timer — it is driven externally by `timerEn - A ledger cannot close until the previous ledger reaches consensus AND (has transactions OR close time reached) - Proposals must have strictly increasing sequence numbers per peer; stale proposals are silently dropped - `ConsensusResult` constructor asserts `txns.id() == position.position()` — a node's declared position is always a commitment to a specific tx set -- The Avalanche state machine progressively raises consensus thresholds over time (`init -> mid -> late -> stuck`) to force convergence +- The Avalanche state machine progressively raises consensus thresholds over time (`init → mid → late → stuck`) to force convergence - `minCONSENSUS_PCT = 80` is the baseline for `checkConsensus`; timing: `ledgerMIN_CONSENSUS = 1950ms`, `ledgerMAX_CONSENSUS = 15s`, `ledgerABANDON_CONSENSUS = 120s` - `ledgerMAX_CONSENSUS` must stay below `validationFRESHNESS` so waiting validators aren't mistaken for offline - Dead nodes (`deadNodes_`) are permanently excluded for the round once they bow out - LedgerTrie compression invariant: non-root nodes with zero `tipSupport` must have ≥2 children +- `ConsensusResult::disputes` holds only genuinely-differing transactions; `compares` set prevents O(n²) work when multiple peers share a tx set ## Phases and Modes @@ -63,17 +64,17 @@ Per tick: `updateOurPositions()` → `shouldPause()` → `haveConsensus()`. `led `shouldPause()` uses a 5-phase cycle (0–4) keyed off `(ahead - 1) % 5`. Each phase requires progressively more validators current; phase 4 requires all. This cycles to avoid any single threshold being universally right. -### checkConsensus outcomes (`ConsensusState`) +### checkConsensus outcomes (`ConsensusState` in `ConsensusTypes.h`) - `No` — insufficient agreement -- `Yes` — local + network agree on tx set (80% with self counted) +- `Yes` — local + network agree on tx set (80% with self counted, via `proposing` flag in `checkConsensusReached`) - `MovedOn` — 80% of peers finished without us (self not counted); we lost the race - `Expired` — abandoned after `prevAgreeTime * ledgerABANDON_CONSENSUS_FACTOR` (factor=10), clamped to `[ledgerMAX_CONSENSUS, ledgerABANDON_CONSENSUS]` -The zero-peer case in `checkConsensusReached` deliberately refuses consensus until `reachedMax` — prevents premature self-close on a network slow to deliver proposals. The `stalled` case bypasses the percentage check entirely. +The zero-peer case in `checkConsensusReached` deliberately refuses consensus until `reachedMax` — prevents premature self-close on a network slow to deliver proposals. The `stalled` case bypasses the percentage check entirely; when all disputed transactions have clear supermajority agreement either way, network commits immediately. ## Avalanche Voting -Four states defined in `ConsensusParms.h`: +Four states defined in `ConsensusParms.h` as `std::map` (data-driven, not switch — supports hypothetical loops): | State | Time threshold (% of prior round) | Required yes-vote | Next | |---------|-----------------------------------|-------------------|--------| @@ -82,19 +83,21 @@ Four states defined in `ConsensusParms.h`: | `late` | 85% | 70% | `stuck`| | `stuck` | 200% | 95% | `stuck`| -Encoded as `std::map` (data-driven, not switch) so theoretical loops are expressible. `getNeededWeight()` returns `(consensusPct, optional)`. Caller does the actual state update. `avMIN_ROUNDS` prevents premature escalation through states on clock jitter. +`getNeededWeight()` returns `(consensusPct, optional)`; caller does the actual state update. `avMIN_ROUNDS` prevents premature escalation on clock jitter; `avalancheCounter_` resets to zero on every state transition. `DisputedTx::updateVote()` behaves asymmetrically: - Proposing: `weight = (yays_*100 + (ourVote_?100:0)) / (nays_+yays_+1)`; `newPosition = weight > requiredPct` - Not proposing: `newPosition = yays_ > nays_`, `weight = -1`. Observer never distorts proposers' weighted vote. +`DisputedTx` uses `boost::container::flat_map` for peer votes (cache-friendly for small sets), pre-reserved to `numPeers`. `yays_` and `nays_` counters allow O(1) percentage computation without scanning the map. `setVote()` returns `true` on any change (including a new vote), which feeds `peerUnchangedCounter_` tracking. + Stall detection (`DisputedTx::stalled`) — all must hold: -1. `nextCutoff.consensusTime <= currentCutoff.consensusTime` (terminal state) +1. `nextCutoff.consensusTime <= currentCutoff.consensusTime` (terminal `stuck` state) 2. ≥ `avMIN_ROUNDS` rounds in state -3. `peersUnchanged >= avSTALLED_ROUNDS` OR `currentVoteCounter_ >= avSTALLED_ROUNDS` (OR not AND — defends against a peer flip-flopping to reset the counter) +3. `peersUnchanged >= avSTALLED_ROUNDS` **OR** `currentVoteCounter_ >= avSTALLED_ROUNDS` (OR not AND — defends against a peer flip-flopping to reset the counter) 4. Vote split exceeds `minCONSENSUS_PCT` (80%) in either direction -`peerUnchangedCounter_` resets to 0 on any peer vote change in `updateDisputes()`. Close-time consensus uses a separate threshold `avCT_CONSENSUS_PCT` (75%) — close-time agreement is a simple majority, not a multi-round ratchet. +`peerUnchangedCounter_` resets to 0 on any peer vote change in `updateDisputes()`. Close-time consensus uses a separate threshold `avCT_CONSENSUS_PCT` (75%) — close-time agreement is a simpler majority, not a multi-round ratchet. ## Proposals (`ConsensusProposal.h`) @@ -106,6 +109,14 @@ Sequence sentinels: `seenTime()` is local wall-clock time when last updated, NOT `closeTime_` (the proposer's estimate of when the ledger should close in `NetClock`). Don't conflate them. `isStale(cutoff)` uses `seenTime()`. `operator==` includes `seenTime()`, so logically-identical proposals seen at different times don't compare equal. +The production wrapper `RCLCxPeerPos` (in `app/consensus/`) adds cryptographic signature and public key for network propagation. Template parameters `(NodeID_t, LedgerID_t, Position_t)` allow unit-test instantiation over simple integer types. + +## `ConsensusTypes.h` — Vocabulary Types + +- **`ConsensusTimer`**: dual `tick()` overloads — wall-clock (`steady_clock::time_point`) and fixed-increment (for deterministic simulation). Both update `dur_`; `read()` always valid. Backing `roundTime` in `ConsensusResult` feeds `prevRoundTime_`. +- **`ConsensusCloseTimes`**: `peers` is `std::map` (ordered for deterministic traversal when resolving close time); `self` is local estimate. Collects initial (`seqJoin`) proposals for clock-drift measurement. +- **`ConsensusResult`**: instantiated once per round by `closeLedger`, lives in `Consensus::result_` as `std::optional`. Holds `disputes`, `compares` work-avoidance set, `proposers` snapshot. `state` field records `ConsensusState` outcome for diagnostics. + ## Wrong-Ledger Recovery At every `timerEntry()`, `checkLedger()` calls `adaptor_.getPrevLedger()`. If diverged, `handleWrongLedger()`: @@ -114,7 +125,7 @@ At every `timerEntry()`, `checkLedger()` calls `adaptor_.getPrevLedger()`. If di 3. Calls `playbackProposals()` — replays proposals from `recentPeerPositions_` (capped at 10/peer, stored regardless of ledger ID) 4. If correct ledger acquired: `startRoundInternal()` in `switchedLedger` mode; else: stays in `wrongLedger` -The bounded `recentPeerPositions_` buffer is a deliberate trade-off: small bounded buffer beats dropping proposals during switches. +The bounded `recentPeerPositions_` buffer is a deliberate trade-off: small bounded buffer beats dropping proposals during switches. Recovery re-enters `open` phase mid-`establish` via `handleWrongLedger()`, preserving surrounding state. ## Common Bug Patterns @@ -125,6 +136,7 @@ The bounded `recentPeerPositions_` buffer is a deliberate trade-off: small bound - The `peerUnchangedCounter_` is reset to 0 when any vote changes; bugs in this counter cause premature consensus declaration - Forgetting `signingHash_.reset()` before mutating a `ConsensusProposal` returns stale hashes - Comparing wall-clock `seenTime()` against `NetClock` `closeTime_` is a type-shaped bug waiting to happen +- Two temporal domains in `ConsensusParms`: validation/proposal parms use **NetClock seconds**; consensus-loop timers use **steady-clock milliseconds** — mixing them produces subtle bugs ## Key Code Patterns @@ -174,7 +186,7 @@ Most methods take `std::unique_ptr const& clog = {}`. `CLOG(c - Same seq + ledger, different cookie → `ValStatus::multiple` (misconfig/duplicate) - Otherwise → `ValStatus::badSeq` -All trie queries go through `withTrie()`, which first `current()`-flushes stale entries then `checkAcquired()`-promotes newly available ledgers. `lastLedger_` tracks each node's trie contribution so `removeTrie()` can atomically undo before re-inserting. +All trie queries go through `withTrie()`, which first flushes stale entries via `current()` then promotes newly-available ledgers via `checkAcquired()`. `lastLedger_` tracks each node's trie contribution so `removeTrie()` can atomically undo before re-inserting. `getPreferred(curr)` fallback: trie → `acquiring_` (max waiters) → `nullopt`. Conservative switch rule: if preferred is an immediate child of current working ledger, stay put. @@ -205,6 +217,8 @@ Counters propagate up the parent chain on every `insert`/`remove`. Non-root node `Ledger` template contract: cheap copy, `seq()`, `operator[](Seq)` returning `ID{0}` for unknowns, `MakeGenesis{}` tag, free `mismatch(Ledger,Ledger)`. Unique history invariant: agreement on any ancestor ID implies agreement on all earlier ancestors. +`SpanTip` is the return type of `getPreferred()` — a lightweight struct with the tip's seq, ID, and a ledger copy for ancestor lookups. `Span::diff()` delegates to `mismatch()` to find first divergence point. + ## Amendments - 80% validator support for 2 weeks to enable; tracked via `AmendmentTable` with `amendmentMap_` diff --git a/docs/skills/cryptography.md b/docs/skills/cryptography.md index f7bd36ed2e..6aaa137659 100644 --- a/docs/skills/cryptography.md +++ b/docs/skills/cryptography.md @@ -2,33 +2,43 @@ XRPL supports secp256k1 (ECDSA) and ed25519 key types. All crypto uses OpenSSL + dedicated libs (libsecp256k1, ed25519-donna). The `xrpl::crypto` layer provides three foundational utilities — a CSPRNG, secure memory erasure, and RFC 1751 mnemonic encoding — that underpin all key/seed handling. +## Module Layout + +Three small, focused TUs form the foundation; protocol-level types (`SecretKey`, `PublicKey`, `Seed`) in `src/libxrpl/protocol/` consume them. + +| File | Purpose | +|------|---------| +| `include/xrpl/crypto/csprng.h` / `src/libxrpl/crypto/csprng.cpp` | `csprng_engine` + `crypto_prng()` singleton; wraps OpenSSL `RAND_bytes`/`RAND_add`/`RAND_poll` | +| `include/xrpl/crypto/secure_erase.h` / `src/libxrpl/crypto/secure_erase.cpp` | One-line delegation to `OPENSSL_cleanse`; canonical wipe primitive | +| `include/xrpl/crypto/RFC1751.h` / `src/libxrpl/crypto/RFC1751.cpp` | Static class; 2048-word mnemonic codec + `getWordFromBlob` utility | + ## Key Invariants -- `SecretKey` and `Seed` destructors call `secure_erase` on their internal buffer; any code handling secret keys/seeds must follow this pattern +- `SecretKey` and `Seed` destructors call `secure_erase` on their internal buffer as the very first action; any new sensitive type must follow this pattern (covers exception unwind paths too) - ed25519 public keys are prefixed with `0xED` (33 bytes total); secp256k1 keys are 33-byte compressed - `sha512Half` (first 32 bytes of SHA-512) is the standard hash used throughout XRPL for node hashing, signing, etc. - `RIPEMD-160(SHA-256(x))` is used for account ID derivation (`ripesha_hasher`) - Base58 encoding includes a type byte prefix and 4-byte checksum (double SHA-256) - All randomness for cryptographic material flows through `crypto_prng()`; never call OpenSSL's `RAND_bytes` directly and never use `std::rand`/`rand()` -- `csprng_engine` is non-copyable and non-movable by deleted ops; the singleton must be accessed by reference via `crypto_prng()` +- `csprng_engine` is non-copyable and non-movable (deleted ops); the singleton must be accessed by reference via `crypto_prng()` - `csprng_engine` satisfies the C++ *UniformRandomNumberEngine* named requirement (`result_type` = `std::uint64_t`, `operator()()`, `constexpr min()`/`max()`) — plugs into `std::uniform_int_distribution`, `beast::rngfill`, etc. -- RFC 1751 dictionary has exactly 2^11 = 2048 entries; entries 0–570 are 1–3 char words, 571–2047 are exactly 4 chars (used to split binary search range in `wsrch`) -- Each RFC 1751 word encodes exactly 11 bits; a 64-bit block uses 6 words (66 bits = 64 data + 2 parity); a 128-bit key uses two such blocks → 12 words +- RFC 1751 dictionary has exactly 2^11 = 2048 entries; indices 0–570 are words ≤ 3 chars, 571–2047 are exactly 4 chars (exploited in `wsrch` to halve binary search range) +- Each RFC 1751 word encodes exactly 11 bits; a 64-bit block uses 6 words (66 bits = 64 data + 2 parity); a 128-bit key uses two such blocks → 12 words total ## Common Bug Patterns -- Mixing up key types: secp256k1 signing hashes the message with sha512Half first, ed25519 signs the raw message +- Mixing up key types: secp256k1 signing hashes the message with `sha512Half` first; ed25519 signs the raw message - `signDigest` only works with secp256k1; calling it with ed25519 throws a logic error -- Signature canonicality: ed25519 `verify` checks signature canonicality before calling `ed25519_sign_open`; non-canonical signatures are rejected +- Signature canonicality: ed25519 `verify` checks canonicality before calling `ed25519_sign_open`; non-canonical signatures are rejected - Overlay handshake uses `signDigest` to sign the session fingerprint (`sharedValue`); the signature binds the TLS session to the node identity -- Relying on a naive `memset` to wipe key material — optimizer will eliminate it as a dead store. Must use `secure_erase` +- Relying on a naive `memset` to wipe key material — optimizer will eliminate it as a dead store; must use `secure_erase` - Forgetting to wipe *intermediate* derivation buffers (SHA-512 halves, scratch arrays) after the final `SecretKey` has taken its copy - Constructing a second `csprng_engine` instance: forbidden by deleted ctors; sharing one OpenSSL pool through the singleton is required -- Passing `mix_entropy` a buffer and assuming OpenSSL credits it as entropy — the entropy estimate is always 0 (deliberately conservative) -- RFC 1751 decode: distinguish `0` (unknown word), `-1` (malformed input), `-2` (parity failure) — don't collapse all failures into a single error -- Forgetting that `insert()` in RFC1751 uses bitwise OR, not assignment — output buffer must start zero-initialized +- Passing `mix_entropy` a buffer and assuming OpenSSL credits it as entropy — the entropy estimate passed to `RAND_add` is always `0` (deliberately conservative; `std::random_device` may be weak on some platforms) +- RFC 1751 decode: distinguish `1` (success), `0` (unknown word), `-1` (malformed input), `-2` (parity failure) — do not collapse all failures into a single error +- `insert()` in RFC 1751 uses bitwise OR, not assignment — output buffer must start zero-initialized - Treating RFC 1751 parity as cryptographic integrity — it's a 2-bit transcription check, not a MAC -- Using `getWordFromBlob` for anything cryptographic — it's a Jenkins hash and explicitly insecure +- Using `getWordFromBlob` for anything cryptographic — it's a Jenkins one-at-a-time hash and explicitly insecure ## Review Checklist @@ -38,6 +48,7 @@ XRPL supports secp256k1 (ECDSA) and ed25519 key types. All crypto uses OpenSSL + - Any new sensitive type should follow the `SecretKey`/`Seed` pattern: destructor calls `secure_erase` as its first/only action - New OpenSSL touchpoints should respect the `OPENSSL_VERSION_NUMBER < 0x10100000L` thread-safety guard pattern used in `csprng.cpp` - CSPRNG failures (`RAND_bytes`/`RAND_poll` ≠ 1) must propagate via `Throw<>` (logs stack trace) — never silently fall back +- `RAND_cleanup()` must only be called for OpenSSL `< 1.1.0`; modern versions handle cleanup via `atexit` ## Key Patterns @@ -55,7 +66,9 @@ SecretKey sk(Slice{buf, sizeof(buf)}); secure_erase(buf, sizeof(buf)); // MUST erase raw buffer ``` -`secure_erase` delegates to `OPENSSL_cleanse`, which uses volatile writes / opaque function-pointer calls to defeat dead-store elimination. Lives in a separate TU (`secure_erase.cpp`) so the call site cannot inline it away — the out-of-line call alone forces the compiler to treat it as an opaque side effect. It does **not** clear CPU registers or caches — it is best-effort for heap/stack only (see Percival 2014). Takes raw `void*` + byte count with no null/zero guards; callers must supply valid arguments. +`secure_erase` delegates to `OPENSSL_cleanse`, which uses volatile writes / opaque function-pointer calls to defeat dead-store elimination. Lives in a separate TU (`secure_erase.cpp`) so the call site cannot inline it away — the out-of-line call forces the compiler to treat it as an opaque side effect. It does **not** clear CPU registers or caches — best-effort for heap/stack only (see Percival 2014). Takes raw `void*` + byte count with no null/zero guards; callers must supply valid arguments. + +Wrapping behind `xrpl::secure_erase` provides one auditable choke point if the underlying strategy ever changes (e.g., switching to `explicit_bzero`). `OPENSSL_cleanse` is preferred over platform-specific alternatives (`memset_s`, `explicit_bzero`, `SecureZeroMemory`) because OpenSSL already centralizes cross-platform portability for the rest of the crypto stack. ### CSPRNG Usage ```cpp @@ -70,7 +83,7 @@ rng(buf, sizeof(buf)); // operator()(void*, size_t) beast::rngfill(buf, sizeof(buf), crypto_prng()); ``` -Failure (insufficient entropy) throws `std::runtime_error("CSPRNG: Insufficient entropy")` via `Throw<>`; callers generally do not catch — propagation halts the operation, which is correct. Generating cryptographic material from an entropy-exhausted pool would be worse than crashing. +Constructor calls `RAND_poll()` eagerly to surface entropy failures at startup rather than at first key gen. Failure throws `std::runtime_error("CSPRNG: Insufficient entropy")` via `Throw<>`; callers generally do not catch — propagation halts the operation, which is correct. ### Key Type Dispatch ```cpp @@ -91,47 +104,34 @@ RFC1751::getEnglishFromKey(words, std::string{seedBytes, 16}); std::string roundTrip; int rc = RFC1751::getKeyFromEnglish(roundTrip, words); -// rc == 1 success; 0 unknown word; -1 malformed; -2 parity mismatch +// rc: 1=success, 0=unknown word, -1=malformed, -2=parity mismatch ``` `Seed.cpp` reverses the 16 bytes before/after RFC 1751 encoding to match the RFC's big-endian convention. `standard()` normalizes input by uppercasing and applying visual substitutions `1→L`, `0→O`, `5→S` for handwritten/OCR tolerance. The 2-bit parity per 8-byte half is a transcription check, **not** a cryptographic integrity check. -`getKeyFromEnglish` uses `boost::algorithm::split` with `token_compress_on` for whitespace tolerance. The asymmetry between encoder (no return code — cannot fail on valid input) and decoder (4-valued return code) is intentional: encoding is lossless, decoding must validate user input. +`getKeyFromEnglish` uses `boost::algorithm::split` with `token_compress_on` for whitespace tolerance. Encoder (`getEnglishFromKey`) has no return code — encoding is lossless and cannot fail on valid 16-byte input. Decoder (`getKeyFromEnglish`) has a 4-valued return code — it must validate user-supplied strings. -`getWordFromBlob` is a separate utility: Jenkins one-at-a-time hash → `% 2048` → one dictionary word. Explicitly **not** cryptographically secure; used in `NetworkOPs.cpp` for `shroudedHostId` (privacy-preserving node label in logs/RPC). +`getWordFromBlob` is a separate utility: Jenkins one-at-a-time hash → `% 2048` → one dictionary word. Explicitly **not** cryptographically secure; used in `NetworkOPs.cpp` for `shroudedHostId` (privacy-preserving node label in logs/RPC). Reuses the RFC 1751 dictionary purely for its vetted set of short, pronounceable words. -## Module Layout +## CSPRNG Internals -The `xrpl::crypto` foundation has three small, focused TUs: - -| File | Purpose | -|------|---------| -| `csprng.cpp/.h` | `csprng_engine` + `crypto_prng()` singleton; wraps OpenSSL `RAND_bytes`/`RAND_add`/`RAND_poll` | -| `secure_erase.cpp/.h` | One-line delegation to `OPENSSL_cleanse`; the canonical wipe primitive | -| `RFC1751.cpp/.h` | Static class; 2048-word mnemonic codec + `getWordFromBlob` utility | - -These three are used together by the protocol-level key/seed types (`SecretKey`, `PublicKey`, `Seed`) which live in `src/libxrpl/protocol/`. - -### CSPRNG Internals Worth Knowing - -- Constructor calls `RAND_poll()` eagerly to surface OS entropy failures at startup, not at first key gen -- Destructor calls `RAND_cleanup()` only for OpenSSL `< 1.1.0` (modern versions clean up via `atexit`) -- Thread-safety mutex is compile-time gated: `#if (OPENSSL_VERSION_NUMBER < 0x10100000L) || !defined(OPENSSL_THREADS)` — modern builds elide the lock on the hot path because `RAND_bytes` is internally thread-safe. The mutex is *always* held around `RAND_add` in `mix_entropy` regardless of version. +- Constructor calls `RAND_poll()` eagerly; destructor calls `RAND_cleanup()` only for OpenSSL `< 1.1.0` (modern versions clean up via `atexit`) +- Thread-safety mutex is compile-time gated: `#if (OPENSSL_VERSION_NUMBER < 0x10100000L) || !defined(OPENSSL_THREADS)` — modern builds elide the lock on the hot path (`RAND_bytes` is internally thread-safe in OpenSSL ≥ 1.1.0). The mutex is *always* held around `RAND_add` in `mix_entropy` regardless of version - `mix_entropy` reads 128 values from `std::random_device` *before* locking (independently thread-safe), then locks for `RAND_add` -- `mix_entropy` passes entropy estimate `0` to `RAND_add` — never claim entropy for `std::random_device` or caller-supplied buffers (they may be weak on some platforms; conservative accounting prevents prematurely satisfying OpenSSL's seeding threshold) -- Called on a timer from `Application.cpp` to stir fresh OS entropy during the node's lifetime +- `mix_entropy` passes entropy estimate `0` to `RAND_add` — never claim entropy for `std::random_device` or caller-supplied buffers (conservative accounting prevents prematurely satisfying OpenSSL's seeding threshold) +- `mix_entropy` is called on a timer from `Application.cpp` to stir fresh OS entropy during the node's lifetime - Singleton is a function-local `static` (Meyers singleton); C++11 guarantees thread-safe one-time init - Scalar `operator()()` delegates to buffer-fill overload with `sizeof(result_type)` (8 bytes) — both paths share validation/error handling -- Wrapping behind a single `xrpl::secure_erase` function lets the strategy change (e.g., to `explicit_bzero`) at one auditable choke point without touching call sites -### RFC 1751 Internals Worth Knowing +## RFC 1751 Internals -- `extract(s, start, length)` / `insert(s, x, start, length)`: read/write `length ≤ 11` bits at arbitrary offset across a 9-byte buffer; guarded by `XRPL_ASSERT` (stripped in release). Both work across byte boundaries by assembling 2–3 adjacent bytes. -- `insert` uses bitwise OR (not assignment), so the output buffer must start zero-initialized; partial writes accumulate safely -- `btoe` adds a 9th byte for the 2-bit parity computed by summing all 32 pairs of bits across the 64-bit payload; parity occupies bit positions 64–65 +- `extract(s, start, length)` / `insert(s, x, start, length)`: read/write `length ≤ 11` bits at arbitrary offset across a 9-byte buffer; guarded by `XRPL_ASSERT` (stripped in release). Both work across byte boundaries by assembling 2–3 adjacent bytes into a 24-bit window +- `insert` uses bitwise OR (not assignment) — output buffer must start zero-initialized; partial writes accumulate safely +- `btoe` appends a 9th byte for 2-bit parity computed by summing all 32 bit-pairs across the 64-bit payload; parity occupies bit positions 64–65 - `etob` validates: exactly 6 words, each 1–4 chars, all in dictionary, parity matches — distinct error codes per failure mode (`0` unknown, `-1` malformed, `-2` parity) -- `wsrch` halves the binary search range based on input word length: `[0, 571)` for 1–3 char words, `[571, 2048)` for 4-char words -- No exceptions used anywhere in RFC1751 — all errors are integer return codes +- `wsrch` halves the binary search range based on input word length: `[0, 571)` for ≤3-char words, `[571, 2048)` for 4-char words +- No exceptions used anywhere in RFC 1751 — all errors are integer return codes +- All methods are static; `RFC1751` is a pure stateless utility class — instantiation is never needed ## Key Files diff --git a/docs/skills/ledger.md b/docs/skills/ledger.md index 4ec4d7552d..fed6f932d6 100644 --- a/docs/skills/ledger.md +++ b/docs/skills/ledger.md @@ -1,39 +1,51 @@ # Ledger -Each ledger is an immutable snapshot: header (seq, hashes, close time) + state SHAMap + transaction SHAMap. `LedgerMaster` is the central coordinator. The module spans `Ledger` itself, the view hierarchy (`ReadView` → `ApplyView` → `OpenView`/`Sandbox`/`PaymentSandbox`), directory primitives, and a large family of per-object-type helper free functions. +Each ledger is an immutable snapshot: header (seq, hashes, close time) + state SHAMap + transaction SHAMap. `LedgerMaster` is the central coordinator. The module spans `Ledger` itself, the view hierarchy (`ReadView` → `ApplyView` → `OpenView`/`Sandbox`/`PaymentSandbox`), directory primitives, subscription/order-book fan-out (`BookListeners`, `OrderBookDB`), governance state (`AmendmentTable`), pending-save bookkeeping, and a large family of per-object-type helper free functions. ## Key Invariants -- Once `setImmutable()` is called, the ledger and its SHAMaps cannot change; only immutable ledgers can be set in `LedgerHolder`. Mutable ledgers must not be shared; immutable ones can be shared lock-free. -- Every server always has an open ledger; the open ledger cannot close until previous consensus completes -- Ledger header hashes to the ledger's identity hash; includes state root, tx root, parent hash, total coins, close time -- `LedgerMaster` tracks: `mPubLedger` (last published), `mValidLedger` (last validated), `mLedgerHistory` (cache) -- Validation requires minimum trusted validations (`minVal`); filtered by Negative UNL -- Open ledgers store transactions without metadata; closed ledgers store `addVL(tx)||addVL(meta)` and produce `TxMeta` on apply -- Trust-line `sfBalance` is always stored "low account's perspective"; helpers negate when querying from high side -- Directory invariant: page keys are chosen so the low 96 bits of every token in a page are strictly less than the page key's low 96 bits (NFT pages); for owner/order-book directories, page 0 is the anchor and `sfIndexPrevious` on root points to the tail +- Once `setImmutable()` is called, the ledger and its SHAMaps cannot change; only immutable ledgers can be shared across threads. Mutable ledgers must not be shared. +- Every server always has an open ledger; the open ledger cannot close until previous consensus completes. +- Ledger header hashes to the ledger's identity hash; includes state root, tx root, parent hash, total coins, close time. +- `LedgerMaster` tracks: `mPubLedger` (last published), `mValidLedger` (last validated), `mLedgerHistory` (cache). +- Validation requires minimum trusted validations (`minVal`); filtered by Negative UNL. +- Open ledgers store transactions without metadata; closed ledgers store `addVL(tx)||addVL(meta)` and produce `TxMeta` on apply. +- Trust-line `sfBalance` is always stored "low account's perspective"; helpers negate when querying from high side. The `sfLowLimit.account < sfHighLimit.account` ordering is the trust-line orientation invariant — every access decides which side is "us" via `AccountID` comparison. +- Directory invariant: page keys are chosen so the low 96 bits of every token in an NFT page are strictly less than the page key's low 96 bits; for owner/order-book directories, page 0 is the anchor and `sfIndexPrevious` on root points to the tail. +- `PendingSaves` invariant: exactly one of `saveLedgerAsync`/`pendSaveValidated` may run for a given ledger sequence at a time; second caller observes `started == true` and bails. +- `ApplyStateTable` pointer-identity invariant: `erase(sle)` and `update(sle)` require the exact same `shared_ptr` obtained from `peek()` on the same view instance. Crossing views is a `LogicError`. +- `RawStateTable` state-machine collapse: `erase` after `insert` removes the entry entirely (net zero); `insert` after `erase` upgrades to `replace`; double-erase is a `LogicError`. ## Common Bug Patterns -- Modifying a ledger after `setImmutable()` corrupts shared state; always check `mImmutable` before mutation -- Gap detection: if ledgers 603 and 600 exist but 601-602 are missing, `LedgerMaster` requests 602 first, then backfills 601 -- `InboundLedger::gotData()` queues data for processing; calling `done()` before all data arrives creates incomplete ledgers -- `checkAccept` won't accept a ledger that isn't ahead of the last validated ledger; stale validations are silently ignored -- Calling `ApplyViewImpl::apply()` twice or using the view after apply: the only valid operation post-apply is destruction -- Passing an SLE from `peek()` on view A to `erase()`/`update()` on view B: `ApplyStateTable` enforces pointer identity and `LogicError`s -- Forgetting that `read()` is change-aware but `slesBegin/End` iterates only the base — pending inserts won't appear in SLE iteration on `ApplyViewBase` -- Comparing iterators across different `ReadView` instances: `XRPL_ASSERT` fires in debug; UB in release -- Stale `OpenView::txCount` ordinal in nested/batch views — must use `batch_view_t` constructor to capture `baseTxCount_` -- Calling `removeExpired`/`deleteSLE` in preclaim — preclaim is `ReadView`-only; expiry-driven deletion only happens in doApply -- Forgetting that `directSendNoFee` is not `[[nodiscard]]` (for `DirectStep.cpp` compatibility) — its return must still be inspected +- Modifying a ledger after `setImmutable()` corrupts shared state; always check `mImmutable` before mutation. +- Gap detection: if ledgers 603 and 600 exist but 601-602 are missing, `LedgerMaster` requests 602 first, then backfills 601. +- `InboundLedger::gotData()` queues data for processing; calling `done()` before all data arrives creates incomplete ledgers. +- `checkAccept` won't accept a ledger that isn't ahead of the last validated ledger; stale validations are silently ignored. +- Calling `ApplyViewImpl::apply()` twice or using the view after apply: the only valid operation post-apply is destruction. +- Passing an SLE from `peek()` on view A to `erase()`/`update()` on view B: `ApplyStateTable` enforces pointer identity and `LogicError`s. +- Forgetting that `read()` is change-aware but `slesBegin/End` iterates only the base — pending inserts won't appear in SLE iteration on `ApplyViewBase`. +- Comparing iterators across different `ReadView` instances: `XRPL_ASSERT` fires in debug; UB in release. +- Stale `OpenView::txCount` ordinal in nested/batch views — must use `batch_view_t` constructor to capture `baseTxCount_`. +- Calling `removeExpired`/`deleteSLE` in preclaim — preclaim is `ReadView`-only; expiry-driven deletion only happens in doApply. +- Forgetting that `directSendNoFee` is not `[[nodiscard]]` (for `DirectStep.cpp` compatibility) — its return must still be inspected. +- Accessing trust-line endpoints by raw low/high without first computing orientation — use the trust-line helpers that take `(account, peer)` and resolve which slot to touch. +- Constructing a `PaymentSandbox` from `ApplyView&` when nested within another payment: the nested form takes `PaymentSandbox const*` so `DeferredCredits` chain to the parent. Wrong constructor = double-spend escape. +- Updating an SLE then publishing subscriptions before `havePublished` advances: re-entrant publishers see partial state. +- Submitting to `OrderBookDB` while holding `mLock` from a callback — `setup_` ingests under its own write lock and may re-enter. +- `AcceptedLedgerTx` constructor asserts `!ledger->open()` — constructing from an open ledger aborts in debug. +- Calling `closeChannel` before `sfAmount >= sfBalance` invariant holds — the `XRPL_ASSERT` inside will abort; fix the calling transactor, not the helper. +- `ReadView` copy/move constructors always re-initialize `sles(*this)` and `txs(*this)` — subclasses must not rely on memberwise copy of those range objects. +- `forEachItemAfter` hint-page optimization: if the hint is stale the code falls back to linear scan but still returns `false` if `after` key is never found; callers must handle this as an invalid cursor. +- `dirIsEmpty` requires both empty `sfIndexes` *and* zero `sfIndexNext` — an empty root page does not mean an empty directory if subsequent pages exist. ## Ledger Entry Types -- Defined in `ledger_entries.macro` using `LEDGER_ENTRY(type, code, class, name, fields)` -- Each entry has an `SOTemplate` defining required/optional fields -- Key computation: `Indexes.cpp` computes unique keys (keylets) for each ledger object type -- `STLedgerEntry` wraps the serialized data with type-safe field access -- Pseudo-account types (AMM, Vault, LoanBroker) are discovered by scanning `ltACCOUNT_ROOT` SOTemplate for `SField::sMD_PseudoAccount`-flagged fields; no manual registration +- Defined in `ledger_entries.macro` using `LEDGER_ENTRY(type, code, class, name, fields)`. +- Each entry has an `SOTemplate` defining required/optional fields. +- Key computation: `Indexes.cpp` computes unique keys (keylets) for each ledger object type. +- `STLedgerEntry` wraps the serialized data with type-safe field access. +- Pseudo-account types (AMM, Vault, LoanBroker) are discovered by scanning `ltACCOUNT_ROOT` SOTemplate for `SField::sMD_PseudoAccount`-flagged fields; no manual registration. ## View Hierarchy @@ -48,21 +60,23 @@ ReadView (abstract, read-only) └── PaymentSandbox (overrides credit/balance hooks; DeferredCredits) ``` -- `ApplyStateTable` (per-tx buffer): actions `cache`/`insert`/`modify`/`erase`; generates `TxMeta` with `sfPreviousFields`/`sfFinalFields`/`sfNewFields` driven by `SField::sMD_*` flags; threads `sfPreviousTxnID`/`sfPreviousTxnLgrSeq` on affected account roots and trust-line endpoints -- `RawStateTable` (used by `OpenView` and `RawStateTable::apply` flush): three actions only; state-machine collapse (insert+erase → removed; insert+replace → insert with new SLE; erase+insert → replace) -- Both tables use `boost::container::pmr::monotonic_buffer_resource` with a 256 KB initial arena; `unique_ptr` for stable address so map allocators work after move -- `CachedView` (`CachedLedger = CachedView`): two-level cache — per-view `map_` plus process-wide `CachedSLEs` (`TaggedCache`) keyed by digest. Hit/hitExpired/miss counters distinguish full hit, digest-known-but-SLE-evicted, and cold miss -- Hooks pattern: `balanceHookIOU/MPT`, `ownerCountHook` (read side, on `ReadView`) and `creditHookIOU/MPT`, `adjustOwnerCountHook`, `issuerSelfDebitHookMPT` (write side, on `ApplyView`) are no-ops by default; `PaymentSandbox` overrides them to prevent within-payment double-spend +- `ApplyStateTable` (per-tx buffer): actions `cache`/`insert`/`modify`/`erase`; generates `TxMeta` with `sfPreviousFields`/`sfFinalFields`/`sfNewFields` driven by `SField::sMD_*` flags; threads `sfPreviousTxnID`/`sfPreviousTxnLgrSeq` on affected account roots and trust-line endpoints. On `apply()` the buffer is flushed into the base `RawView` and the table is reset; using the table afterward is UB. A byte-equal `sfModifiedNode` is silently omitted from metadata. +- `RawStateTable` (used by `OpenView` and `RawStateTable::apply` flush): three actions only; state-machine collapse — `insert + erase → removed entirely`; `insert + replace → insert with new SLE`; `erase + insert (modify) → replace`. No `TxMeta` is produced because `RawView` is the post-apply mutation surface. +- Both tables use `boost::container::pmr::monotonic_buffer_resource` with a 256 KB initial arena; `unique_ptr` for stable address so map allocators work after move. Memory is released only when the table is destroyed — long-lived tables leak peak working set. +- `ReadViewFwdRange` is the iterator/range adapter used for `sles`/`txs` traversal; iterator equality is anchored to the owning view (`XRPL_ASSERT` cross-view comparisons in debug). Virtual clone via `iter_base::copy()` enables value-semantics copying. Deferred dereference caches the result in `mutable std::optional cache_`; `operator++` clears the cache. +- `CachedView` (`CachedLedger = CachedView`): two-level cache — per-view `map_` plus process-wide `CachedSLEs` (`TaggedCache`) keyed by digest. The per-view map uses `hardened_hash` to defeat hash-flooding from adversarial keylet sequences. Hit/hitExpired/miss counters distinguish full hit, digest-known-but-SLE-evicted, and cold miss. The expensive `base_.digest()` and `base_.read()` calls happen outside the lock; `mutex_` protects only the key→digest map. +- Hooks pattern: `balanceHookIOU/MPT`, `ownerCountHook` (read side, on `ReadView`) and `creditHookIOU/MPT`, `adjustOwnerCountHook`, `issuerSelfDebitHookMPT` (write side, on `ApplyView`) are no-ops by default; `PaymentSandbox` overrides them to prevent within-payment double-spend. +- `isDryRun` path in `apply(OpenView&...)`: full `TxMeta` is built and returned as `std::optional`, but state changes and `rawTxInsert` are suppressed. Used for fee simulation. ## Directory Structures Three distinct paged-list flavors, all `ltDIR_NODE`-based: -- **Owner / book directories** (`ApplyView::dirInsert`/`dirAppend`/`dirRemove`/`dirDelete`): root at page 0; `sfIndexNext`/`sfIndexPrevious` linked; root's `sfIndexPrevious` points to tail for O(1) append. `dirAppend` preserves insertion order (offers only, asserted); `dirInsert` keeps sorted order within each page. Page overflow detected via deliberate `uint64_t` wraparound (compile-time `static_assert`ed). -- **NFToken pages** (`NFTokenHelpers`): tokens packed into `STArray`-bearing pages, sorted by `compareTokens()` (low 96 bits, then full ID). Last page anchored at `keylet::nftpage_max(owner)`. Split algorithm respects equivalence groups; merge across adjacent pages on remove. `fixNFTokenPageLinks` amendment changes empty-last-page handling. +- **Owner / book directories** (`ApplyView::dirInsert`/`dirAppend`/`dirRemove`/`dirDelete`): root at page 0; `sfIndexNext`/`sfIndexPrevious` linked; root's `sfIndexPrevious` points to tail for O(1) append. `dirAppend` preserves insertion order (offers only, asserted); `dirInsert` keeps sorted order within each page. Page overflow detected via deliberate `uint64_t` wraparound (compile-time `static_assert`ed). `describe` callback brands each new page with type-specific fields (e.g., `sfOwner`). +- **NFToken pages** (`NFTokenHelpers`): tokens packed into `STArray`-bearing pages, sorted by `compareTokens()` (low 96 bits, then full ID). Last page anchored at `keylet::nftpage_max(owner)`. Split algorithm respects equivalence groups (identical low 96 bits); merge across adjacent pages on remove. `fixNFTokenPageLinks` amendment changes empty-last-page handling. - **Quality-keyed order books** (`BookDirs`): two-level — `succ()` finds next quality directory in `[root_, getQualityNext(root_))`, then `cdirFirst`/`cdirNext` walks pages within that quality. `BookDirs` iterator transparently crosses quality boundaries. -The `Dir` class is a simple range adaptor (NFTokenOffer directories + unit tests). `next_page()` is exposed publicly to allow page-skipping traversal (used by `notTooManyOffers`). +The `Dir` class is a simple range adaptor (NFTokenOffer directories + unit tests); `next_page()` is public to allow page-skipping traversal (used by `notTooManyOffers`). ## Helper Module (`include/xrpl/ledger/helpers/`) @@ -75,7 +89,7 @@ Conventional split: Key files: `AMMHelpers`, `AccountRootHelpers`, `CredentialHelpers`, `DelegateHelpers`, `DirectoryHelpers`, `EscrowHelpers`, `MPTokenHelpers`, `NFTokenHelpers`, `OfferHelpers`, `PaymentChannelHelpers`, `PermissionedDEXHelpers`, `RippleStateHelpers`, `TokenHelpers`, `VaultHelpers`. -Policy enums (used to avoid bare bools): `FreezeHandling`, `AuthHandling`, `SpendableHandling`, `WaiveTransferFee`, `AllowMPTOverflow`, `AuthType` (`StrongAuth`/`WeakAuth`/`Legacy` — Legacy maps to StrongAuth for MPT, WeakAuth for IOU). +Policy enums (used to avoid bare bools): `FreezeHandling`, `AuthHandling`, `SpendableHandling`, `WaiveTransferFee`, `AllowMPTOverflow`, `AuthType` (`StrongAuth`/`WeakAuth`/`Legacy` — Legacy maps to StrongAuth for MPT, WeakAuth for IOU), `TruncateShares`. ## AMM Rounding Contract @@ -92,17 +106,43 @@ The pool invariant `sqrt(asset1 × asset2) >= LPTokenBalance` is non-negotiable, `changeSpotPriceQuality`: aligns AMM synthetic offer to CLOB best quality. Solves a quadratic (or linear, for the alternate constraint) and takes the smaller binding result. `fixAMMv1_1` switched the starting side to always-XRP-first to avoid XRP-drop discretization undershoot. `detail::reduceOffer` applies 0.01% rescue multiplier when quality still falls below target. +`solveQuadraticEqSmallest()` uses the numerically stable "citardauq" formula (Blinn's paper) to avoid catastrophic cancellation when `b > 0` in the standard form. + ## MPT Specifics - `OutstandingAmount` can transiently exceed `MaximumAmount` during payment-engine routing — `AllowMPTOverflow::Yes` raises the ceiling to `UINT64_MAX` for that case; direct sends use `No` and strict cap. The `fixSecurity3_1_3` amendment makes `accountSendMulti` accumulate in exact `uint64_t` (not `STAmount`/`Number`) to avoid 19-digit precision loss in aggregate overflow checks. - `selfDebit` field on `IssuerValueMPT` in `PaymentSandbox::DeferredCredits` tracks issuer-as-seller offers because the payment engine credits first; `balanceHookSelfIssueMPT` caps available issuance at `origBalance - selfDebit`. -- `enforceMPTokenAuthorization` (doApply) handles the case where a domain-authorized holder lacks an `MPToken` SLE in preclaim — it lazily allocates the SLE on the fly and consumes the `priorBalance` reserve. +- Two-phase auth: `requireAuth` (preclaim, `ReadView`) checks the static authorization predicates; `enforceMPTokenAuthorization` (doApply, `ApplyView`) handles the case where a domain-authorized holder lacks an `MPToken` SLE in preclaim — it lazily allocates the SLE on the fly and consumes the `priorBalance` reserve. - `lockEscrowMPT` does NOT change `OutstandingAmount` (tokens are still in circulation while escrowed); `unlockEscrowMPT` decreases `OutstandingAmount` only by the fee delta (gross - net) under `fixTokenEscrowV1`. +- `isVaultPseudoAccountFrozen()` recursively checks whether a vault-backed MPT issuance is frozen by the vault's underlying asset; a `depth` parameter bounds recursion (purely defensive — nested vaults cannot currently be created). + +## IOU Trust-Line Specifics + +- Orientation: every trust line stores its endpoints in fixed `sfLowLimit`/`sfHighLimit` slots based on `AccountID` comparison. Helpers like `accountHolds`, `accountFunds`, `trustCreate`, `trustDelete` take `(account, peer)` and resolve the slot internally — never index by raw low/high outside the helpers. +- Three freeze tiers: `isIndividualFrozen` (issuer froze this specific holder), `isFrozen` (global freeze flag), `isDeepFrozen` (transitive freeze through deep-freeze chains). Each has distinct policy implications for sends, offers, and AMM participation; helper queries take a `FreezeHandling` enum to select policy. +- `rippleCredit`/`rippleSend` apply transfer fees only when the issuer is neither endpoint and `WaiveTransferFee::No`; reserve/quality limits enforced via `accountFunds` rather than raw balance. +- Trust-line deletion (`trustDelete`) requires both endpoints to be at default state (zero balance, default limits/flags); helpers compute "default" against the post-tx state, not raw fields. +- `updateTrustLine` (static helper in `RippleStateHelpers.cpp`): when a sender's balance crosses zero and meets seven specific conditions (zero limit, zero quality flags, no freeze, etc.), it releases the sender's reserve and signals the caller to delete the line. + +## Credential System + +- Two-phase enforcement mirrors the `ReadView`/`ApplyView` split: `credentials::valid()` and `credentials::validDomain()` are read-only (preclaim); `verifyDepositPreauth()` and `verifyValidDomain()` are mutating (doApply) and delete expired credential objects as a side effect. +- `credentials::deleteSLE()` handles two owner directories (issuer and subject) with reserve accounting that shifts ownership at `lsfAccepted` time. Before acceptance only the issuer pays the reserve; after, the subject does. +- `checkFields()` validates `sfCredentialIDs` (`STVector256`); `checkArray()` validates `STArray` credential pairs and uses `sha512Half(issuer, credentialType)` for duplicate detection. +- `authorizedDepositPreauth()` uses a `lifeExtender` vector to keep `SLE const` shared pointers alive while the `Slice`-based set is in scope — forgetting this causes dangling pointer reads. ## Pseudo-Accounts Synthetic `AccountRoot` SLEs owned by protocol objects (AMM, Vault, LoanBroker). Address derived from `sha512Half(attempt, parentHash, ownerKey)` → RIPESHA in a loop up to `maxAccountAttempts = 256` (consensus-critical constant). Flags `lsfDisableMaster | lsfDefaultRipple | lsfDepositAuth`; `sfSequence = 0` under `featureSingleAssetVault`/`featureLendingProtocol`. `isPseudoAccount(sle, filter?)` checks for any field tagged `SField::sMD_PseudoAccount` (currently `sfAMMID`, `sfVaultID`, `sfLoanBrokerID`). Pseudo-accounts bypass reserve requirements in `xrpLiquid`. +Amendment checks are the *caller's* responsibility, not `createPseudoAccount`'s. The function asserts the passed `ownerField` carries `sMD_PseudoAccount`. + +## Vault Math + +`VaultHelpers` converts between asset and share denominations using `sfScale` (decimal precision) and tracks `sfLossUnrealized` separately on deposit vs withdraw paths — the asymmetry is intentional: deposits price against gross `sfAssetsTotal`; withdrawals subtract `sfLossUnrealized` first (existing holders bear the loss, not new depositors). `TruncateShares` policy controls whether sub-unit share remainders are rounded to zero or rejected. + +Empty-vault bootstrap: when `sfAssetsTotal == 0`, initial share allocation is `assets × 10^sfScale` (truncated), establishing the starting exchange rate. This reduces susceptibility to the first-depositor donation attack. + ## Ledger Timing (`LedgerTiming.h`) Adaptive close-time binning prevents clock-skew disagreements. Resolutions: `{10, 20, 30, 60, 90, 120}` seconds; default 30, genesis 10. Adjustment is asymmetric: @@ -128,14 +168,52 @@ Retry queue between consensus passes. Sort key: `(account ⊕ salt, SeqProxy, tx `Key::operator==` compares only `txId_` (asymmetric with `operator<`). +## Subscription Fan-Out + +`BookListeners` and `OrderBookDB` together drive WebSocket book-stream subscriptions: + +- `BookListeners`: per-book (`Book`-keyed) registry of `InfoSub::wptr`. `publish` builds a `MultiApiJson` once and dispatches by API version, expiring dead weak pointers in-place. Locking is per-listener-set (`std::recursive_mutex`), not global. +- `OrderBookDB`: scans the ledger's `dirNode` index to build `xrpBooks_` (XRP-side, O(1) checked) and `allBooks_` (IOU/MPT, scanned). Scans are throttled by `setup_seq_` — only re-scan on ledger change. `publishOrderBook` walks affected listeners; `havePublished` dedups so the same ledger isn't re-fanned-out across overlapping subscription updates. +- Ingest path under `mLock` (write); subscribe/unsubscribe under read lock. Callbacks must not re-enter `setup_`. +- `havePublished` is a `hash_set` shared across all `BookListeners::publish()` calls for a single transaction, preventing duplicate delivery to subscribers registered on multiple affected books. + +## Accepted Ledger Tx (`AcceptedLedgerTx`) + +Eagerly-materialized projection of a transaction-in-ledger: tx, meta, deserialized account fields, affected-accounts list, JSON-serialization cache. Construction is the heavyweight step; downstream consumers (RPC, subscription) read fields by reference. The class is immutable post-construction; thread-safe to share. + +Constructor asserts `!ledger->open()`. For `ttOFFER_CREATE` transactions where `account != amount.getIssuer()`, `owner_funds` is injected into `mJson` via `accountFunds(fhIGNORE_FREEZE, ahIGNORE_AUTH)` — a read-time annotation for order-book subscribers, not ledger state. + +`getEscMeta()` returns `mRawMeta` as a SQL blob literal; used by `Node.cpp` for transaction DB inserts. + +## Pending Saves (`PendingSaves`) + +State machine tracking in-flight `Ledger` → DB writes. Each entry keyed by ledger sequence; transitions `requested → started → finished`. `pendSave` is idempotent — second caller for the same sequence is a no-op. Synchronous callers in `shouldWork(seq, true)` block on a `condition_variable` until the in-progress write completes. `getSnapshot()` returns a copy of the map; `LedgerMaster::getValidatedRange()` uses it to exclude in-progress sequences from the reported range. + +## Amendment Table (`AmendmentTable`) + +Governance state for protocol amendments. Each amendment has a `VoteBehavior`: +- `DefaultYes` — vote yes unless config overrides +- `DefaultNo` — vote no unless config overrides +- `Obsolete` — never vote yes, even with config override; `LogicError` if config tries + +`TrustedVotes` tracks per-validator amendment votes across ledgers with anti-flapping (a validator must consistently vote for N consecutive flag ledgers before counting toward majority). The flag-ledger cadence and majority threshold are consensus-critical constants. `doVoting` produces the `EnableAmendment`/`Veto` pseudo-transactions inserted at flag ledger close. + +Two-layer API: the pure-virtual `doVoting(LedgerIndex, set, majorityAmendments_t)` is wrapped by a non-virtual `doVoting(shared_ptr, ...)` that calls `getEnabledAmendments()` and `getMajorityAmendments()` from `View.h`. This decouples the implementation from ledger view types. + +`needValidatedLedger(seq)` is an optimization gate — most ledgers have no effect on amendment voting; only flag ledgers need the full `doValidatedLedger` pass. + ## Review Checklist -- New ledger entry types: add to `ledger_entries.macro`, implement keylet in `Indexes.cpp`, verify acquisition code in `InboundLedger`/`LedgerMaster`, check `LedgerCleaner` handling +- New ledger entry types: add to `ledger_entries.macro`, implement keylet in `Indexes.cpp`, verify acquisition code in `InboundLedger`/`LedgerMaster`, check `LedgerCleaner` handling. - New pseudo-account types: tag the key field with `SField::sMD_PseudoAccount` in `sfields.macro`; `isPseudoAccount` picks it up automatically. Caller, not `createPseudoAccount`, owns the amendment gate. - New asset operations: extend `Asset` variant + add branches in `TokenHelpers.h` dispatchers. Don't reach into IOU- or MPT-specific helpers directly unless intentionally bypassing the dispatch layer. -- Helper functions touching balances: respect the read/write hook protocol so `PaymentSandbox` correctly defers credits -- AMM math changes: gate behind an amendment; preserve the pre-amendment formula path -- Directory changes: account for both legacy (unsorted) and modern pages in iteration; verify `cdirNext`/`dirNext` cursor semantics if deleting during iteration (see `cleanupOnAccountDelete` workaround) +- Helper functions touching balances: respect the read/write hook protocol so `PaymentSandbox` correctly defers credits. +- AMM math changes: gate behind an amendment; preserve the pre-amendment formula path. +- Directory changes: account for both legacy (unsorted) and modern pages in iteration; verify `cdirNext`/`dirNext` cursor semantics if deleting during iteration (see `cleanupOnAccountDelete` workaround in `View.cpp` around line 485). +- New amendment: register `VoteBehavior`; if `Obsolete`, ensure no config path can flip it; add to `TrustedVotes` accounting if it should be majority-tracked. +- New subscription stream: if order-book-related, register with `BookListeners`; for ledger-wide streams use `OrderBookDB::havePublished` to dedup. +- Escrow with non-XRP assets: use `escrowUnlockApplyHelper` (template specialized for `Issue` / `MPTIssue`); check `createAsset` flag gating and `lockedRate` capping logic. +- Delegate transactions: call `checkTxPermission` first; only call `loadGranularPermission` when broad permission check fails. ## Key Patterns @@ -191,6 +269,31 @@ auto const tokens = getRoundedLPTokens( IsDeposit::Yes); // → Number::downward ``` +### Nested PaymentSandbox +```cpp +// Inside a payment step that needs its own sandbox: chain DeferredCredits +PaymentSandbox inner(&outer); // PaymentSandbox const* form +// ... inner.creditHookIOU(...) defers against the outer's table too +inner.apply(outer); // flushes deferred credits up one level +``` + +### Subscription Fan-Out +```cpp +// publish once, dispatch by version; havePublished shared across all books +MultiApiJson msg = buildBookUpdate(...); +listeners.publish(msg, havePublished); +db.havePublished(ledgerSeq); // dedup across overlapping streams +``` + +### Sandbox Commit-or-Discard +```cpp +Sandbox sb(&ctx_.view()); +// ... all mutations through sb ... +if (result == tesSUCCESS) + sb.apply(ctx_.rawView()); +// else sb destroyed → changes evaporate +``` + ## Key Files - `src/xrpld/app/ledger/Ledger.h` / `src/libxrpl/ledger/Ledger.cpp` — ledger class, genesis/successor/load constructors, immutable transition, skip list, NegUNL @@ -198,11 +301,20 @@ auto const tokens = getRoundedLPTokens( - `src/xrpld/app/ledger/detail/InboundLedger.cpp` — ledger acquisition - `include/xrpl/protocol/detail/ledger_entries.macro` — entry type definitions - `src/libxrpl/protocol/Indexes.cpp` — keylet computation -- `include/xrpl/ledger/ReadView.h` + `ApplyView.h` + `OpenView.h` — view interface hierarchy -- `include/xrpl/ledger/detail/ApplyStateTable.h` / `RawStateTable.h` — buffered mutation tables + TxMeta generation +- `include/xrpl/ledger/ReadView.h` + `ApplyView.h` + `OpenView.h` + `RawView.h` — view interface hierarchy +- `include/xrpl/ledger/detail/ApplyStateTable.h` / `RawStateTable.h` / `ApplyViewBase.h` / `ReadViewFwdRange.h` + `.ipp` — buffered mutation tables, range adapters, `TxMeta` generation - `include/xrpl/ledger/Sandbox.h` / `PaymentSandbox.h` / `ApplyViewImpl.h` — concrete view types -- `include/xrpl/ledger/CachedView.h` / `CachedSLEs.h` — two-level SLE cache +- `include/xrpl/ledger/CachedView.h` / `CachedSLEs.h` — two-level SLE cache (hardened-hash inner map) - `include/xrpl/ledger/CanonicalTXSet.h` — retry-pass deterministic ordering - `include/xrpl/ledger/LedgerTiming.h` — close-time binning +- `include/xrpl/ledger/Dir.h` / `BookDirs.h` — directory iteration +- `include/xrpl/ledger/BookListeners.h` / `OrderBookDB.h` — subscription fan-out +- `include/xrpl/ledger/AcceptedLedgerTx.h` — eager tx-in-ledger projection +- `include/xrpl/ledger/AmendmentTable.h` — governance state, `VoteBehavior`, `TrustedVotes` +- `include/xrpl/ledger/PendingSaves.h` — in-flight DB write coalescing +- `include/xrpl/ledger/View.h` — free-function utility layer (expiry, `hashOfSeq`, `areCompatible`, `dirLink`, `canWithdraw`, `cleanupOnAccountDelete`) - `include/xrpl/ledger/helpers/*` — per-object-type free functions -- `src/libxrpl/ledger/helpers/*` — implementations (AMM math, NFT page split, credential lifecycle, MPT overflow) +- `src/libxrpl/ledger/helpers/*` — implementations (AMM math, NFT page split, credential lifecycle, MPT overflow, vault math) +- `src/libxrpl/ledger/ApplyView.cpp` — directory `createRoot`, `findPreviousPage`, `insertKey`, `insertPage` (in `namespace xrpl::directory`) +- `src/libxrpl/ledger/PaymentSandbox.cpp` — `DeferredCredits` IOU/MPT shadow tables, post-switchover balance algorithm +- `src/libxrpl/ledger/OpenView.cpp` — `RawStateTable` + `txs_map` delta accumulation, PMR arena \ No newline at end of file diff --git a/docs/skills/nodestore.md b/docs/skills/nodestore.md index 548a5235b2..9912e6d614 100644 --- a/docs/skills/nodestore.md +++ b/docs/skills/nodestore.md @@ -6,7 +6,7 @@ Persistent key-value store for `NodeObject`s (ledger entries). Every piece of le Four layers, each with a narrow contract: -1. **`NodeObject`** (`include/xrpl/nodestore/NodeObject.h`) — immutable (type, hash, blob). Constructed only via `createObject()` factory; the `PrivateAccess` tag struct makes the public constructor effectively private while remaining compatible with `std::make_shared`. Hash is *not* verified against data — trust the caller. +1. **`NodeObject`** (`include/xrpl/nodestore/NodeObject.h`) — immutable (type, hash, blob). Constructed only via `createObject()` factory; the `PrivateAccess` tag struct makes the public constructor effectively private while remaining compatible with `std::make_shared`. Hash is *not* verified against data — trust the caller. Inherits `CountedObject` for live-instance diagnostics. 2. **`Backend`** (`include/xrpl/nodestore/Backend.h`) — pure abstract key/value interface. `fetch`/`store` are concurrent; `storeBatch`/`for_each` are not. Two-phase init: construct, then `open()`. 3. **`Database`** (`include/xrpl/nodestore/Database.h`) — owns the async read pool and stats; defines pure-virtual `fetchNodeObject(hash, seq, FetchReport&, duplicate)`. Public non-virtual `fetchNodeObject` instruments the private virtual one (timing, hit/miss, scheduler callback) — subclasses cannot bypass metrics. 4. **`Manager`** (`include/xrpl/nodestore/Manager.h`) — singleton registry mapping config `type=` strings to `Factory` instances. `make_Backend()` and `make_Database()` are the construction entry points. @@ -16,6 +16,7 @@ Two concrete `Database` subclasses: `DatabaseNodeImp` (single backend) and `Data ## Key Invariants - `NodeObjectType` values: `hotUNKNOWN=0`, `hotLEDGER=1`, `hotACCOUNT_NODE=3`, `hotTRANSACTION_NODE=4`, `hotDUMMY=512`. Value 2 is a historical gap. `hotDUMMY` is deliberately outside the contiguous range so it cannot collide with valid types — used as a cache sentinel meaning "confirmed missing." +- `NodeObject` lives in the `xrpl` namespace (not `xrpl::NodeStore`) because it is consumed broadly by SHAMap, ledger, and serialization layers. - Preferred backends: NuDB (append-mostly, default) and RocksDB; Memory and Null are for tests / configured ephemerality. - `Database` instrumentation is structural: the public `fetchNodeObject()` measures elapsed time, increments atomic counters, and calls `scheduler_.onFetch()` around every private virtual call. - `Backend::fetch` and `Backend::store` are called concurrently from many threads; `storeBatch` and `for_each` are not. Implementations must reflect this. @@ -23,6 +24,7 @@ Two concrete `Database` subclasses: `DatabaseNodeImp` (single backend) and `Data - Batch writes accumulate up to `batchWriteLimitSize = 65536` objects; peak in-flight memory can be ~2× this because a new batch accumulates while the previous one is being flushed (`Types.h`). - `EncodedBlob` / `DecodedBlob` define the on-disk format: bytes 0–7 zero-padded (legacy ledger-index field), byte 8 = `NodeObjectType`, bytes 9+ = payload. Both must change together. - `NuDBBackend::fdRequired()` returns 3 (data, key, log files). `RocksDBBackend::fdRequired()` returns `max_open_files + 128`. `DatabaseRotatingImp` sums both backends' values. +- `storeStats()` asserts `count <= sz` — byte total must be ≥ item count, guards against accounting bugs in subclasses. ## On-Disk Format @@ -36,8 +38,8 @@ Bytes 9+ Raw serialized payload The 32-byte hash is the storage key, kept separate from the value. -- `EncodedBlob` embeds a 1033-byte stack buffer (`payload_`) sized for header + 1024-byte payload (most objects). Only blobs exceeding 1024 bytes heap-allocate. `ptr_` is `uint8_t* const` set at construction; destructor frees iff `ptr_ != payload_.data()`. ~94% of real objects fit the stack buffer. -- `DecodedBlob` is a non-owning view into the raw buffer. Validation is by `wasOk()` flag, not exceptions. `createObject()` asserts `m_success` and copies the payload into an owning `Blob` for the returned `NodeObject`. +- `EncodedBlob` embeds a 1033-byte stack buffer (`payload_`) sized for header + 1024-byte payload (most objects). Only blobs exceeding 1024 bytes heap-allocate. `ptr_` is `uint8_t* const` set at construction; destructor frees iff `ptr_ != payload_.data()`. ~94% of real objects fit the stack buffer. The destructor `XRPL_ASSERT` verifies pointer/size coherence to catch any drift. +- `DecodedBlob` is a non-owning view into the raw buffer. Validation is by `wasOk()` flag, not exceptions. `createObject()` asserts `m_success` and copies the payload into an owning `Blob` for the returned `NodeObject`. `hotDUMMY` (512) falls through the type `switch` and leaves `m_success = false`. NuDB additionally runs the encoded blob through `nodeobject_compress` (see Compression below). @@ -54,9 +56,11 @@ NuDB-only. Four type tags prefix every stored blob (as a varint): **Inner-node fast path**: SHAMap inner nodes are exactly 525 bytes with `HashPrefix::innerNode` at offset 9. The compressor recognizes this and either packs only non-zero child hashes with a 16-bit presence bitmask (type 2) or stores all 16 hashes contiguously (type 3). Reconstruction zeros the `index`, `unused`, and `kind` fields — so a round-trip is *lossy* on those fields. `filter_inner()` pre-zeros them on the source side to make import-time `memcmp` verification succeed. -**Varint** (`varint.h`): base-127 (not base-128) LEB-style encoding; bit 7 is continuation. Used for the type tag and the LZ4 decompressed-size prefix. The base-127 choice means `0x7F` never appears as a payload byte — minor space cost, no practical impact. +**LZ4 path**: `lz4_compress` stores a varint decompressed-size prefix then LZ4 payload. `lz4_decompress` validates the varint (overflow checked before `static_cast`) then pre-allocates the exact output buffer before calling `LZ4_decompress_safe`. All size mismatches throw `std::runtime_error`. -**BufferFactory pattern**: All codec functions take a callable `void*(size_t)` so allocation policy stays with the caller (NuDB passes its scratch buffer). +**Varint** (`varint.h`): base-127 (not base-128) LEB-style encoding; bit 7 is continuation. Used for the type tag and the LZ4 decompressed-size prefix. The base-127 choice means `0x7F` never appears as a payload byte. Functions are function templates (not plain functions) to satisfy ODR across TUs. `read_varint` returns 0 on empty buffer, overrun, or overflow. + +**BufferFactory pattern**: All codec functions take a callable `void*(size_t)` so allocation policy stays with the caller (NuDB passes its scratch buffer). Codecs never free memory; the factory object's lifetime governs cleanup. ## Async Read Pool (`Database`) @@ -64,29 +68,41 @@ The base class spawns `readThreads` detached threads at construction. Each loops **Hash coalescing**: Multiple concurrent `asyncFetch()` calls for the same hash collapse into a single map entry; one backend read fires all callbacks. For different `seq` values on the same hash, `isSameDB(seq1, seq2)` decides whether to reuse the fetched object or issue a second fetch. -**Batched dequeue**: Each worker extracts up to `requestBundle_` entries per lock acquisition (default 4, clamped 1–64 via `rq_bundle` config) to amortize mutex cost. +**Batched dequeue**: Each worker extracts up to `requestBundle_` entries per lock acquisition (default 4, clamped 1–64 via `rq_bundle` config) to amortize mutex cost. Uses `read_.extract()` (C++17 node-handle) to move entries without copying. -**Shutdown ordering — critical**: Derived classes *must* call `stop()` in their own destructors. The base destructor calls `stop()` as a safety net, but by then the derived vtable is already gone — a worker thread blocked in `fetchNodeObject` would invoke a destroyed vtable entry. `stop()` sets `readStopping_`, clears `read_`, broadcasts the condvar, and spin-yields until `readThreads_` reaches zero (asserted within 30s). +**Threads are detached** (not joined), controlled by `readStopping_` atomic + `readThreads_` counter. `stop()` clears `read_`, broadcasts the condvar, and spin-yields until `readThreads_` reaches zero (asserted within 30s). + +**Shutdown ordering — critical**: Derived classes *must* call `stop()` in their own destructors. The base destructor calls `stop()` as a safety net, but by then the derived vtable is already gone — a worker thread blocked in `fetchNodeObject` would invoke a destroyed vtable entry. See Common Bug Patterns. + +**`asyncFetch()` during shutdown**: Silently discards the request if `isStopping()` is already true — callers during shutdown get no callback. + +**`runningThreads_` vs `readThreads_`**: Workers increment `runningThreads_` on wake and decrement before waiting, so `getCountsJson()` can distinguish actively-processing threads from threads blocked on I/O. `readThreads_` is decremented on thread exit; reaching zero confirms all threads fully exited. + +**Diagnostics**: `getCountsJson()` surfaces queue depth, thread counts, `rq_bundle`, write/read counts, bytes, hit counts, and total read duration (µs) for the `get_counts` RPC. ## BatchWriter (`include/xrpl/nodestore/detail/BatchWriter.h`) Coalesces individual writes into batches for backends that benefit (RocksDB uses it; NuDB does not — NuDB's `do_insert` is synchronous). - Privately inherits `Task`; the backend inherits `BatchWriter::Callback` and provides `writeBatch(Batch const&)`. Same object plays both roles, no extra allocation. -- **Double-buffer swap**: `store()` pushes into `mWriteSet`. `writeBatch()` holds the lock only long enough to swap `mWriteSet` with a fresh local vector, then releases before doing I/O. New stores accumulate concurrently with the flush. +- **Double-buffer swap**: `store()` pushes into `mWriteSet`. `writeBatch()` holds the lock only long enough to swap `mWriteSet` with a fresh local vector, then releases before doing I/O. New stores accumulate concurrently with the flush. After each flush the loop re-checks the buffer before clearing `mWritePending` so no items are dropped. - **`std::recursive_mutex` + `std::condition_variable_any`**: A synchronous scheduler (e.g., `DummyScheduler`) calls `performScheduledTask()` inline on the producer thread, which re-enters `writeBatch` on the same thread — plain `std::mutex` would deadlock. -- **Backpressure**: `store()` blocks on `mWriteCondition` when `mWriteSet.size() >= batchWriteLimitSize` (65536). The double-buffer means peak memory ≈ 2× the limit. -- **`mWritePending` flag**: Ensures only one scheduler task outstanding; the flush loop re-checks the buffer before clearing the flag so no items are silently dropped. +- **Backpressure**: `store()` blocks on `mWriteCondition` when `mWriteSet.size() >= batchWriteLimitSize` (65536). Peak memory ≈ 2× the limit. +- **`mWritePending` flag**: Ensures only one scheduler task outstanding. First `store()` that finds flag clear raises it and calls `scheduleTask()`. +- **`getWriteLoad()`**: Returns `max(mWriteLoad, mWriteSet.size())` — conservative estimate reflecting both in-flight write count and pending accumulation count. - **Destructor** calls `waitForWriting()` — no data is abandoned on backend teardown. +- **Telemetry**: After each flush, records count + wall-clock duration in `BatchWriteReport` and passes to `m_scheduler.onBatchWrite()`. ## Scheduler -Pure abstract (`include/xrpl/nodestore/Scheduler.h`): `scheduleTask(Task&)`, `onFetch(FetchReport)`, `onBatchWrite(BatchWriteReport)`. Two implementations: +Pure abstract (`include/xrpl/nodestore/Scheduler.h`): `scheduleTask(Task&)`, `onFetch(FetchReport)`, `onBatchWrite(BatchWriteReport)`. Contract: scheduler may invoke task on calling thread *or* a foreign thread — both are valid. Two implementations: - **`DummyScheduler`**: `scheduleTask` runs `task.performScheduledTask()` synchronously on the calling thread; `onFetch`/`onBatchWrite` are no-ops. Used by every NodeStore test and by the bulk-import path in `Application.cpp`. - **`NodeStoreScheduler`** (production, in `xrpld/app`): posts a `jtWRITE` job to the `JobQueue` (synchronous fallback if queue is stopped); routes `FetchReport` to `jtNS_SYNC_READ`/`jtNS_ASYNC_READ` load events. -`Task` (`Task.h`) is just a virtual `performScheduledTask()` + virtual destructor. `BatchWriter` inherits it privately so external code cannot treat it as a `Task`. +`Task` (`Task.h`) is just a virtual `performScheduledTask()` + virtual destructor. `BatchWriter` inherits it privately so external code cannot treat it as a `Task`. The recursive mutex in `BatchWriter` is required precisely because `DummyScheduler` can call back synchronously. + +`FetchReport` has a `const FetchType` member set at construction; `elapsed` is zero-initialized. `BatchWriteReport` carries elapsed + writeCount. Both are plain-old-data stack values passed by value. ## Rotating Backend / Online Deletion @@ -99,10 +115,12 @@ Pure abstract (`include/xrpl/nodestore/Scheduler.h`): `scheduleTask(Task&)`, `on Then release the lock and invoke `callback(newWritableName, newArchiveName)`. The callback persists names to the SQL state DB. `oldArchiveBackend` falls out of scope *after* the callback returns — so the directory is deleted only after the state DB knows the new layout. Crash between swap and persist → recoverable from state DB on restart. -**Snapshot-and-release locking**: Every read/write copies the relevant `shared_ptr` under `mutex_`, releases the lock, then does I/O via the local. Locking across disk I/O would serialize all readers. +**Snapshot-and-release locking**: Every read/write copies the relevant `shared_ptr` under `mutex_`, releases the lock, then does I/O via the local. Locking across disk I/O would serialize all readers. Exception: `sync()` holds the lock for the full call (maintenance path, not latency-sensitive). **Fetch fallthrough + promotion**: `fetchNodeObject` tries writable, then archive. If found in archive and `duplicate=true`, re-snapshots `writableBackend_` (to handle a rotation racing with the archive read) and writes the object into the *current* writable. Objects not promoted before the next rotation are gone forever — promotion is the migration mechanism. +**`newWritableBackendName`** is captured *before* the lock (calling `getName()` on the new backend); `newArchiveBackendName` is captured *inside* the lock from the demoted former writable. Both are passed to the callback. + ## Manager / Factory Registration `ManagerImp` is a Meyers singleton (`static ManagerImp _` in `instance()`). Its constructor calls four free functions (`registerNuDBFactory`, `registerRocksDBFactory`, `registerNullFactory`, `registerMemoryFactory`), each of which creates a function-local `static` factory whose constructor calls `Manager::insert(*this)`. @@ -110,23 +128,24 @@ Then release the lock and invoke `callback(newWritableName, newArchiveName)`. Th **Why this pattern**: Factories are never globals. If a global `Factory` destructor called `Manager::instance().erase()` after `ManagerImp` had been destroyed, the result would be UB (no order guarantee across translation units). Function-local statics initialize after `ManagerImp` and destroy before it — safe. - `Manager::find()` is case-insensitive (`boost::iequals`) so `"NuDB"`, `"nudb"`, `"NUDB"` all match. -- `make_Backend()` throws `std::runtime_error` with operator-facing message on missing or unknown `type` key. +- `make_Backend()` throws `std::runtime_error` with operator-facing message on missing or unknown `type` key. Same `missing_backend()` helper covers both the absent-key and unrecognized-name paths. - `make_Database()` = `make_Backend()` + `backend->open()` + wrap in `DatabaseNodeImp`. The explicit `open()` separation lets I/O errors surface before the full `Database` stack is built. -- Registry mutex protects `list_` (a `vector` of non-owning pointers). +- Registry mutex protects `list_` (a `vector` of non-owning pointers). `erase()` uses `XRPL_ASSERT` — removing an unknown factory is a programming error. +- `Factory::createInstance` second overload accepts `nudb::context&` for shared I/O threads across NuDB shards. Non-NuDB backends inherit a default that returns `{}` (null), prompting the caller to fall back to the simpler overload. ## Backend-Specific Notes -**NuDB** (`backend/NuDBFactory.cpp`): Three on-disk files (`nudb.dat`, `nudb.key`, `nudb.log`); `fdRequired()` = 3. `appnum = 1` embedded in the header, sanity-checked on every open. `nudb_block_size` config key must be a power of 2 between 4096 and 32768; defaults to `nudb::block_size()` (filesystem-native). `for_each()` and `verify()` *close and reopen* the database — incompatible with concurrent access. `fetch()` uses a zero-copy callback into NuDB's internal buffer; decompression must happen inside the callback. `db_.insert()` returning `nudb::error::key_exists` is silently ignored (content-addressed: same hash → same data). +**NuDB** (`backend/NuDBFactory.cpp`): Three on-disk files (`nudb.dat`, `nudb.key`, `nudb.log`); `fdRequired()` = 3. `appnum = 1` embedded in the header, sanity-checked on every open. `nudb_block_size` config key must be a power of 2 between 4096 and 32768; defaults to `nudb::block_size()` (filesystem-native). `for_each()` and `verify()` *close and reopen* the database — incompatible with concurrent access. `fetch()` uses a zero-copy callback into NuDB's internal buffer; decompression must happen inside the callback (buffer only valid during callback). `db_.insert()` returning `nudb::error::key_exists` is silently ignored (content-addressed: same hash → same data). `burstSize` set via `db_.set_burst()` after open — important performance parameter. Destructor catches `nudb::system_error` from `close()` because destructors must not propagate exceptions. -**RocksDB** (`backend/RocksDBFactory.cpp`): `RocksDBEnv` overrides `StartThread` to name threads `"rocksdb #N"` for profiler visibility. `hard_set` config flag: when false (default), small `cache_mb`/`open_files` values are silently escalated to production-appropriate defaults (1024 MB cache, 8000 FDs). Implements both `Backend` and `BatchWriter::Callback`; uses `BatchWriter` for writes. `storeBatch` is atomic (single `WriteBatch` in WAL); `fetchBatch` is a serial loop with no atomicity. `sync()` is empty — WAL provides durability. Key passed to RocksDB via `std::bit_cast` over the `uint256` — no copy. +**RocksDB** (`backend/RocksDBFactory.cpp`): `RocksDBEnv` overrides `StartThread` to name threads `"rocksdb #N"` (using an atomic counter) for profiler visibility. `hard_set` config flag: when false (default), small `cache_mb`/`open_files` values are silently escalated to production-appropriate defaults (1024 MB cache, 8000 FDs). Implements both `Backend` and `BatchWriter::Callback`; uses `BatchWriter` for writes. `storeBatch` is atomic (single `WriteBatch` in WAL); `fetchBatch` is a serial loop with no atomicity. `sync()` is empty — WAL provides durability. Key passed to RocksDB via `std::bit_cast` over the `uint256` — no copy. Raw option strings accepted via `bbt_options` and `options` config keys; throw on parse failure. -**Memory** (`backend/MemoryFactory.cpp`): `MemoryFactory` owns named `MemoryDB` instances in a case-insensitive map; multiple `MemoryBackend`s opened with the same path share the same `MemoryDB`. Survives backend close/reopen within a process. The `db.open` guard in `MemoryFactory::open()` is dead code (`open` is never set to `true`). `for_each` reads without holding the mutex — caller must ensure no concurrent writes. +**Memory** (`backend/MemoryFactory.cpp`): `MemoryFactory` owns named `MemoryDB` instances in a case-insensitive (`boost::beast::iless`) map; multiple `MemoryBackend`s opened with the same path share the same `MemoryDB`. Survives backend close/reopen within a process. The `db.open` guard in `MemoryFactory::open()` is dead code (`open` is never set to `true`). `for_each` reads without holding the mutex — caller must ensure no concurrent writes. Module-level raw pointer `memoryFactory` allows `MemoryBackend::open()` to call back without holding a reference. -**Null** (`backend/NullFactory.cpp`): All operations no-op; `fetch` returns `notFound`. Exists so `type=none` is a valid config value. Doubles as a minimal reference implementation of `Backend`. +**Null** (`backend/NullFactory.cpp`): All operations no-op; `fetch` returns `notFound`. Exists so `type=none` is a valid config value. Doubles as a minimal reference implementation of `Backend`. `isOpen()` always returns `false`. ## Common Bug Patterns -- **Forgetting `stop()` in derived `Database` destructor**: see Shutdown ordering above. Symptom is a crash during teardown in a worker thread invoking a destroyed vtable. +- **Forgetting `stop()` in derived `Database` destructor**: Symptom is a crash during teardown in a worker thread invoking a destroyed vtable. The base `~Database()` calls `stop()` as a fallback but the derived vtable is already gone by then. - **Holding `DatabaseRotatingImp::mutex_` across I/O**: Will serialize all readers. Always snapshot the `shared_ptr` under lock, release, then I/O. - **Forgetting `duplicate=true` when archive fallback matters**: Objects fetched from archive are not promoted to writable; next rotation discards them silently. - **Treating `hotDUMMY` as a real object**: Cache lookups must check the type before dereferencing. @@ -135,6 +154,9 @@ Then release the lock and invoke `callback(newWritableName, newArchiveName)`. Th - **Changing `EncodedBlob` without `DecodedBlob`**: Breaks on-disk read compatibility silently — both classes define the format jointly. - **Calling `for_each` / `verify` on NuDB concurrently with reads**: Both close and reopen the database; concurrent access is undefined. - **Lossy inner-node round-trip**: `nodeobject_compress` zeros `index`/`unused`/`kind` on reconstruction. Code that verifies blobs after compress→decompress must call `filter_inner()` first. +- **Not calling `backend->open()` before wrapping in `DatabaseNodeImp`**: `make_Database()` does this correctly but manual construction may miss it; errors surface much later in I/O paths. +- **Using `MemoryBackend::for_each` concurrently**: No lock held; caller must guarantee no concurrent writes (implicit contract, not enforced). +- **`fetchBatch` atomicity assumption on RocksDB**: `storeBatch` is atomic (one `WriteBatch`); `fetchBatch` is a serial loop — no group atomicity on reads. ## Review Checklist @@ -143,39 +165,45 @@ Then release the lock and invoke `callback(newWritableName, newArchiveName)`. Th - New backend implementations: full `Backend` interface including `fdRequired()`, both `open()` overloads (default-throws is OK for non-NuDB), accurate concurrency guarantees on `fetch`/`store`, no-op `verify()` if not implementing. - Derived `Database` subclasses: `stop()` in destructor; override `fetchNodeObject(hash, seq, FetchReport&, duplicate)` not the public non-virtual. - Any change to on-disk format: update both `EncodedBlob` and `DecodedBlob`; consider backward-compat type tag (codec.h type 0 path is the precedent). +- Inner-node codec changes: update `filter_inner()` alongside compressor/decompressor; verify round-trip with zeroed metadata fields. +- New varint usages: confirm base-127 (not base-128) arithmetic; use `varint_traits::max` for stack buffer sizing. ## Key Files ### Public interfaces -- `include/xrpl/nodestore/NodeObject.h` — immutable object, factory-only construction -- `include/xrpl/nodestore/Backend.h` — pluggable storage interface -- `include/xrpl/nodestore/Database.h` — async read pool, instrumented fetch +- `include/xrpl/nodestore/NodeObject.h` — immutable object, factory-only construction, `CountedObject` live-count +- `include/xrpl/nodestore/Backend.h` — pluggable storage interface; concurrency contracts annotated inline +- `include/xrpl/nodestore/Database.h` — async read pool, instrumented fetch, Template Method pattern - `include/xrpl/nodestore/DatabaseRotating.h` — adds `rotate()` - `include/xrpl/nodestore/Manager.h` — singleton factory registry -- `include/xrpl/nodestore/Factory.h` — abstract factory +- `include/xrpl/nodestore/Factory.h` — abstract factory; two `createInstance` overloads (plain + nudb::context) - `include/xrpl/nodestore/Scheduler.h` / `Task.h` — async dispatch + telemetry - `include/xrpl/nodestore/DummyScheduler.h` — synchronous, for tests + import - `include/xrpl/nodestore/Types.h` — `Status`, `Batch`, batch size constants ### Implementations (detail/) -- `include/xrpl/nodestore/detail/DatabaseNodeImp.h` — single-backend -- `include/xrpl/nodestore/detail/DatabaseRotatingImp.h` — rotation + promotion -- `include/xrpl/nodestore/detail/ManagerImp.h` — singleton + registry -- `include/xrpl/nodestore/detail/BatchWriter.h` — write coalescing + backpressure -- `include/xrpl/nodestore/detail/EncodedBlob.h` / `DecodedBlob.h` — on-disk format -- `include/xrpl/nodestore/detail/codec.h` — LZ4 + inner-node compression -- `include/xrpl/nodestore/detail/varint.h` — base-127 varint for codec type tags +- `include/xrpl/nodestore/detail/DatabaseNodeImp.h` — single-backend; `isSameDB` always true; ledgerSeq ignored +- `include/xrpl/nodestore/detail/DatabaseRotatingImp.h` — rotation + promotion; snapshot-and-release locking +- `include/xrpl/nodestore/detail/ManagerImp.h` — singleton + registry; `missing_backend()` static helper +- `include/xrpl/nodestore/detail/BatchWriter.h` — write coalescing + backpressure; recursive mutex +- `include/xrpl/nodestore/detail/EncodedBlob.h` — serializer; 1033-byte stack buffer, heap fallback +- `include/xrpl/nodestore/detail/DecodedBlob.h` — parser; non-owning view, `wasOk()` flag +- `include/xrpl/nodestore/detail/codec.h` — LZ4 + inner-node compression; BufferFactory pattern +- `include/xrpl/nodestore/detail/varint.h` — base-127 varint; function templates for ODR safety ### Source (libxrpl/nodestore/) -- `src/libxrpl/nodestore/Database.cpp` — read pool, hash coalescing, shutdown -- `src/libxrpl/nodestore/DatabaseNodeImp.cpp` — simple backend wrapper +- `src/libxrpl/nodestore/Database.cpp` — read pool, hash coalescing, shutdown, `importInternal()` +- `src/libxrpl/nodestore/DatabaseNodeImp.cpp` — simple backend wrapper; `fetchBatch` resize guard - `src/libxrpl/nodestore/DatabaseRotatingImp.cpp` — rotation, promotion, snapshot locking -- `src/libxrpl/nodestore/BatchWriter.cpp` — double-buffer swap + backpressure +- `src/libxrpl/nodestore/BatchWriter.cpp` — double-buffer swap, backpressure, load estimation - `src/libxrpl/nodestore/ManagerImp.cpp` — singleton init + factory registration -- `src/libxrpl/nodestore/backend/NuDBFactory.cpp` — production default -- `src/libxrpl/nodestore/backend/RocksDBFactory.cpp` — alternate production backend -- `src/libxrpl/nodestore/backend/MemoryFactory.cpp` — test/ephemeral -- `src/libxrpl/nodestore/backend/NullFactory.cpp` — `type=none` +- `src/libxrpl/nodestore/DecodedBlob.cpp` — format parsing; legacy 8-byte prefix handling +- `src/libxrpl/nodestore/DummyScheduler.cpp` — synchronous no-op scheduler +- `src/libxrpl/nodestore/NodeObject.cpp` — `PrivateAccess` factory; `CountedObject` wiring +- `src/libxrpl/nodestore/backend/NuDBFactory.cpp` — production default; compression pipeline +- `src/libxrpl/nodestore/backend/RocksDBFactory.cpp` — alternate production; `RocksDBEnv` thread naming +- `src/libxrpl/nodestore/backend/MemoryFactory.cpp` — test/ephemeral; shared `MemoryDB` by path +- `src/libxrpl/nodestore/backend/NullFactory.cpp` — `type=none`; reference `Backend` skeleton ### Lifecycle orchestration -- `src/xrpld/app/misc/SHAMapStoreImp.cpp` — drives `rotate()`, manages state DB +- `src/xrpld/app/misc/SHAMapStoreImp.cpp` — drives `rotate()`, manages state DB, enforces min rotation interval (256 ledgers networked / 8 standalone) diff --git a/docs/skills/peering.md b/docs/skills/peering.md index 4d2d9c270d..436dcd9411 100644 --- a/docs/skills/peering.md +++ b/docs/skills/peering.md @@ -4,17 +4,21 @@ P2P network using persistent TCP/IP connections. Messages serialized via Protoco ## Key Invariants -- Connection preference order: Fixed Peers -> Livecache -> Bootcache -- Cluster connections and reserved (PeerReservationTable) connections do NOT count toward slot limits in `Counts::can_activate` — they bypass `m_in_active`/`m_out_active` caps -- Validators are forced `peerPrivate=true` by `Config::makeConfig` even without explicit `[peer_private]`; this is "soft" privacy (still accepts inbound, but asks peers not to gossip address) +- Connection preference order: Fixed Peers → Livecache → Bootcache +- Cluster connections and reserved (`PeerReservationTable`) connections do NOT count toward slot limits in `Counts::can_activate` — they bypass `m_in_active`/`m_out_active` caps +- Validators are forced `peerPrivate=true` by `Config::makeConfig` even without explicit `[peer_private]`; this is "soft" privacy (still accepts inbound, but asks peers not to gossip address). `wantIncoming` is derived *before* the validator key check fires, so a validator with a key still advertises inbound willingness internally. - Protobuf message changes MUST maintain wire compatibility or risk network partitioning -- Squelching: after `MAX_SELECTED_PEERS=5` peers each cross `MAX_MESSAGE_THRESHOLD=20` messages, a random 5-peer subset becomes "Selected"; rest are muted via `TMSquelch` for a randomized window in `[MIN_UNSQUELCH_EXPIRE=300s, ...MAX_UNSQUELCH_EXPIRE_PEERS=3600s]` +- Squelching: after `MAX_SELECTED_PEERS=5` peers each cross `MAX_MESSAGE_THRESHOLD=20` messages, a random 5-peer subset becomes "Selected"; rest are muted via `TMSquelch` for a randomized window in `[MIN_UNSQUELCH_EXPIRE=300s, MAX_UNSQUELCH_EXPIRE_PEERS=3600s]` - Reduce-relay does not activate for `WAIT_ON_BOOTUP=10min` after process start (`Slots::reduceRelayReady`) - Handshake binds TLS session to node identity via signature of `makeSharedValue` (SHA-512 XOR of TLS finished messages, then `sha512Half`); a zero shared value (degenerate XOR) is rejected - Wire format: 6-byte header uncompressed, 10-byte compressed; 26-bit payload size field caps messages at `maximumMessageSize = 64 MiB` - Hop count cap: `Endpoint` constructor clamps `hops` to `maxHops+1=7`; `Logic::preprocess` drops `hops > maxHops=6` and increments surviving hops by 1 before storage - TX reduce-relay queue is bounded by `MAX_TX_QUEUE_SIZE=10000` hashes per peer; required to stay under the 64 MiB protocol limit at high TPS - `peersWithMessage_` (in `Slots`) is `inline static` — shared across all instantiations, not per-instance +- `Bootcache` valence is a *streak* counter: clamped to 0 before crossing sign, so a failing peer resets to 0 before going positive. Static peers receive `staticValence=32`. +- `Livecache` uses `push_front` insertion — MUST `shuffle()` before handout to prevent topology manipulation by an adversary repeatedly advertising its own address +- `SourceStrings::fetch()` silently drops malformed addresses — no error returned for bad config entries +- `Checker` destructor calls `wait()` only; must call `stop()` first explicitly before destruction to cancel pending probes ## Common Bug Patterns @@ -22,24 +26,29 @@ P2P network using persistent TCP/IP connections. Messages serialized via Protoco - `HashRouter::shouldRelay` prevents duplicate relay; bypassing it causes message storms (`OverlayImpl::relay` enforces this) - `ConnectAttempt::processResponse` on HTTP 503 parses `peer-ips` JSON array for redirect; malformed entries are validated as endpoints before being passed to `peerFinder().onRedirects` - `PeerImp::close` must run on the strand; calling from wrong thread causes race conditions on socket and timer state -- Destructor chain: `~PeerImp` -> `deletePeer` -> `onPeerDeactivate` -> `on_closed` -> `remove`; interrupting this chain leaks slots +- Destructor chain: `~PeerImp` → `deletePeer` → `onPeerDeactivate` → `on_closed` → `remove`; interrupting this chain leaks slots - `~ConnectAttempt` releases the PeerFinder slot via `on_closed(slot_)` only if `slot_ != nullptr`; on successful promotion to `PeerImp`, `slot_` is moved out and must be left null - `tryAsyncShutdown()` must defer SSL shutdown until `!readPending_ && !writePending_`; calling `async_shutdown` while async I/O is in flight is undefined behavior - `dynamic_pointer_cast` is required wherever `Manager` API takes `shared_ptr` but `Logic` needs `SlotImp` - A compressed message from a peer that did NOT negotiate compression is a hard `protocol_error` in `invokeProtocolMessage` (prevents CPU forcing attack) - Self-squelch attempt (peer sends `TMSquelch` for our own validation key) is silently dropped in `PeerImp::onMessage(TMSquelch)` — never trust a peer to silence us - `Cluster::for_each` callback must NOT call `Cluster::update` — same non-recursive mutex, deadlock +- `ZeroCopyOutputStream` destructor MUST flush trailing `commit_` — protobuf doesn't guarantee terminal `BackUp` or `Next` call; missing flush silently drops bytes +- `Bootcache` erase-then-reinsert pattern in `on_success`/`on_failure`: bimap values are logically const after insert, so valence updates require erase + reinsert +- `SlotImp::state(active)` is forbidden — must use `activate()` which also sets `whenAcceptEndpoints`; bypassing this leaves the flood-control timestamp unset +- `SourceStrings::fetch()` has an idempotent retry loop quirk: if `from_string()` fails, it retries the same string (no-op); effective behavior is just "drop invalid entries" ## Connection Lifecycle ### Outbound (`ConnectAttempt`) -1. `OverlayImpl::connect` -> resource check -> `peerFinder().new_outbound_slot()` -> create `ConnectAttempt` -2. Five-phase chain: `async_connect` -> TLS `async_handshake` -> HTTP write -> HTTP read -> `processResponse` -3. Dual-timer scheme: global 25s ceiling (`connectTimeout`) + per-step timers (8/8/3/3/2s); both share `onTimer` callback distinguishing by expiry comparison +1. `OverlayImpl::connect` → resource check → `peerFinder().new_outbound_slot()` → create `ConnectAttempt` +2. Five-phase chain: `async_connect` → TLS `async_handshake` → HTTP write → HTTP read → `processResponse` +3. Dual-timer scheme: global 25s ceiling (`connectTimeout`) + per-step timers (8/8/3/3/2s); both share `onTimer` callback distinguishing by expiry comparison. Global timer armed once (guarded by epoch-check), step timer reset at each phase. 4. `ioPending_` flag prevents starting SSL shutdown while another async op is pending on the stream -5. On HTTP 101: `verifyHandshake` -> create `PeerImp` -> move `slot_` and `stream_ptr_` into peer -> `overlay_.add_active(peer)` +5. On HTTP 101: `verifyHandshake` → create `PeerImp` → move `slot_` and `stream_ptr_` into peer → `overlay_.add_active(peer)` 6. On HTTP 503 with JSON `peer-ips`: forward to `peerFinder().onRedirects` +7. `verify_none` on TLS — security comes from node-key signature over `makeSharedValue`, not cert chain ### Inbound (`OverlayImpl::onHandoff`) @@ -62,6 +71,8 @@ P2P network using persistent TCP/IP connections. Messages serialized via Protoco - Header parsing changes (`ProtocolMessage.h`): the high-bit format guard (`*iter & 0x80`) and reserved-bit checks (`*iter & 0x0C == 0`) MUST remain - Adding a new compression `Algorithm` enum value: must have high bit set, low nibble zero (so it's extractable via `*iter & 0xF0`); update `Compression.h` dispatch switches or the `UNREACHABLE` guard fires - Strand discipline: any new method touching socket/queue state must guard with `if (!strand_.running_in_this_thread()) return post(strand_, ...)` +- `ZeroCopyOutputStream` use: always ensure the object goes out of scope (destructor flush) before the caller reads from the streambuf +- Bootcache changes: remember valence updates require erase-then-reinsert (bimap), and the cooldown (60s) batches writes ## Key Patterns @@ -89,6 +100,7 @@ std::call_once(once_flag_, &Message::compress, this); // Eligible types only (mtTRANSACTION, mtLEDGER_DATA, mtVALIDATOR_LIST, ...); // Latency-sensitive types (mtPING, mtVALIDATION, mtPROPOSE_LEDGER) excluded. // Falls back to uncompressed if savings < 4 bytes (compressed header overhead). +// Messages <= 70 bytes are never compressed. ``` ### Resource Charging Batches @@ -103,6 +115,13 @@ std::call_once(once_flag_, &Message::compress, this); // callers (ConnectAttempt, PeerImp::doAccept) wrap in try/catch and tear down. ``` +### Traffic Categorization Double-Call +```cpp +// categorize() called once at Message construction (outbound, inbound=false). +// addCount() called twice per message: once for category, once for 'total'. +// 'unknown' is NOT rolled into 'total'. +``` + ## Reduce-Relay (Squelch) Architecture Two halves, decoupled: @@ -114,6 +133,14 @@ All `OverlayImpl::updateSlotAndSquelch` calls are dispatched to `strand_` becaus Squelch expiry is lazy: no background timer. `expireSquelch` removes stale entries on next send. Out-of-bounds durations in incoming `TMSquelch` trigger `feeInvalidData` and `removeSquelch` (defensive clear). +### Slot Selection Algorithm (`Slot::update`) + +Two-threshold design: peers enter the *considered pool* at `MIN_MESSAGE_THRESHOLD=19` messages; selection fires when `MAX_SELECTED_PEERS=5` peers individually reach `MAX_MESSAGE_THRESHOLD=20`. The one-message gap lets the system confirm a peer has continued sending before committing it as a candidate. If fewer than 5 non-idle peers are available at selection time, `initCounting()` resets and defers — never squelches with incomplete picture. + +Inactivity (`IDLED=8s`): idle selected peer → unsquelch all + revert to `Counting`. Slots whose `lastSelected_` is older than `MAX_UNSQUELCH_EXPIRE_DEFAULT=600s` are deleted by `deleteIdlePeers()`. + +Squelch duration scaled by peer count: `min(max(600s, 10s × npeers), 3600s)`. + ## TX Reduce-Relay When `txReduceRelayEnabled_` (negotiated via `FEATURE_TXRR`): @@ -131,7 +158,7 @@ When `txReduceRelayEnabled_` (negotiated via `FEATURE_TXRR`): Hysteresis (24 vs 128) prevents oscillation on slightly-behind peers. -## Send Queue Backpressure (Tuning.h) +## Send Queue Backpressure (`Tuning.h`) Three tiers: - `targetSendQueue=128` — below this, peer is healthy; resets `large_sendq_` counter @@ -151,14 +178,14 @@ Implements peer address discovery, slot accounting, and reachability checks. Own ### Components -- **`Logic`**: central decision engine. Holds `slots_`, `connectedAddresses_` (multiset for IP limit), `keys_` (dedup public keys), `fixed_`, `livecache_`, `bootcache_`. Guarded by `std::recursive_mutex lock_`. -- **`Livecache`**: ~30s TTL gossip cache (`Tuning::liveCacheSecondsToLive`). `beast::aged_map` + `boost::intrusive::list` per hop bucket (size `maxHops+2=8`). MUST `shuffle()` before handout — `push_front` insertion is exploitable otherwise. -- **`Bootcache`**: persistent (SQLite via `StoreSqdb`). Bimap (`unordered_set_of` by endpoint, `multiset_of` by valence) for O(1) update and ranked iteration. Valence is a streak counter (clamped to 0 before crossing sign). `staticValence=32` for `[ips]`/`[ips_fixed]`. Throttled writes: 60s cooldown via `flagForUpdate`/`checkUpdate`; destructor force-flushes. +- **`Logic`**: central decision engine. Holds `slots_`, `connectedAddresses_` (multiset for IP limit), `keys_` (dedup public keys), `fixed_`, `livecache_`, `bootcache_`. Guarded by `std::recursive_mutex lock_`. Recursive mutex needed because `on_closed()` calls `remove()` independently. +- **`Livecache`**: ~30s TTL gossip cache (`Tuning::liveCacheSecondsToLive`). `beast::aged_map` + `boost::intrusive::list` per hop bucket (size `maxHops+2=9`, indices 0–8). MUST `shuffle()` before handout — `push_front` insertion is exploitable otherwise. +- **`Bootcache`**: persistent (SQLite via `StoreSqdb`). Bimap (`unordered_set_of` by endpoint, `multiset_of` by valence) for O(1) update and ranked iteration. Valence is a streak counter (clamped to 0 before crossing sign). `staticValence=32` for `[ips]`/`[ips_fixed]`. Throttled writes: 60s cooldown via `flagForUpdate`/`checkUpdate`; destructor force-flushes. Pruning: remove bottom 10% when over 1000 entries. - **`Checker`**: async TCP probe for verifying peer's advertised listening port. Self-managing `async_op` via `shared_ptr` capture in handler; `~Checker` calls `wait()` (not `stop()` — must call `stop()` first explicitly). -- **`Counts`**: pure bookkeeping, no own mutex (relies on `Logic::lock_`). All updates funnel through private `adjust(slot, ±1)`. Fixed/reserved bypass active caps in `can_activate`. -- **`SlotImp`**: concrete slot state. Two constructors (inbound takes both endpoints; outbound only remote, sets `checked=true,canAccept=true` since TCP connect itself proves reachability). State machine enforced by `XRPL_ASSERT` in `state()` and `activate()`. `m_listening_port` is `std::atomic` with `-1` sentinel. +- **`Counts`**: pure bookkeeping, no own mutex (relies on `Logic::lock_`). All updates funnel through private `adjust(slot, ±1)`. Fixed/reserved bypass active caps in `can_activate`. `isConnectedToNetwork()` returns `true` only when `m_out_max == 0` (pure listener mode). +- **`SlotImp`**: concrete slot state. Two constructors: inbound takes both endpoints, sets `checked=false,canAccept=false`; outbound only takes remote, sets `checked=true,canAccept=true` since TCP connect itself proves reachability. State machine enforced by `XRPL_ASSERT` in `state()` and `activate()`. `m_listening_port` is `std::atomic` with `-1` sentinel. - **`Fixed`**: per-fixed-peer backoff. Fibonacci sequence in minutes: `{1,1,2,3,5,8,13,21,34,55}`, clamped to last index. `failure()` advances; `success()` resets. -- **`Source`**: abstract; only concrete is `SourceStrings` (config `[ips]`). +- **`Source`**: abstract; only concrete is `SourceStrings` (config `[ips]`). `cancel()` is a no-op for all current (synchronous) implementations but exists as an extension point for future async sources. ### Autoconnect Tier Order @@ -176,6 +203,10 @@ Implements peer address discovery, slot accounting, and reachability checks. Own - **Sending (`buildEndpointsForPeers`)**: shuffle slots, use `SlotHandouts` per peer, run `handout()` algorithm. Self-advertisement uses zero-address IPv6 sentinel — receiver substitutes socket's remote address. - **`Handouts` algorithm**: round-robin across multiple targets to ensure fair distribution. `move_back` after each acceptance rotates endpoints. Per-target dedup via `SlotImp::recent_t` (aged map; `filter()` uses `<=` hop comparison; `try_insert` writes both received and sent into recent — pessimistic update). +### `recent_t` Filter Semantics + +`insert()` updates cached hop count only if new value ≤ existing. `filter()` suppresses sends when cached hop ≤ sending hop. The `<=` boundary is intentional — sending at a strictly lower hop than the peer knows is still useful; matching or higher is redundant. + ## TLS Channel-Binding (Non-Standard) `makeSharedValue` derives a 256-bit value from TLS finished messages: @@ -193,25 +224,25 @@ Built by `buildHandshake`, verified by `verifyHandshake`. Verify order is layere 4. `Session-Signature` cryptographic verify 5. `Local-IP`/`Remote-IP` cross-check (NAT diagnostics) -Feature negotiation via `X-Protocol-Ctl`: `compr=lz4`, `vprr=1`, `txrr=1`, `ledgerreplay=1`. Responder echoes back only features locally configured AND requested (AND-gate). +Feature negotiation via `X-Protocol-Ctl`: `compr=lz4`, `vprr=1`, `txrr=1`, `ledgerreplay=1`. Responder echoes back only features locally configured AND requested (AND-gate). Initiator unconditionally advertises all locally supported features. ## ZeroCopy I/O Adapters -`ZeroCopyInputStream` wraps `ConstBufferSequence` for protobuf parsing without intermediate copy. `BackUp`/`Skip` support sub-buffer granularity via tracked `pos_` within current buffer. +`ZeroCopyInputStream` wraps `ConstBufferSequence` for protobuf parsing without intermediate copy. `BackUp`/`Skip` support sub-buffer granularity via tracked `pos_` within current buffer. Empty buffer sequence is safe (null `pos_` initialized in constructor). -`ZeroCopyOutputStream` uses deferred commit pattern: `commit_` tracks bytes promised but not yet committed. Destructor MUST flush trailing `commit_` — protobuf doesn't guarantee terminal `BackUp` or `Next` call. +`ZeroCopyOutputStream` uses deferred commit pattern: `commit_` tracks bytes promised but not yet committed. Destructor MUST flush trailing `commit_` — protobuf doesn't guarantee terminal `BackUp` or `Next` call. `BackUp(n)` asserts `n <= commit_` and prevents double-commit. ## Traffic Categorization -`TrafficCount::categorize()` is called once at `Message` construction (outbound) and per inbound message. Two-stage: static `unordered_map` for simple types, then `dynamic_cast` for protobuf inspection of `TMLedgerData`/`TMGetLedger` (`requestcookie` distinguishes forwarded vs originated) and `TMGetObjectByHash` (`query()` flag determines get/share). `unknown` is NOT rolled into `total`. +`TrafficCount::categorize()` is called once at `Message` construction (outbound) and per inbound message. Two-stage: static `unordered_map` for simple types, then `dynamic_cast` for protobuf inspection of `TMLedgerData`/`TMGetLedger` (`requestcookie` distinguishes forwarded vs originated) and `TMGetObjectByHash` (`query()` flag determines get/share). `unknown` is NOT rolled into `total`. `squelch_suppressed` records bytes NOT transmitted due to squelch; `squelch_ignored` records bytes from peers ignoring squelch. -## Compression Eligibility (Message::compress) +## Compression Eligibility (`Message::compress`) -Skip if ≤70 bytes. Whitelist of eligible types: `mtMANIFESTS`, `mtENDPOINTS`, `mtTRANSACTION`, `mtGET_LEDGER`, `mtLEDGER_DATA`, `mtGET_OBJECTS`, `mtVALIDATOR_LIST`, `mtVALIDATOR_LIST_COLLECTION`, `mtREPLAY_DELTA_RESPONSE`, `mtTRANSACTIONS`. Excludes high-frequency control messages (`mtPING`, `mtVALIDATION`, `mtPROPOSE_LEDGER`, `mtSTATUS_CHANGE`). If compressed size doesn't beat uncompressed minus 4-byte header overhead, fall back to uncompressed. +Skip if ≤70 bytes. Whitelist of eligible types: `mtMANIFESTS`, `mtENDPOINTS`, `mtTRANSACTION`, `mtGET_LEDGER`, `mtLEDGER_DATA`, `mtGET_OBJECTS`, `mtVALIDATOR_LIST`, `mtVALIDATOR_LIST_COLLECTION`, `mtREPLAY_DELTA_RESPONSE`, `mtTRANSACTIONS`. Excludes high-frequency control messages (`mtPING`, `mtVALIDATION`, `mtPROPOSE_LEDGER`, `mtSTATUS_CHANGE`). If compressed size doesn't beat uncompressed minus 4-byte header overhead, fall back to uncompressed (`bufferCompressed_` cleared, `getBuffer()` returns uncompressed). ## HTTP Endpoints (served by `OverlayImpl::processRequest`) -- `/crawl` — JSON topology, gated by bitmask config +- `/crawl` — JSON topology, gated by bitmask config (`CrawlOptions::Overlay|ServerInfo|ServerCounts|Unl`) - `/health` — three-tier status (200/503/500) — HTTP status encodes result so LBs need no JSON parsing - `/vl/` or `/vl//` — signed validator list @@ -223,6 +254,17 @@ Skip if ≤70 bytes. Whitelist of eligible types: `mtMANIFESTS`, `mtENDPOINTS`, - Strand vs mutex: peer registry mutations use `mutex_`; timer/squelch/tx-metrics work uses strand - `OverlayImpl::Child` registration: destructor auto-removes from `list_`; `stopChildren()` copies pointers before iterating to avoid invalidation - `PeerImp` field locks: `recentLock_` (ledger state, latency), `nameMutex_` (`shared_mutex` for `name_`); strand-confined fields need no lock +- `TxMetrics` has its own `std::mutex`; writers additionally serialize via overlay strand; RPC readers call `json()` directly without going through strand +- `Cluster::mutex_` is non-recursive — `for_each` callback must not call `update()` +- `PeerReservationTable`: `list()` deliberately releases lock before `std::sort` to minimize hold time (snapshot sort pattern) + +## Cluster Registry + +`Cluster` owns a `std::set`. The `Comparator` is `is_transparent` — enables `find(PublicKey)` without constructing a dummy `ClusterNode`. `update()` enforces monotonic time (rejects stale reports), preserves names across nameless gossip updates, and uses erase+`emplace_hint` for O(1) amortized reinsert. `load()` is fail-fast on malformed lines (returns `false`) but tolerates duplicates with a warning (first entry wins). + +## PeerSet (Data Acquisition) + +`PeerSet` / `PeerSetImpl` manages the working set of peers queried for a single in-flight data acquisition (ledger, tx-set, etc.). Uses scored peer selection (`Peer::getScore(hasItem)`) sorted descending. `peers_` set of `Peer::id_t` acts as exclusion list — same peer is never re-added across retries. `DummyPeerSet` via `make_DummyPeerSet()` is the null-object used when `loadOldLedger()` needs `InboundLedger` without live peers. ## Key Files @@ -236,6 +278,8 @@ Skip if ≤70 bytes. Whitelist of eligible types: `mtMANIFESTS`, `mtENDPOINTS`, - `src/xrpld/overlay/detail/ProtocolVersion.{h,cpp}` — `XRPL/x.y` negotiation - `src/xrpld/overlay/detail/ZeroCopyStream.h` — protobuf/Asio buffer adapters - `src/xrpld/overlay/detail/Tuning.h` — all overlay magic numbers +- `src/xrpld/overlay/make_Overlay.h` — factory + `setup_Overlay` (parses `[overlay]`, `[crawl]`, `[vl]`, `[network_id]`) +- `src/xrpld/overlay/predicates.h` — composable peer-selection/dispatch functors for `Overlay::foreach` ### Reduce-relay - `src/xrpld/overlay/Slot.h` — per-validator state machine + selection algorithm @@ -255,6 +299,9 @@ Skip if ≤70 bytes. Whitelist of eligible types: `mtMANIFESTS`, `mtENDPOINTS`, ### Reservations - `src/xrpld/overlay/detail/PeerReservationTable.cpp` — persistent allowlist via SQLite +### Compression +- `src/xrpld/overlay/Compression.h` — `Algorithm` enum, dispatch wrappers, wire-format constants + ### PeerFinder - `src/xrpld/peerfinder/PeerfinderManager.h` / `Slot.h` / `make_Manager.h` — public interface - `src/xrpld/peerfinder/detail/Logic.h` — central decision engine @@ -264,5 +311,7 @@ Skip if ≤70 bytes. Whitelist of eligible types: `mtMANIFESTS`, `mtENDPOINTS`, - `src/xrpld/peerfinder/detail/Counts.h` — slot bookkeeping - `src/xrpld/peerfinder/detail/Fixed.h` — Fibonacci backoff - `src/xrpld/peerfinder/detail/Handouts.h` — fair distribution algorithm -- `src/xrpld/peerfinder/detail/StoreSqdb.h` — SQLite persistence +- `src/xrpld/peerfinder/detail/StoreSqdb.h` — SQLite persistence (schema v4, migration handles `DROP COLUMN` via table rename) +- `src/xrpld/peerfinder/detail/Source.h` / `SourceStrings.{h,cpp}` — abstract + static-string address source - `src/xrpld/peerfinder/detail/Tuning.h` — peerfinder magic numbers +- `src/xrpld/peerfinder/detail/iosformat.h` — `leftw` stream manipulator for log alignment diff --git a/docs/skills/protocol.md b/docs/skills/protocol.md index 5deb9a2f72..bdee12c8b8 100644 --- a/docs/skills/protocol.md +++ b/docs/skills/protocol.md @@ -17,6 +17,7 @@ Amount types (lean, runtime polymorphic via Asset): STAmount = unified wire/serialized form holding Asset + value - holds(), holds(), native(), integral() - canonicalize() normalizes (mantissa, exponent) per asset rules + - Internal: mAsset + mValue(uint64) + mOffset(int) + mIsNegative(bool) ``` `PathAsset` = `std::variant` — pathfinding-only asset reference (no issuer); used inside `STPathElement`. @@ -31,15 +32,18 @@ Conversion utilities in `AmountConversions.h`: `toSTAmount`, `toAmount`, `get - **Field ID encoding:** 1–3 bytes; both type and field codes <16 → single byte `(type<<4)|name` - **Hash domain separation:** every signable payload prepends a 4-byte `HashPrefix` (`STX\0`, `SMT\0`, `VAL\0`, `BCH\0`, `CLM\0`, `LWR\0`, `MAN`, `TXN`, etc.) — never share hashes across domains. Helper `make_hash_prefix(a,b,c)` is constexpr `uint32_t` builder. - **STObject access semantics:** `obj[sfFoo]` throws `FieldErr` if absent; `obj[~sfFoo]` returns `std::optional`. `getOrThrow(name)` family in `json_get_or_throw.h` enforces presence + type for raw JSON inputs. -- **Amendment IDs are deterministic:** `featureFoo == sha512Half(Slice("Foo"))` — never change a feature name. Feature names must satisfy `isFeatureName()` at compile time (`UpperCamel` regex). +- **Amendment IDs are deterministic:** `featureFoo == sha512Half(Slice("Foo"))` — never change a feature name. Feature names must satisfy `isFeatureName()` at compile time (`UpperCamel` regex). Names exactly 32 bytes long are forbidden (reserved for raw hash collision prevention). - **`numFeatures` is a ceiling, NOT an exact count.** Counting includes `XRPL_RETIRE_*` and any inactive macros; never use it as a length. -- **Feature registry frozen at startup:** `registerFeature` checks `numFeatures` and aborts via static-init `LogicError` if exceeded. +- **Feature registry frozen at startup:** `registerFeature` checks `numFeatures` and aborts via static-init `LogicError` if exceeded. The `readOnly` atomic fence flips after all file-scope variables are initialized — any query before then asserts. - **Singletons everywhere:** `SField`, `LedgerFormats`, `TxFormats`, `InnerObjectFormats`, `Permission`, `Feature` registry all use Meyer's singletons; registration completes before `main()` via static init. -- **Multi-sign signers MUST be sorted ascending by AccountID** (no duplicates, count in [1,32], cannot include tx account). +- **Multi-sign signers MUST be sorted ascending by AccountID** (no duplicates, count in [1,32], cannot include tx account). The signer's AccountID is appended to the multi-sign blob to prevent shared-RegularKey replay attacks. - **`vfFullyCanonicalSig` always set** by signer; verifiers normalize ECDSA S to low form via libsecp256k1. - **Amendment-gated arithmetic:** `getSTNumberSwitchover()` is a `LocalValue` (per-coroutine) selecting legacy vs `Number`-based normalization in `IOUAmount`/`STAmount`. -- **TxMeta `AffectedNodes` must be sorted by index** for canonical serialization (consensus-critical). +- **TxMeta `AffectedNodes` must be sorted by index** for canonical serialization (consensus-critical). `addRaw()` performs this sort; failure is a consensus fork risk. - **STObject debug-only field-uniqueness checks** (`isFieldAllowed`): silent duplicate fields in production are possible bugs but no runtime check. +- **STLedgerEntry construction fails loudly** if the type is unrecognized — no silent fallback. +- **STValidation only accepts secp256k1 keys**; Ed25519 keys throw at construction time. +- **STNumber two-phase rounding contract:** `associateAsset()` must be called before `add()`. The assertion in `add()` checks idempotency — calling `setValue()` after `associateAsset()` without re-associating is a programming error. ## Macro-Driven Registries (X-Macros) @@ -57,12 +61,14 @@ Pattern uses `#pragma push_macro/pop_macro` to protect macro names. `UNWRAP(...) Feature name validation is `constexpr` (compile-time `static_assert` on the literal); typos like lower-case first letter fail to build. +The `FeatureCollections` internal singleton uses `boost::multi_index_container` with three simultaneous indexes: random-access by insertion order (`byIndex` for bitset mapping), hash-unique by `uint256`, and hash-unique by name. A simple `unordered_map` cannot provide the stable integer index that `FeatureBitset` requires. + ## Field Identity (`SField`) - **Field code** = `(SerializedTypeID << 16) | fieldValue` — packs type family and per-type index; canonical sort key - `SField` instances are immutable singletons created at static init via `private_access_tag_t` (only definable inside `SField.cpp`) - `TypedField` adds compile-time payload type; `OptionaledField` via `operator~(sfField)` -- Metadata flags (`fieldMeta`): `sMD_ChangeOrig`, `sMD_ChangeNew`, `sMD_DeleteFinal`, `sMD_Create`, `sMD_Always`, `sMD_BaseTen` (decimal display), `sMD_PseudoAccount`, `sMD_NeedsAsset` (drives `STTakesAsset` association) +- Metadata flags (`fieldMeta`): `sMD_ChangeOrig`, `sMD_ChangeNew`, `sMD_DeleteFinal`, `sMD_Create`, `sMD_Always`, `sMD_BaseTen` (decimal display), `sMD_PseudoAccount`, `sMD_NeedsAsset` (drives `STTakesAsset` association), `sMD_Default` (field absent when zero) - `IsSigning::no` excludes fields from signing hash (`sfTxnSignature`, `sfSigners`, `sfSignature`, etc.) - `isBinary()` ⇔ `fieldValue<256` (wire-representable); `isDiscardable()` ⇔ `fieldValue>256` (JSON-only, e.g., `sfHash`, `sfIndex`) - **Debug-only uniqueness check** during static init; release builds will silently mis-register on collision @@ -72,7 +78,7 @@ Feature name validation is `constexpr` (compile-time `static_assert` on the lite | Item | Encoding | |---|---| | XRP STAmount | 8 bytes; bit63=0, bit62=sign(1=pos), 62-bit value | -| MPT STAmount | 8 bytes header (bit63=0, bit61=1) + 192-bit MPTID | +| MPT STAmount | 8 bytes header (bit63=0, bit61=1, 56-bit value) + 192-bit MPTID | | IOU STAmount | bit63=1, bit62=sign, 8-bit (offset+97), 54-bit mantissa, +20B currency, +20B issuer | | AccountID | 20 bytes, VL-prefixed when standalone (`STAccount` mimics `STBlob` wire format) | | MPTID | 192 bits = 32-bit big-endian sequence ‖ 160-bit issuer | @@ -80,12 +86,13 @@ Feature name validation is `constexpr` (compile-time `static_assert` on the lite | STObject | fields in canonical order; ends with `STI_OBJECT,1` (`0xe1`) | | VL prefix | 1 byte (0–192), 2 bytes (193–12480), 3 bytes (12481–918744); else `std::overflow_error` | | STIssue | 160-bit currency; if zero → XRP; if next 160 = `noAccount()` → MPT (then 32-bit seq); else IOU issuer | +| STNumber | 12 bytes: int64 signed mantissa + int32 exponent (two separate statements — order must be explicit) | | LP token currency | byte0 = `0x03`; bytes 1-19 = `sha512Half(min(asset1,asset2), max(asset1,asset2))` low bits | | Order book quality | `(exponent+100) << 56 \| mantissa`; embedded in last 8 bytes of directory key (big-endian) so SHAMap order = price order | | NFTokenID (256-bit) | flags(2) + transferFee(2) + issuer(20) + cipheredTaxon(4) + serial(4); low 96 bits = page sort key | | LedgerHeader | 118 bytes fixed layout (seq, drops, parentHash, txHash, accountHash, parentClose/closeTime/closeFlags, closeTimeResolution) | | Payment channel claim | `HashPrefix::paymentChannelClaim` ‖ channelID(32) ‖ amount(8) | -| Batch signing payload | `HashPrefix::batch` ‖ inner-tx hash list | +| Batch signing payload | `HashPrefix::batch` ‖ outer flags(4) ‖ inner-tx count(4) ‖ inner-tx hash list | ## Canonical Hashes @@ -107,10 +114,12 @@ All hashes use `sha512Half` (first 256 bits of SHA-512). `HashPrefix` constants - `STVar` is movable; moving an on-heap STVar steals the pointer, while inline ones must invoke each ST type's move ctor through the v-table - `copy()`/`move()` virtuals on every ST type delegate to `STBase::emplace()` for placement-new into `STVar`'s buffer - **Two modes:** - - **Free** (`mType==nullptr`): linear field scan, accepts any field - - **Templated** (`mType` set): O(1) field lookup via `SOTemplate::indices_`, template enforced + - **Free** (`mType==nullptr`): linear field scan via `getFieldIndex()`; accepts any field; insertion order preserved + - **Templated** (`mType` set): O(1) field lookup via `SOTemplate::indices_`; `v_` laid out in template order with every slot pre-populated; unknown fields rejected - `applyTemplate()` validates after deserialization; `set(SOTemplate)` initializes empty object with template - Deserialization depth capped at 10 to prevent stack exhaustion +- `operator==` compares only `isBinary()==true` fields (O(n²) by design); `isEquivalent()` fast-paths when same `mType` pointer (positional comparison) +- `makeInnerObject()` applies templates conditionally on `fixInnerObjTemplate` / `fixInnerObjTemplate2` amendments — historical ledger entries without template structure must not be rejected on replay ### Proxy Access Pattern @@ -148,6 +157,11 @@ Proxies forbid removing `soeREQUIRED` or `soeDEFAULT` fields. - Comparing `Issue` instances when one side is MPT-wrapped → `Issue::operator==` only compares currency+account; use `Asset` equality - Relying on debug-only `assert` inside `STObject::isFieldAllowed` to catch duplicate fields in release - Treating `numFeatures` as a length / iteration bound (it includes retired slots) +- Calling `setValue()` on an `STNumber` field after `associateAsset()` without re-associating → idempotency assertion fires in `add()` +- Using Ed25519 key with `STValidation` → throws at construction time (only secp256k1 allowed) +- Batch inner transaction `sfRawTransactions` array exceeds `maxBatchTxCount` (8) or contains nested `ttBATCH` → rejected by `passesLocalChecks()` +- `getNFTokenIDFromPage()` without the page-split guard: a `sfModifiedNode` for a third page may lack `sfNFTokens` in `sfPreviousFields`; skip silently +- `STNumber` JSON string parsing asserting `!getCurrentTransactionRules()` — string-format numbers not allowed inside active transaction processing ## Key Patterns @@ -162,6 +176,8 @@ XRPL_RETIRE_FEATURE(OldFeature) // code removed; remains registered for ledger Lifecycle: `Supported::no/DefaultNo` → `Supported::yes/DefaultNo` → (rare) `DefaultYes` for critical fixes. **Never** revert `Supported::yes` to `no` (would amendment-block existing nodes). +`setCurrentTransactionRules()` has a non-obvious side effect: it calls `Number::setMantissaScale()` to push the precision mode (`small` vs `large`) without requiring `Number` to query rules on every arithmetic call. + ### NotTEC vs TER ```cpp @@ -182,11 +198,11 @@ auto s = startMultiSigningData(obj); finishMultiSigningData(signerAccountID, s); // append signer ID to shared payload ``` -`addWithoutSigningFields()` excludes signature fields from the signed payload — this is what breaks the circularity. +`addWithoutSigningFields()` excludes signature fields from the signed payload. Both `sign()` and `verify()` share the same serialization path (serialize once, check/set the field), ensuring they cannot diverge. ### Batch Signing -`HashPrefix::batch` + inner-tx hash list is signed by all inner accounts; outer tx carries the assembled `sfRawTransactions` array. `passesLocalChecks()` rejects nested batches. +`serializeBatch()` (in `Batch.h`, inline) produces: `HashPrefix::batch ‖ outer flags(4) ‖ inner-tx count(4) ‖ inner-tx hash list`. Both `checkBatchSingleSign()` and `checkBatchMultiSign()` call this once; multi-sign appends the per-signer AccountID suffix via `finishMultiSigningData`. `passesLocalChecks()` rejects nested batches. ### STNumber + STTakesAsset @@ -196,7 +212,7 @@ finishMultiSigningData(signerAccountID, s); // append signer ID to shared paylo associateAsset(*sle, vaultAsset); // rounds all sMD_NeedsAsset fields, removes zero-defaults ``` -`STNumber` serializes a `Number` (signed mantissa+exponent) directly; rounding is asset-dependent and resolved by `associateAsset` walking fields flagged `sMD_NeedsAsset`. +`STNumber` serializes a `Number` (signed mantissa+exponent, 12 bytes); rounding is asset-dependent and resolved by `associateAsset` walking fields flagged `sMD_NeedsAsset`. Fields with `sMD_Default` are removed from the SLE after rounding if the value became zero. `associateAsset()` is offset-based (the only path that yields mutable `STBase&`). ### LP Token Currency Derivation @@ -209,12 +225,47 @@ Currency lpc = ammLPTCurrency(asset1, asset2); // canonical std::minmax Pseudo-accounts (AMM, Vault, LoanBroker) carry a 256-bit synthesized ID in fields flagged `sMD_PseudoAccount` (`sfAMMID`, `sfVaultID`, `sfLoanBrokerID`). These identify a stateless account address derived from the owner ledger entry. +### NFT Token ID Recovery from Metadata + +`getNFTokenIDFromPage()` uses set-difference: collect all token IDs from `sfPreviousFields` and `sfFinalFields` across all metadata nodes; assert `finalIDs.size() == prevIDs.size() + 1`; use `std::mismatch` to find the inserted entry. Guard: when a mint causes a page split, the third page's `sfModifiedNode` may have `sfPreviousFields` without `sfNFTokens` — check presence before extracting. + +## STAmount Arithmetic Details + +- **IOU canonical range:** mantissa ∈ [10^15, 10^16), exponent ∈ [-96, +80]; zero = (mantissa=0, exponent=-100) +- **Two rounding modes:** `mulRound`/`divRound` (legacy, rounds up when fractional ≥ 0.1) vs `mulRoundStrict`/`divRoundStrict` (correct remainder tracking, propagates `NumberRoundModeGuard`) +- **Overflow guard in multiply:** if `min(a,b) > sqrt(cMaxNative)`, product overflows — checked before 128-bit intermediate +- **`canAdd`/`canSubtract`:** for IOU, uses round-trip relative error test with 10^-4 tolerance +- **`areComparable()`:** uses `std::visit` over `Asset` variant; incompatible asset types throw immediately +- Feature-gated: `featureSingleAssetVault` / `featureLendingProtocol` gate the `fromNumber()` path in `operator=(Number const&)` + +## QualityFunction (AMM Path Optimization) + +`q(out) = m * out + b` where `b` = reciprocal-rate intercept, `m` = AMM price-impact slope. + +- **`AMMTag`:** `m_ = -fee/poolIn`, `b_ = poolOut*fee/poolIn` — derived from single-path AMM swap formula +- **`CLOBLikeTag`:** `m_ = 0`, `b_ = 1/quality.rate()` — also used for multi-path AMM (fixed allocation = constant quality) +- **`combine()`:** `m_ += b_ * qf.m_; b_ *= qf.b_` — linear function composition; clears `quality_` cache when slope becomes nonzero +- **`outFromAvgQ()`:** solves `out = (1/rate - b_) / m_`; rounding mode `upward` to conservatively bound output; returns `nullopt` if `m_==0`, rate==0, or `out<=0` +- `saveNumberRoundMode` RAII guard scopes the upward-rounding to just this computation + +## AccountID Cache + +Direct-mapped cache in `AccountID.cpp`. Indexed by `hardened_hash<>` (xxHash + random seed = DoS-resistant). Lock sharding: single `atomic locks_` encodes 64 independent spinlocks via `packed_spinlock` (one per `index % 64`). Edge case: `encoding[0] != 0` guard distinguishes an uninitialized slot from a legitimate cache hit for the all-zero `xrpAccount()`. Cache is optional; `initAccountIdCache(0)` disables it entirely. + +## Cross-Chain Bridge Attestations + +Two parallel hierarchies: `Attestations::AttestationClaim` / `AttestationCreateAccount` (full, with signature — what witnesses submit) vs `XChainClaimAttestation` / `XChainCreateAccountAttestation` (ledger-stored, signature stripped). Conversion constructors project signing→storage in one step. + +`AttestationMatch` three-state enum: `match`, `matchExceptDst`, `nonDstMismatch`. `XChainAddClaimAttestation` requires `match`; `XChainClaim` (user-specified dst) accepts `matchExceptDst`. `sameEvent()` ignores signer identity fields; full `operator==` requires all fields. + +Max attestations per container: 256 (far above any real witness set; guards memory allocation). + ## Critical Files ### Foundations - `include/xrpl/protocol/SField.h`, `src/libxrpl/protocol/SField.cpp` — field registry, X-macro expansion, code packing -- `include/xrpl/protocol/Feature.h`, `src/libxrpl/protocol/Feature.cpp` — `numFeatures` (ceiling!), `FeatureBitset`, `registerFeature` with compile-time name validation -- `include/xrpl/protocol/Rules.h`, `src/libxrpl/protocol/Rules.cpp` — `Rules` snapshot of enabled amendments; `CurrentTransactionRules` is a `LocalValue` (per-coroutine); `isFeatureEnabled()` queries thread-local +- `include/xrpl/protocol/Feature.h`, `src/libxrpl/protocol/Feature.cpp` — `numFeatures` (ceiling!), `FeatureBitset`, `registerFeature` with compile-time name validation; `readOnly` atomic fence +- `include/xrpl/protocol/Rules.h`, `src/libxrpl/protocol/Rules.cpp` — `Rules` snapshot of enabled amendments; `CurrentTransactionRules` is a `LocalValue` (per-coroutine); `isFeatureEnabled()` queries thread-local; `setCurrentTransactionRules` pushes `Number` mantissa scale - `include/xrpl/protocol/HashPrefix.h` — protocol-immutable domain separators; `make_hash_prefix` constexpr packer ### Macro Tables (single sources of truth) @@ -238,9 +289,9 @@ Pseudo-accounts (AMM, Vault, LoanBroker) carry a 256-bit synthesized ID in field - `Asset.h/cpp` — variant of Issue/MPTIssue; `visit()`, `equalTokens()`, `BadAsset` sentinel - `Issue.h/cpp`, `MPTIssue.h/cpp` — XRP/IOU and MPT identity; **note** `Issue::operator==` ignores MPT-ness — always go through `Asset` - `STAmount.h/cpp` — unified serialized amount; `canMul`/`canAdd`/`canSubtract` safety checks; `mulRound`/`mulRoundStrict` (legacy vs precise rounding); `roundToScale` -- `STNumber.h/cpp` — `Number`-typed field; pairs with `STTakesAsset` infrastructure +- `STNumber.h/cpp` — `Number`-typed field; pairs with `STTakesAsset` infrastructure; 12-byte wire: int64 mantissa + int32 exponent - `STIssue.h/cpp`, `STCurrency.h/cpp` — asset-only fields -- `STTakesAsset.h/cpp` — `associateAsset` walks `sMD_NeedsAsset` fields, rounds + strips zero-defaults +- `STTakesAsset.h/cpp` — `associateAsset` walks `sMD_NeedsAsset` fields, rounds + strips zero-defaults; include order: `STTakesAsset.h` before `STLedgerEntry.h` - `IOUAmount.h/cpp`, `XRPAmount.h`, `MPTAmount.h/cpp` — lean representations - `Rate2.h` — `Rate` newtype with `parityRate = 1_000_000_000`; transfer-rate math - `Units.h` — phantom-typed `Drops`/`FeeLevel` @@ -251,36 +302,37 @@ Pseudo-accounts (AMM, Vault, LoanBroker) carry a 256-bit synthesized ID in field - `PublicKey.h/cpp` — 33-byte unified format (0xED prefix for Ed25519); `ECDSACanonicality` enum (canonical vs fullyCanonical); libsecp256k1 normalization - `SecretKey.h/cpp` — `secure_erase` in dtor; deleted `==`/`<<`; XRPL-specific secp256k1 derivation via `Generator` - `Seed.h/cpp` — 128-bit; `parseGenericSeed()` cascades hex→base58→RFC1751→passphrase, rejecting other key types first -- `secp256k1.h` — libsecp256k1 context singleton +- `detail/secp256k1.h` — libsecp256k1 context singleton via template-with-default-param trick (ODR-safe header-only); created with `SIGN|VERIFY` flags combined - `digest.h` — `sha512Half`, `sha512_half_hasher_s` (secure erase variant) -- `tokens.h/cpp` (+ `b58_utils.h`, `token_errors.h`) — Base58Check; fast path uses base 58^10 intermediate (10–15× speedup, gated on non-MSVC for `__int128`); `TokenType` enum is protocol-stable +- `tokens.h/cpp` (+ `b58_utils.h`, `token_errors.h`) — Base58Check; fast path uses base 58^10 intermediate (10–15× speedup via `unsigned __int128`, gated on non-MSVC); `TokenType` enum is protocol-stable; `alphabetReverse` is `constexpr` 256-element array; leading-zero bytes each map to `'r'` ### Wire I/O - `Serializer.h/cpp` — accumulator; `addVL`, `addFieldID`, big-endian integers, `getSHA512Half()` - `SerialIter` — non-owning forward cursor over a byte buffer; throws on underrun -- `Sign.h/cpp` — `sign`/`verify` with HashPrefix prepended to `addWithoutSigningFields()` output; `signingForID` helper for arbitrary payload bytes +- `Sign.h/cpp` — `sign`/`verify` with HashPrefix prepended to `addWithoutSigningFields()` output; `startMultiSigningData`/`finishMultiSigningData` split for batch-signer optimization; `signingForID` helper for arbitrary payload bytes +- `Batch.h` — inline `serializeBatch()`: `HashPrefix::batch ‖ flags(4) ‖ count(4) ‖ txids` - `serialize.h` — top-level convenience helpers -- `messages.h` — protobuf message tag constants +- `messages.h` — protobuf message tag constants (`TYPE_BOOL` undef guard documented) ### Higher-Level Objects -- `STTx.h/cpp` — caches `tid_` and `tx_type_`; `passesLocalChecks` (memos, pseudo-tx, MPT support, batch nesting); `sterilize()` round-trip -- `STLedgerEntry.h/cpp` (alias `SLE`) — typed ledger object; `thread()` updates `sfPreviousTxnID`; `isThreadedType()` gated by `fixPreviousTxnID` -- `STValidation.h/cpp` — lazy `valid_` cache; `mTrusted` separate from validity; `lookupNodeID` callback decouples manifest system +- `STTx.h/cpp` — caches `tid_` and `tx_type_`; `passesLocalChecks` (memos, pseudo-tx, MPT support, batch nesting, max 8 inner txs); `sterilize()` round-trip; `getBatchTransactionIDs()` lazy + immutable after first call; `getFeePayer()` returns `sfDelegate` or `sfAccount`; `checkSign()` dispatches single/multi/batch/counterparty; SQL helpers (`getMetaSQL`) +- `STLedgerEntry.h/cpp` (alias `SLE`) — typed ledger object; `thread()` updates `sfPreviousTxnID`; `isThreadedType()` gated by `fixPreviousTxnID`; `getJson()` injects `jss::index` and synthetic `mpt_issuance_id` for MPT issuances +- `STValidation.h/cpp` — lazy `valid_` cache; `mTrusted` separate from validity; `lookupNodeID` callback decouples manifest system; `validationFormat()` is function-local static (SField init order safety); `sfCookie` is `soeDEFAULT` to prevent fingerprinting - `STArray.h/cpp`, `STVector256.h/cpp`, `STBitString.h/cpp`, `STInteger.h/cpp`, `STBlob.h/cpp`, `STAccount.h/cpp`, `STPathSet.h/cpp`, `STXChainBridge.h/cpp` ### Transaction Meta -- `TxMeta.h/cpp` — `AffectedNodes` (sorted by index!), `DeliveredAmount`; serialization order is consensus-critical +- `TxMeta.h/cpp` — `AffectedNodes` (sorted by index in `addRaw()`!), `DeliveredAmount`, `sfParentBatchID`; linear scan for node lookup (bounded by 32-slot reservation); `getAffectedAccounts()` must match JS `Meta#getAffectedAccounts` - `LedgerHeader.h/cpp` — 118-byte fixed serialization; close-time-resolution fudging ### Indexes and Keys -- `Indexes.h/cpp` — `keylet::*` factories with `LedgerNameSpace` tagged hashing; `keylet::quality()` embeds 64-bit quality in last 8 bytes (big-endian) +- `Indexes.h/cpp` — `keylet::*` factories with `LedgerNameSpace` tagged hashing; `keylet::quality()` embeds 64-bit quality in last 8 bytes (big-endian); `keylet::amm()` uses `std::minmax` + `if constexpr` for heterogeneous token types; `nftpage` = owner(160 bits) ‖ masked token(96 bits) — range scan, no hash - `Keylet.h/cpp` — type-tagged `(uint256, LedgerEntryType)`; `ltANY` wildcard, `ltCHILD` rejects directories - `Protocol.h` — protocol-wide constants (`FLAG_LEDGER_INTERVAL`, etc.) - `nftPageMask.h`, `nft.h` — NFT page boundary (low 96 bits); composite keys (high 160 = owner AccountID) -- `NFTokenID.h/cpp` — flags(2)+fee(2)+issuer(20)+cipheredTaxon(4)+serial(4); LCG `384160001 * seq + 2459` ciphers taxon -- `NFTokenOfferID.h/cpp`, `NFTSyntheticSerializer.h/cpp` — derived/synthetic NFT entries +- `NFTokenID.h/cpp` — flags(2)+fee(2)+issuer(20)+cipheredTaxon(4)+serial(4); LCG `384160001 * seq + 2459` ciphers taxon; `getNFTokenIDFromPage()` and `getNFTokenIDFromDeletedOffer()` for metadata enrichment +- `NFTokenOfferID.h/cpp`, `NFTSyntheticSerializer.h/cpp` — derived/synthetic NFT entries; consumed by Clio as public API - `Book.h` — `(in_asset, out_asset)` order-book identity -- `SeqProxy.h/cpp` — sequence vs ticket abstraction +- `SeqProxy.h/cpp` — sequence vs ticket abstraction; sequence-type values sort before ticket-type values ### Validation Helpers (return NotTEC, preflight-time) - `AMMCore.h/cpp` — `invalidAMMAsset`, `invalidAMMAssetPair`, `invalidAMMAmount`; `ammLPTCurrency()` uses canonical `std::minmax` @@ -290,6 +342,7 @@ Pseudo-accounts (AMM, Vault, LoanBroker) carry a 256-bit synthesized ID in field - `STParsedJSON.h/cpp` — depth cap 64; field-path-qualified errors via `make_name`; recognizes `"Payment"`, `"tesSUCCESS"`, etc. - `ErrorCodes.h/cpp` — append-only enum; `sortedErrorInfos` validated at compile time; `warning_code_i` distinct from `error_code_i` - `RPCErr.h/cpp` — `RPC::Status`/`make_error` helpers +- `ApiVersion.h` — `apiMinimumSupportedVersion`(1), `apiMaximumSupportedVersion`(2), `apiBetaVersion`(3); `apiVersionIfUnspecified`(1); `forAllApiVersions` / `forApiVersions` templates pass version as `integral_constant` for compile-time branching; `getAPIVersionNumber()` returns `apiInvalidVersion`(0) on parse failure; `setVersion()` has v1 legacy semver-string shim - `MultiApiJson.h` — per-API-version `Json::Value` array indexed `[version - RPC::apiMinimumVersion]`; composes with `forAllApiVersions` from `ApiVersion.h`; preserves wire compatibility across versions - `jss.h` — every JSON key as `Json::StaticString` via `JSS(name)` macro; PascalCase = protocol fields, snake_case = RPC - `json_get_or_throw.h` — `getOrThrow(jv, name)` specializations enforce presence + type; standard idiom for parsing untrusted JSON @@ -298,7 +351,7 @@ Pseudo-accounts (AMM, Vault, LoanBroker) carry a 256-bit synthesized ID in field ### Specialized Types - `STIssue`, `STAccount` (160-bit, VL-encoded), `STBitString`, `STInteger`, `STBlob`, `STArray`, `STVector256`, `STCurrency`, `STPathSet`, `STXChainBridge`, `STNumber` (asset-contextual) - `Quality.h/cpp` — inverted encoding (lower uint64 = higher quality); `ceil_in`/`ceil_out` proportional scaling; `_strict` variants honor Number rounding mode -- `QualityFunction.h/cpp` — linear `q(out)=m*out+b`; AMMTag (slope from pool) vs CLOBLikeTag (m=0); `combine()` for multi-step strands +- `QualityFunction.h/cpp` — linear `q(out)=m*out+b`; AMMTag (slope from pool) vs CLOBLikeTag (m=0); `combine()` for multi-step strands; `outFromAvgQ()` solves for capped output - `XChainAttestations.h/cpp` — `Attestations::` namespace (full, with signature) vs `xrpl::` (stored); `match()` returns three-state `AttestationMatch` ### Pseudo-Account Fields (`sMD_PseudoAccount`) @@ -311,6 +364,7 @@ Pseudo-accounts (AMM, Vault, LoanBroker) carry a 256-bit synthesized ID in field - `TER.h/cpp` — error code enum families + `TERSubset` - `TxFlags.h` — X-macro driven flag tables (`tf*`); see TxFlags Architecture below - `TxFormats.h/cpp` — transaction-type → field schema +- `AccountID.h/cpp` — `calcAccountID()` = SHA-256 then RIPEMD-160 (matches Bitcoin for security argument); `AccountIdCache` direct-mapped with spinlock sharding ## Numeric Encoding Reference @@ -322,6 +376,7 @@ MPT max: maxMPTokenAmount = INT64_MAX = 0x7FFFFFFFFFFFFFFF Transfer rate: Rate{value} where value/1_000_000_000 = 1.0 (parityRate = 1:1) NFT transfer fee: uint16 basis points (0–50000), convert via nft::transferFeeAsRate (×10000) AMM auction fee: basis points; trading fee in tenths of basis points (10000 = 1%) +STNumber mantissa: int64 signed; when MantissaRange::large active, accessor divides by 10 to fit wire format ``` ## Protocol-Stable Constants (NEVER CHANGE) @@ -329,7 +384,7 @@ AMM auction fee: basis points; trading fee in tenths of basis points (10000 = - `LedgerEntryType` numeric values (in ledger objects) - `TxType` numeric values (in signed transactions) - `SerializedTypeID` and `SField` codes (in serialized fields) -- `LedgerNameSpace` discriminator characters (in keylet derivation) +- `LedgerNameSpace` discriminator characters (in keylet derivation) — legacy `CONTRACT`, `GENERATOR`, `NICKNAME` reserved even though deprecated - `HashPrefix` enum values (in signature/hash domain separation) - `error_code_i` and `warning_code_i` numeric values (clients depend on them; append-only) - `TECcodes` (and other `TER` family numeric values) — recorded in transaction metadata @@ -340,6 +395,8 @@ AMM auction fee: basis points; trading fee in tenths of basis points (10000 = - `INITIAL_XRP = 100B × 10^6 drops` (validated by `static_assert` against `Number::maxRep`) - NFT taxon LCG constants (`384160001 * seq + 2459`) - All flag bit values (`tf*`, `lsf*`, `asf*`) +- XRPL Base58 alphabet (first char `'r'` for `AccountID=0` is cosmetically significant) +- `maxBatchTxCount = 8` (inner transactions per batch) Changing any requires an amendment with explicit detection logic for old/new behavior. @@ -354,3 +411,19 @@ Changing any requires an amendment with explicit detection logic for old/new beh - Inner-batch flag `tfInnerBatchTxn` is special: marks a tx as a member of a batch (skipped by ordinary preflight signature checks) Pattern: when adding a new flag, define it in `TxFlags.h` in the appropriate group; do NOT manually adjust the mask — the macro derives it. + +## STTx Construction Paths + +Three constructors, all terminate with `tid_ = getHash(HashPrefix::transactionID)`: + +1. **Wire** (`SerialIter&`) — hottest path; enforces `txMinSizeBytes` (32) and `txMaxSizeBytes` (1 MB) before field parsing; `set(sit)` returning `true` (inner object terminator found at top level) → throws +2. **Object promotion** (`STObject&&`) — no size check; `applyTemplate` enforces conformance +3. **Programmatic** (`TxType, assembler`) — installs template first; asserts `sfTransactionType` unchanged after assembler runs; `LogicError` (not `std::runtime_error`) on mutation + +`getSeqProxy()` unifies `sfSequence` (classic) and `sfTicketSequence` (ticket); when `sfSequence==0` and `sfTicketSequence` present → ticket mode. Sequence-type always sorts before ticket-type. + +`getFeePayer()` returns `sfDelegate` if present, else `sfAccount`. Authorization is enforced in `Transactor::checkPermission`, not here. + +## Counterparty Signing (`sfCounterpartySignature`) + +Used by `LoanSet` — allows a second party to sign the same transaction. `checkSign(Rules const&)` checks primary then counterparty (if field present). Errors from counterparty check are prefixed `"Counterparty: "`. `sign()` accepts optional `signatureTarget` reference to write into a sub-object. diff --git a/docs/skills/rpc.md b/docs/skills/rpc.md index 0fdb782dfd..99e72ced37 100644 --- a/docs/skills/rpc.md +++ b/docs/skills/rpc.md @@ -8,8 +8,9 @@ JSON-RPC over HTTP/WebSocket and gRPC. Central handler table dispatches by metho - `conditionMet` checks amendment-blocked, UNL expired, operating mode ≥ SYNCING, validated ledger age < `Tuning::maxValidatedLedgerAge` (2 min), and validated/current gap ≤ 10 ledgers. Standalone mode bypasses age checks. - API v1 vs v2 error code split: v1 emits `rpcNO_NETWORK`/`rpcNO_CURRENT`/`rpcNO_CLOSED`; v2+ collapses to `rpcNOT_SYNCED`. - Sensitive fields (`passphrase`, `secret`, `seed`, `seed_hex`) are masked as `` in error response echoes. -- Batch requests: top-level `"method": "batch"` with `params` array; each sub-request processed independently and accumulated into JSON array. +- Batch requests: top-level `"method": "batch"` with `params` array; each sub-request processed independently and accumulated into JSON array. Batch is rejected entirely if any sub-request is itself a batch (no nesting). - API v2 enforces strict JSON typing on previously-permissive boolean fields (e.g., `signer_lists`, `binary`, `forward`, `transactions`). +- Two handler registration styles exist but only two handlers use the new-style (class-based): `LedgerHandler` and `VersionHandler`. All ~67 others use old-style free functions in `handlerArray`. ## Common Bug Patterns @@ -21,6 +22,10 @@ JSON-RPC over HTTP/WebSocket and gRPC. Central handler table dispatches by metho - Marker pagination: callers can forge markers pointing into other accounts' directories — always call `RPC::isRelatedToAccount` before resuming. - `parse()` returning `std::nullopt` is a programming-error sentinel for type system; user-facing errors go through `required` / `Expected`. - `loadType` must be set early in handler — escalates to `feeExceptionRPC` automatically on exception if still `feeReferenceRPC`. +- `WSInfoSub` only trusts `X-User`/`X-Forwarded-For` headers when the remote IP is in `secure_gateway_nets`; outside that list, those headers are stripped. Misconfiguring `secure_gateway` lets untrusted clients spoof identity. +- `RPCSub::sendThread` reads `mUsername`/`mPassword` outside `mLock` — minor data-race noted in source with `XXX` comment. +- `PathRequestManager::getAssetCache` assigns the `weak_ptr` lock result to a local `shared_ptr` before returning — assigning directly to the weak member would cause immediate expiry. +- `account_objects` marker encodes phase (NFT page vs directory); resuming with a wrong-phase marker can traverse the wrong list. Both `nftPageStart` and directory marker use different sentinel shapes. ## Adding New RPC Handler @@ -83,21 +88,22 @@ template <> Handler handlerFrom() { - `include/xrpl/protocol/ErrorCodes.h` — `error_code_i`, `inject_error`, `ErrorInfo` table, HTTP status mapping. ### Subscriptions -- `src/xrpld/rpc/detail/WSInfoSub.h` — WebSocket `InfoSub` subclass; `Json::stream`-based zero-intermediate serialization into `multi_buffer`. -- `src/xrpld/rpc/RPCSub.h` / `detail/RPCSub.cpp` — outbound HTTP/HTTPS push subscription ("webhook"); producer/consumer with `jtCLIENT_SUBSCRIBE` jobs; carries `VFALCO TODO` markers (legacy). +- `src/xrpld/rpc/detail/WSInfoSub.h` — WebSocket `InfoSub` subclass; `Json::stream`-based zero-intermediate serialization into `multi_buffer`. Only trusts `X-User`/`X-Forwarded-For` when remote IP is in `secure_gateway_nets`. +- `src/xrpld/rpc/RPCSub.h` / `detail/RPCSub.cpp` — outbound HTTP/HTTPS push subscription ("webhook"); legacy feature maintained for one specific partner; carries `VFALCO TODO` markers. `sendThread` reads `mUsername`/`mPassword` outside `mLock` (data-race risk). - Streams: `server`, `ledger`, `book_changes`, `transactions`, `transactions_proposed` (`rt_transactions` deprecated alias), `validations`, `manifests`, `peer_status` (admin), `consensus`. +- `book_changes` stream can be subscribed but **cannot be unsubscribed** — there is no unsubscribe path for it. - `account_history_tx_stream` is experimental; gated on `useTxTables()`; supports `stop_history_tx_only` in unsubscribe. ### Pathfinding - `src/xrpld/rpc/detail/Pathfinder.cpp` / `.h` — three-phase engine: `findPaths()` (template expansion via static `mPathTable`), `computePathRanks()` (RippleCalc simulation), `getBestPaths()` (selection with covering-path). - `src/xrpld/rpc/detail/PathRequest.cpp` / `.h` — per-request state machine; two constructors for `path_find` (subscription) vs `ripple_path_find` (legacy callback); adaptive `iLevel`. -- `src/xrpld/rpc/detail/PathRequestManager.cpp` / `.h` — collection of `wptr`; re-entrant `updateAll()` loop; shared `AssetCache` via `weak_ptr` (intentional, see `getAssetCache`). -- `src/xrpld/rpc/detail/AssetCache.cpp` / `.h` — per-ledger thread-safe trust line + MPT cache; direction-superset optimization for trust lines; `shared_ptr>` null sentinels for empty accounts. +- `src/xrpld/rpc/detail/PathRequestManager.cpp` / `.h` — collection of `wptr`; re-entrant `updateAll()` loop; shared `AssetCache` via `weak_ptr` (intentional — cache lives only as long as a request holds it). +- `src/xrpld/rpc/detail/AssetCache.cpp` / `.h` — per-ledger thread-safe trust line + MPT cache; **direction-superset optimization**: caches lines in both directions (AB and BA) so a single fetch covers both source and destination queries; `shared_ptr>` null sentinels for empty accounts. - `src/xrpld/rpc/detail/AccountAssets.cpp` / `.h` — `accountSourceAssets` / `accountDestAssets` for path source/dest currency enumeration. - `src/xrpld/rpc/detail/TrustLine.cpp` / `.h` — `TrustLineBase` + `PathFindTrustLine` (memory-minimal) + `RPCTrustLine` (adds quality rates). -- `src/xrpld/rpc/detail/MPT.h` — `PathFindMPT` (MPTID + zeroBalance + maxedOut). -- `src/xrpld/rpc/detail/PathfinderUtils.h` — `largestAmount`, `convertAmount`, `convertAllCheck` for "convert all" sentinel handling. -- `src/xrpld/rpc/detail/LegacyPathFind.cpp` / `.h` — RAII concurrency guard for synchronous `ripple_path_find`; lock-free CAS on `inProgress` counter; admin bypass. +- `src/xrpld/rpc/detail/MPT.h` — `PathFindMPT` (MPTID + zeroBalance + maxedOut); implicitly converts from `MPTIssue` for uniform interface with `PathFindTrustLine`. +- `src/xrpld/rpc/detail/PathfinderUtils.h` — `largestAmount`, `convertAmount`, `convertAllCheck`; XRP sentinel = `STAmount::cMaxNative`, IOU sentinel = `STAmount::cMaxValue / cMaxOffset`, MPT sentinel = `maxMPTokenAmount`. Pass these to signal "send all". +- `src/xrpld/rpc/detail/LegacyPathFind.cpp` / `.h` — RAII concurrency guard for synchronous `ripple_path_find`; lock-free CAS on `inProgress` counter; admin bypass. Destructor skips decrement if construction failed (`m_isOk` flag). - `src/xrpld/rpc/detail/RippleLineCache.cpp` / `.h` — **empty stubs**; functionality replaced by `AssetCache`. Still `#include`d in two files for inert compatibility. ### Transaction Signing / Submission @@ -105,10 +111,11 @@ template <> Handler handlerFrom() { - Fee pipeline: `checkFee` → `getCurrentNetworkFee` (max of load-scaled base fee and TxQ-escalated open ledger fee, capped by `fee_mult_max`/`fee_div_max`). - `ProcessTransactionFn` dependency injection via `getProcessTxnFn(NetworkOPs&)` for testability. - `acctMatchesPubKey` handles three account states: unactivated (master-only), master+regular both valid, master disabled (regular only). +- `doSubmit`: pre-seeds `HashRouter` with the transaction hash before forwarding, so the node does not rebroadcast its own submission. ### Utility / Enrichment - `src/xrpld/rpc/BookChanges.h` — header-only template `computeBookChanges(ledger)`; produces OHLCV per pair; reused by RPC handler and `book_changes` subscription stream. -- `src/xrpld/rpc/CTID.h` — Concise Transaction ID (XLS-15d): 16-hex `C` + 28-bit ledgerSeq + 16-bit txnIdx + 16-bit netID. Boost regex; `boost::regex` not `std::regex`. +- `src/xrpld/rpc/CTID.h` — Concise Transaction ID (XLS-15d): 16-hex `C` + 28-bit ledgerSeq + 16-bit txnIdx + 16-bit netID. Uses `boost::regex`, not `std::regex`. - `src/xrpld/rpc/DeliveredAmount.h` / `detail/DeliveredAmount.cpp` — `insertDeliveredAmount`; three-tier resolution; threshold = ledger 4594095 (Jan 2014) or close time 446000000s; `"unavailable"` string sentinel for pre-threshold ledgers; lazy lambdas avoid `LedgerMaster` calls when meta has `sfDeliveredAmount`. - `src/xrpld/rpc/MPTokenIssuanceID.h` / `detail/MPTokenIssuanceID.cpp` — `insertMPTokenIssuanceID`; mirrors `DeliveredAmount` three-function pattern (eligibility / extraction / injection). - `src/xrpld/rpc/handlers/server_info/ServerDefinitions.cpp` — built once at startup via Meyers singleton; SHA-512-half hash for client cache invalidation; X-macro–driven from protocol headers. @@ -119,6 +126,16 @@ template <> Handler handlerFrom() { ### Client-side - `src/xrpld/rpc/RPCCall.h` / `detail/RPCCall.cpp` — `xrpld` CLI dispatch; `RPCParser` with static `Command[]` table mapping method → parse function; "trusted interface" — minimal validation by design. +## Enrichment Pipeline + +Three functions are called together wherever transaction metadata is serialized to JSON. **Always apply all three** at the same call sites to keep responses consistent: + +1. `insertDeliveredAmount()` — actual delivered amount for payments/check-cash/account-delete. +2. `RPC::insertNFTSyntheticInJson()` — synthetic NFT fields (`nft_offer_index`, `nftoken_id`, etc.) extracted from metadata. +3. `RPC::insertMPTokenIssuanceID()` — `mpt_issuance_id` for successful `MPTokenIssuanceCreate` transactions. + +Call sites: `Tx.cpp`, `AccountTx.cpp`, `NetworkOPs.cpp`, `LedgerToJson.cpp`, `Simulate.cpp`. + ## Resource Cost Tiers Set `context.loadType` early. Tiers (from `Fees.h`): @@ -154,7 +171,7 @@ if (context.role != Role::ADMIN && !context.app.config().canSign()) - v2 strict boolean typing; v1 silently coerces. - v2 rejects mixing `ledger_index_min`/`max` with `ledger_index`/`ledger_hash` in `account_tx`; v1 tolerates. - v2 enforces precise marker objects in `account_tx` (`{ledger, seq}` integers). -- v3 (beta) adds human-readable singleton aliases in `ledger_entry` index lookup. +- v3 (beta) adds human-readable singleton aliases in `ledger_entry` index lookup. `VersionHandler::maxApiVer` tracks the highest beta version; the width of the beta range (currently 1) matters for the version negotiation boundary. ## Handler Subdirectory Map @@ -164,7 +181,7 @@ if (context.role != Role::ADMIN && !context.app.config().canSign()) - `handlers/orderbook/` — `AMMInfo`, `BookChanges`, `BookOffers`, `DepositAuthorized`, `GetAggregatePrice`, `NFTBuyOffers` / `NFTSellOffers` / `NFTOffersHelpers.h`, `PathFind` (subscription), `RipplePathFind` (one-shot). - `handlers/server_info/` — `Fee`, `Feature`, `Manifest`, `ServerDefinitions`, `ServerInfo`, `ServerState`, `Version.h` (class-based). - `handlers/subscribe/` — `Subscribe`, `Unsubscribe`. -- `handlers/transaction/` — `Simulate` (dry-run via `tapDRY_RUN`), `Submit`, `SubmitMultiSigned`, `Tx`, `TransactionEntry` (ledger-pinned), `TxHistory` (paginated, `useTxTables()`-gated, deep-page cap 10000 for non-admin), `TxReduceRelay`. +- `handlers/transaction/` — `Simulate` (dry-run via `tapDRY_RUN`; batch rejected), `Submit`, `SubmitMultiSigned`, `Tx`, `TransactionEntry` (ledger-pinned), `TxHistory` (paginated, `useTxTables()`-gated, deep-page cap 10000 for non-admin), `TxReduceRelay`. - `handlers/utility/` — `Ping` (role-conditional response), `Random`. - Top-level `handlers/`: `ChannelVerify` (no admin restriction, stateless), `VaultInfo` (XLS-66, vault + MPT issuance lookup). @@ -188,3 +205,8 @@ if (context.role != Role::ADMIN && !context.app.config().canSign()) - `getCountsJson` (in `GetCounts.h`) is callable from non-RPC contexts (e.g., `OverlayImpl::getCountsJson`). - `wallet_propose` entropy warning: <80 bits → strong warning; passphrase that already encodes the seed (1751/Base58/hex) suppresses warning. - Account marker security: always verify `RPC::isRelatedToAccount(*ledger, sle, accountID)` on resumed pagination — prevents cross-account directory traversal. +- `AMMInfo` has two distinct parsing paths: one for `amm_account` (direct lookup) and one for `asset`+`asset2` pair (derives AMM account via `keylet::amm`). Both resolve to the same AMM SLE but through different code paths — changes must update both. +- `book_offers` applies an inline load-shedder: if `checkFee` determines the consumer is at warning/drop tier, it cuts the offer limit in half before iterating. +- `Simulate` rejects batch transactions (returns `rpcINVALID_PARAMS`) — `tapDRY_RUN` does not support the batch transaction type. +- `autofillSignature()` in `Simulate.cpp` removes `SigningPubKey` and `TxnSignature` fields before applying dry-run, then restores them; callers must not pre-sign before calling Simulate. +- `RPCSub` is a legacy feature retained for one specific partner. It is **not** a general-purpose webhook system. New subscribers should use WebSocket instead. diff --git a/docs/skills/shamap.md b/docs/skills/shamap.md index e74aff2d8d..fe0023e9e8 100644 --- a/docs/skills/shamap.md +++ b/docs/skills/shamap.md @@ -1,6 +1,6 @@ # SHAMap -Authenticated 16-way radix Merkle trie. Every ledger has two: a TRANSACTION tree (txid → tx, with or without metadata) and a STATE tree (object key → serialized object). Root hash is what validators sign — two nodes agree on ledger state iff their root hashes match. Tree depth is fixed at 64 (256-bit keys consumed 4 bits per level). +Authenticated 16-way radix Merkle trie. Every ledger has two: a TRANSACTION tree (txid → tx, with or without metadata) and a STATE tree (object key → serialized object). Root hash is what validators sign — two nodes agree on ledger state iff their root hashes match. Tree depth is fixed at 64 (256-bit keys consumed 4 bits per level, 65 levels total: root at depth 0, leaves at depth 64). Root is always a `SHAMapInnerNode`. Leaves only appear at depth 64. @@ -9,7 +9,7 @@ Root is always a `SHAMapInnerNode`. Leaves only appear at depth 64. - **Tree shape**: `branchFactor = 16`, `leafDepth = 64`, max 65 levels (root at depth 0). One nibble per level. - **CoW ownership**: each node has a `cowid_`. Non-zero → exclusively owned by that map and mutable. Zero → shared/canonicalized, must not be mutated. `setItem()`, `setChild()`, `dirtyUp()` all assert `cowid_ != 0`. - **`unshareNode` before any mutation** of a shared node — #1 bug class. Skipping it corrupts every snapshot sharing the node. -- **`canonicalize`** ensures one in-memory instance per hash via `Family::getTreeNodeCache()`. Asserts `cowid == 0` on entry. +- **`canonicalize`** ensures one in-memory instance per hash via `Family::getTreeNodeCache()`. Asserts `cowid == 0` on entry AND on values returned by `cacheLookup()`. - **`SHAMapNodeID` masking**: `id_ == (id_ & depthMask(depth_))` is enforced by constructor. Two nodes at the same depth on the same path always have identical `SHAMapNodeID`. - **Inner node concurrency**: per-child bit spinlock in `lock_` (`std::atomic`, one bit per branch). Allows concurrent reads of different children. `setChild`/`shareChild` skip locking because they require CoW ownership. - **Leaf size floor**: `SHAMapItem::size() >= 12` asserted at leaf construction. @@ -37,17 +37,19 @@ node->setChild(index, child); // safe to modify `snapShot(isMutable)` does NOT copy nodes. It bumps the original's `cowid_` and shares the `root_` pointer. Subsequent writes call `unshareNode()` which clones on first touch. Immutable snapshots of immutable maps share everything with zero clone cost. -The copy constructor's `unshare()` call breaks sharing eagerly when either side is mutable, preventing later mutations from racing. +The copy constructor increments `cowid_` (`cowid_(other.cowid_ + 1)`) and calls `unshare()` if either side is mutable, preventing later mutations from racing. An immutable copy of an immutable map shares all nodes with zero clone cost. `getHash()` does `const_cast(*this).unshare()` when the root hash is zero — acknowledged design compromise (logical read, physical mutate). ## Key Files -- `include/xrpl/shamap/SHAMap.h` — main class, state machine, MissingNodes struct +- `include/xrpl/shamap/SHAMap.h` — main class, state machine, `MissingNodes` struct - `include/xrpl/shamap/SHAMapTreeNode.h` — base; `cowid_`, `hash_`, `IntrusiveRefCounts`, wire-type constants - `include/xrpl/shamap/SHAMapInnerNode.h` — 16-way branch, sparse `TaggedPointer`, per-bit spinlocks, `fullBelowGen_` - `include/xrpl/shamap/SHAMapLeafNode.h` — abstract leaf base, `item_` slot, `setItem` returns hash-changed bool -- `include/xrpl/shamap/SHAMapTxLeafNode.h`, `SHAMapTxPlusMetaLeafNode.h`, `SHAMapAccountStateLeafNode.h` — three leaf types +- `include/xrpl/shamap/SHAMapTxLeafNode.h` — tx without metadata (`tnTRANSACTION_NM`) +- `include/xrpl/shamap/SHAMapTxPlusMetaLeafNode.h` — tx with metadata (`tnTRANSACTION_MD`) +- `include/xrpl/shamap/SHAMapAccountStateLeafNode.h` — account state leaf (`tnACCOUNT_STATE`) - `include/xrpl/shamap/SHAMapItem.h` — slab-allocated, struct-hack payload, intrusive refcount - `include/xrpl/shamap/SHAMapNodeID.h` — (depth, masked-prefix) tree address - `include/xrpl/shamap/SHAMapMissingNode.h` — exception + `SHAMapType` enum (TX/STATE/FREE) @@ -61,6 +63,7 @@ The copy constructor's `unshare()` call breaks sharing eagerly when either side - `src/libxrpl/shamap/SHAMapSync.cpp` — getMissingNodes, getNodeFat, addRootNode/addKnownNode, proofs - `src/libxrpl/shamap/SHAMapDelta.cpp` — compare, walkMap, walkMapParallel - `src/libxrpl/shamap/SHAMapInnerNode.cpp` — sparse storage mechanics, hash, serialization +- `src/libxrpl/shamap/SHAMapLeafNode.cpp` — abstract leaf shared behavior - `src/libxrpl/shamap/SHAMapTreeNode.cpp` — deserialization factories - `src/libxrpl/shamap/SHAMapNodeID.cpp` — depth mask table, branch nav, wire format @@ -68,18 +71,20 @@ The copy constructor's `unshare()` call breaks sharing eagerly when either side All inherit `SHAMapLeafNode` (which inherits `SHAMapTreeNode`). All `final`. Differ only in hash prefix, wire-type byte, and whether the key is fed into the hash. -| Class | Hash prefix | Key in hash? | Key in wire? | Wire-type byte | +| Class | Hash prefix (bytes) | Key in hash? | Key in wire? | Wire-type byte | |---|---|---|---|---| -| `SHAMapTxLeafNode` | `transactionID` (`TXN`) | no | no | `wireTypeTransaction = 0` | -| `SHAMapTxPlusMetaLeafNode` | `txNode` (`SND`) | yes | yes | `wireTypeTransactionWithMeta = 4` | -| `SHAMapAccountStateLeafNode` | `leafNode` (`MLN`) | yes | yes | `wireTypeAccountState = 1` | +| `SHAMapTxLeafNode` | `HashPrefix::transactionID` (`'T','X','N'`) | no | no | `wireTypeTransaction = 0` | +| `SHAMapTxPlusMetaLeafNode` | `HashPrefix::txNode` (`'S','N','D'`) | yes | yes | `wireTypeTransactionWithMeta = 4` | +| `SHAMapAccountStateLeafNode` | `HashPrefix::leafNode` (`'M','L','N'`) | yes | yes | `wireTypeAccountState = 1` | -**Why the asymmetry**: a transaction's ID *is* `sha512Half(prefix, blob)`, so the key is already implied by the data. Account state keys (account address, offer index, etc.) do NOT appear in the blob and must be hashed in or two distinct objects with identical payloads would collide. +**Why the asymmetry**: a transaction's ID *is* `sha512Half(prefix, blob)`, so the key is already implied by the data. Account state and tx+meta keys (account address, offer index, etc.) do NOT appear in the blob and must be hashed in — otherwise two distinct objects with identical payloads would collide. Open ledgers hold transactions as `tnTRANSACTION_NM`; closed ledgers rebuild the tx tree as `tnTRANSACTION_MD` after metadata is attached. Hash prefix difference makes the two roots structurally incompatible — by design. Each concrete leaf has two constructors: hash-recompute (used by initial `make_*`) and hash-supplied (used by `clone()` and deserialization with `hashValid = true`). +**`SHAMapLeafNode`** is the intermediate abstract base. Both constructors are `protected` — only concrete subclasses call them. `isLeaf()`, `isInner()`, and `invariants()` are sealed `final override` here. `invariants()` asserts hash non-zero and item non-null. `item_` is `protected` (not `private`) so concrete subclasses can access it directly in their inline `updateHash()` implementations, avoiding virtual dispatch overhead in the hash path. + ## SHAMapItem Single allocation: struct fields + payload bytes via struct hack (placement-new'd after `sizeof(*this)`). Constructor is `private`, all copy/move deleted; only path is `make_shamapitem()`. @@ -88,7 +93,7 @@ Backed by `detail::slabber` — a `SlabAllocatorSet` with seven tier `refcount_` is `mutable std::atomic`. `intrusive_ptr_add_ref` calls `LogicError` if count was already zero (resurrection guard). `intrusive_ptr_release` runs `std::destroy_at` then returns memory to `slabber.deallocate()`, falling through to `delete[]` when the pointer didn't come from a slab. -`SHAMapLeafNode::item_` is `boost::intrusive_ptr` — items are immutable; mutation produces a new item. +`SHAMapLeafNode::item_` is `boost::intrusive_ptr` — items are immutable; mutation produces a new item. The `const` through the pointer allows the same `SHAMapItem` to be shared across CoW snapshots without copying. ## Inner Node: Sparse Storage via TaggedPointer @@ -107,21 +112,25 @@ This typically cuts inner-node memory to ~25% of a dense layout. `iterChildren(F)` exposes all 16 logical branches (zero-hash for empties — needed for `updateHash`). `iterNonEmptyChildIndexes(F)` gives `(branchNum, arrayIdx)` — used for mutation and serialization. +**Two compile-time constraints** in `TaggedPointer.ipp`: `boundaries.size() <= 4` (tag is exactly 2 bits) and `boundaries.back() == branchFactor`. Adding a 5th boundary fails the build. + ## SHAMapNodeID Two fields: `uint256 id_` (path prefix, masked to depth) and `unsigned depth_` (0–64). The static `depthMask()` table has 65 entries: nibble `d/2` gets `0xF0` at odd depths, `0xFF` at even, accumulating top-down. -- Constructor **rejects** unmasked input (asserts `id == id & depthMask`). +- Constructor **rejects** unmasked input (asserts `id == id & depthMask`). - `createID(depth, key)` is the factory that **applies** the mask for you. Asymmetric API by design. - `getChildNodeID(m)` throws (not just asserts) at `leafDepth` — corrupted data may trigger this in release builds. - `selectBranch(id, hash)` reads the nibble at `id.depth` from `hash`. The traversal primitive used everywhere. - Wire format: 33 bytes (32 id + 1 depth). `deserializeSHAMapNodeID` returns `std::optional` and validates size, depth ≤ 64, and the mask invariant. +`operator<` sorts by `std::tie(depth_, id_)` — shallower before deeper, then by prefix value. Used as map/set key for traversal frontiers. + ## Hash Computation Inner: `sha512Half(HashPrefix::innerNode, hashes[0..15])` — always feed all 16, zeros for empties. Hash is identical regardless of dense/sparse storage. -`updateHash()` (read from `hashes` array directly) vs `updateHashDeep()` (pull from child pointers first, then compute) — the latter is used after batch mutations where in-memory child hashes were updated but the local hashes array wasn't synced. +`updateHash()` reads from `hashes` array directly. `updateHashDeep()` pulls hashes from child pointers first, then computes — used after batch mutations where in-memory child hashes were updated but the local hashes array wasn't synced. ## Wire Serialization Formats @@ -129,7 +138,7 @@ Inner: `sha512Half(HashPrefix::innerNode, hashes[0..15])` — always feed all 16 - **Compressed** (< 12 children): per non-empty branch, 32-byte hash + 1-byte branch index, total 33·n bytes. - **Full** (≥ 12 children): all 16 hashes in order, 512 bytes. -Type byte appended at the end. `makeFullInner()` / `makeCompressedInner()` are the matching factories, each validating exact size. +Type byte appended at the end. `makeFullInner()` / `makeCompressedInner()` are the matching factories, each validating exact size (throws `std::runtime_error` on mismatch — not silent corruption). `serializeWithPrefix` always emits the full 16-hash form prefixed with `HashPrefix::innerNode` (used for hashing). @@ -137,6 +146,8 @@ Type byte appended at the end. `makeFullInner()` / `makeCompressedInner()` are t `makeFromPrefix(slice, hash)` uses the leading 4-byte `HashPrefix` and is the trusted (hash-known) path — propagates `hashValid = true` to skip recompute in leaf constructors. +**Tag extraction asymmetry**: `makeTransaction` recomputes the key as `sha512Half(HashPrefix::transactionID, data)`. `makeTransactionWithMeta` and `makeAccountState` read the 32-byte key from the *tail* of the payload (via `getBitString`), then `chop` it before creating the `SHAMapItem`. A manual pre-check guards against zero-size reads before calling `getBitString` — see `SHAMapTreeNode.cpp` comment `// FIXME: improve this interface`. + ## Mutation Mechanics All three mutations follow: walk-with-stack → local change → `dirtyUp()`. @@ -146,6 +157,8 @@ All three mutations follow: walk-with-stack → local change → `dirtyUp()`. - **`updateGiveItem`**: locate, unshare, swap payload; call `dirtyUp` only if `setItem()` returns `true` (hash actually changed). Avoids spurious rehashing. - **`dirtyUp`**: consume stack bottom-up, `unshareNode` each, `setChild` to link in the updated subtree. Produces a CoW-owned chain from mutation point to root. +`makeTypedLeaf()` maps `SHAMapNodeType` to one of three concrete leaf classes. Unrecognized types throw `LogicError` immediately. + ## Node Fetching (Backed vs Unbacked) `backed_ = true` integrates with `Family::db()`; `backed_ = false` (set via `setUnbacked()`) is in-memory only (e.g., transient tx-processing trees). @@ -177,6 +190,10 @@ On miss, `full_` is cleared and `Family::missingNodeAcquireBySeq(seq, hash)` is Bundles a target node plus a bounded-depth subtree in one response to amortize sync latency. Depth budget only decrements when an inner node has > 1 child — single-child chains (compressed radix paths) traverse for free. `fatLeaves=true` includes adjacent leaves; otherwise inner-only. +`visitNodes` (used by `visitDifferences` and traversal helpers) uses an explicit `std::stack` to avoid recursion on potentially 64-level trees. + +`visitDifferences` short-circuits at hash equality — O(1) when subtrees match. The `have` pointer is nullable: null means "report everything in `this`." + ## Ingesting Nodes: `addRootNode` / `addKnownNode` Returns `SHAMapAddNode` (tri-state: useful / duplicate / invalid). Aggregated via `+=` across batches in `InboundLedger`. @@ -201,7 +218,7 @@ Mismatch transitions the map to `Invalid` — graceful, not a crash. Skips desce ## Comparison / Delta (`SHAMapDelta.cpp`) -`compare` short-circuits at root: `if (getHash() == other.getHash()) return true`. This is THE point of a Merkle tree — matching subtrees never visited. +`compare` short-circuits at root: `if (getHash() == other.getHash()) return true`. O(d) in the number of differences, not O(n) in total items. Returns `Delta = std::map` where `DeltaItem = pair`. Null first → added; null second → deleted; both → modified. @@ -235,7 +252,7 @@ Production impl: `NodeFamily` (in `src/xrpld/shamap/`). Tests use lighter-weight - Per-node `fullBelowGen_` on `SHAMapInnerNode` (compared against current cache generation in `isFullBelow(gen)`). - The cache itself (queried via `touch_if_exists`). -`clear()` purges entries AND increments `m_gen` — this is a zero-cost global invalidation of every in-memory marker (mismatched generation → `isFullBelow` returns false). `reset()` purges and sets `m_gen = 1` (used at startup / hard reset). +`clear()` purges entries AND increments `m_gen` — zero-cost global invalidation of every in-memory marker (mismatched generation → `isFullBelow` returns false). `reset()` purges and sets `m_gen = 1` (used at startup / hard reset). The distinction matters: any node carrying `fullBelowGen_ > 1` won't match the reset-to-1 state — correct, because those nodes are expected to be recreated fresh. Only `backed_ = true` maps participate. @@ -281,12 +298,13 @@ Implementations: - Modifying a node without `unshareNode` first → corrupts every snapshot sharing it. - Using `std::vector` (instead of `std::deque`) as backing for the `MissingNodes` stack → raw inner-node pointers invalidated on push. - Processing async fetch completions out of order in `getMissingNodes` → incorrect `setFullBelowGen`, subsequent walks skip incomplete subtrees. -- Inner serialization format mismatch (compressed/full) → silent corruption. The branch-count cutoff is 12. +- Inner serialization format mismatch (compressed/full) — branch-count cutoff is 12; deserializer enforces exact sizes and throws `std::runtime_error`. - `addKnownNode` returning `invalid` is normal (empty branch, hash mismatch) — callers must handle, not assume valid. - Proof verification with wrong key at any level → false negative; the verifier uses the supplied key's nibbles to pick branches. -- Failing to inspect the `exceptions` vector after `walkMapParallel` — workers swallow `SHAMapMissingNode` to avoid `std::terminate`; missing nodes go into the result vector but the return value reflects exceptions. +- Failing to inspect the `exceptions` vector after `walkMapParallel` — workers swallow `SHAMapMissingNode` to avoid `std::terminate`; missing nodes go into the result vector but the return value reflects exceptions only. - Mutating a node returned from the `TreeNodeCache` — they have `cowid == 0` by invariant. -- Using a non-leaf-aligned inner node count (`< 12`) but emitting full-inner format, or vice versa — deserializer enforces exact sizes and throws. +- Adding a 5th entry to `TaggedPointer::boundaries` — `static_assert` fails; the 2-bit tag supports exactly 4 values. +- Calling `clone()` with the hash-supplied constructor when the item is also changing — the two-constructor split assumes the item and hash are consistent; pass `hashValid = false` to recompute. ## SHAMapMissingNode Catch Policy diff --git a/docs/skills/sql.md b/docs/skills/sql.md index 2d23983b37..1729ecc4ae 100644 --- a/docs/skills/sql.md +++ b/docs/skills/sql.md @@ -1,116 +1,143 @@ # SQL Database -SQLite via SOCI for ledger/transaction history. Only SQLite is supported; the backend name is validated and any non-`sqlite` value throws at config parse time. +SQLite via SOCI for ledger/transaction history. Only SQLite is supported; any non-`sqlite` backend value in config throws at parse time (`detail::getSociInit` in `SociDB.cpp`). ## Key Invariants - Two main databases: `lgrdb_` (ledger) and `txdb_` (transactions, optional via `useTxTables` config) -- Transaction tables are optional; disabling them means no transaction history or account_tx queries -- WAL checkpointing offloads to `JobQueue` (jtWAL); at most one checkpoint job in flight per `DatabaseCon` (guarded by `running_` mutex) +- Transaction tables are optional; disabling them disables transaction history and `account_tx` queries +- WAL checkpointing offloads to `JobQueue` (`jtWAL`); at most one checkpoint job in flight per `DatabaseCon` (guarded by `running_` mutex in `WALCheckpointer`) - Database init failure is fatal (throws exception, prevents construction) -- Free disk space < 512MB triggers fatal error on write operations -- File extension inconsistency: `validators` and `peerfinder` use `.sqlite`; all other DBs use `.db`. This is historical and enforced in `detail::getSociInit` +- Free disk space < 512 MB triggers fatal error on write operations +- File extension inconsistency: `validators` and `peerfinder` use `.sqlite`; all other DBs use `.db`. Historical artifact enforced in `detail::getSociInit` ## Schema -- `Ledgers` table: seq, hash, parent hash, total coins, close time, etc. Indexed by `LedgerSeq` -- `Transactions` table: TransID, TransType, FromAcct, FromSeq, LedgerSeq, Status, RawTxn, TxnMeta. Indexed by `LedgerSeq` -- `AccountTransactions` table: TransID, Account, LedgerSeq, TxnSeq. Triple-indexed for account_tx queries +- `Ledgers`: seq, hash, parent hash, total coins, close time, etc. Indexed by `LedgerSeq` +- `Transactions`: TransID, TransType, FromAcct, FromSeq, LedgerSeq, Status, RawTxn, TxnMeta. Indexed by `LedgerSeq` +- `AccountTransactions`: TransID, Account, LedgerSeq, TxnSeq. Triple-indexed for `account_tx` queries - Secondary DBs: Wallet (node identity, manifests), PeerFinder (bootstrap cache), State (deletion tracking) - -## Common Bug Patterns - -- No schema migration system; `CREATE TABLE IF NOT EXISTS` means old schemas silently persist with missing columns -- PeerFinder DB is the exception — it has schema versioning via `SchemaVersion` table -- `safety_level` config affects journal_mode and synchronous; "low" can lose data on crash -- `page_size` must be power of 2 between 512-65536; invalid values cause init failure -- Online deletion coordinates between NodeStore rotation and SQL table pruning; race conditions here lose history -- Empty database name passed to `detail::getSociSqliteInit` throws — silent fallback paths are not provided -- A `WALCheckpointer` registered with `sqlite3_wal_hook` outlives its `DatabaseCon` if a checkpoint job is in flight; teardown must wait for the job to drain (see Lifecycle below) +- Schema defined in `src/xrpld/app/main/DBInit.h` +- No schema migration system; `CREATE TABLE IF NOT EXISTS` silently preserves old schemas with missing columns. **Exception**: PeerFinder has schema versioning via a `SchemaVersion` table. ## Configuration | Option | Section | Values | Default | |--------|---------|--------|---------| | `backend` | `[sqdb]` / `[relational_db]` | `sqlite` only | sqlite | -| `page_size` | `[sqlite]` | 512-65536, power of 2 | 4096 | +| `page_size` | `[sqlite]` | 512–65536, power of 2 | 4096 | | `safety_level` | `[sqlite]` | high, medium, low | high | | `journal_size_limit` | `[sqlite]` | integer >= 0 | 1582080 | -## WAL Checkpointer Lifecycle +`safety_level: low` changes `journal_mode` and `synchronous` settings — can lose data on crash. + +## WAL Checkpointing Architecture The checkpointer subsystem is the trickiest part of this module. SQLite's WAL hook is a C callback registered on the native `sqlite3*` connection, but the work runs on a `JobQueue` thread that may still be executing when the owning `DatabaseCon` is destroyed. +### Two-file split + +- **`SociDB.cpp`**: `WALCheckpointer` class (anonymous namespace) — installs the hook, implements `schedule()` and `checkpoint()`, holds the `weak_ptr`. +- **`DatabaseCon.cpp`**: `CheckpointersCollection` class — process-wide singleton registry (`checkpointers`, namespace-scope variable) mapping monotonically-incrementing integer IDs to `shared_ptr`; exposes `create`, `fromId`, `erase`. All `DatabaseCon` instances share this one registry. + +`DatabaseCon.cpp` has no direct SQLite dependency; it only manages the `Checkpointer` abstract interface. + ### ID-based hook indirection -- `WALCheckpointer` (in `SociDB.cpp`) is registered with `sqlite3_wal_hook` using a `std::uintptr_t id_` cast to `void*`, **not** a raw `this` pointer. -- The C hook calls `checkpointerFromId()` which looks up the ID in a process-wide `CheckpointersCollection` (in `DatabaseCon.cpp`). If the lookup returns null, the hook deregisters itself via `sqlite3_wal_hook(conn, nullptr, nullptr)`. -- This protects against the hook firing on a writer thread between the `DatabaseCon` being torn down and the hook being unwired. + +- `WALCheckpointer` is registered with `sqlite3_wal_hook` using its `std::uintptr_t id_` cast to `void*`, **not** a raw `this` pointer. +- The C hook calls `checkpointerFromId()` → `CheckpointersCollection::fromId()` (process-wide singleton). If lookup returns null (connection torn down), the hook deregisters itself via `sqlite3_wal_hook(conn, nullptr, nullptr)`. +- Prevents use-after-free: the hook may fire on a writer thread after `DatabaseCon` begins destruction. ### Session ownership split -- `DatabaseCon` holds `std::shared_ptr`. -- `WALCheckpointer` holds only `std::weak_ptr`. Intentional: if the checkpointer held a `shared_ptr`, an in-flight job would keep the WAL lock alive and a freshly-opened replacement `DatabaseCon` would fail to acquire it. + +- `DatabaseCon` holds `std::shared_ptr`; `WALCheckpointer` holds only `std::weak_ptr`. +- If the checkpointer held a `shared_ptr`, an in-flight job would keep the WAL lock alive, blocking a freshly-opened replacement `DatabaseCon` on the same file. - `WALCheckpointer::checkpoint()` calls `session_.lock()` and bails silently if expired. -### Destructor wait -`DatabaseCon::~DatabaseCon` sequence (order matters): -1. `checkpointers.erase(checkpointer_->id())` — future hook invocations now no-op. -2. Take a `weak_ptr` to the checkpointer, then `checkpointer_.reset()`. +### Destructor sequence (`DatabaseCon::~DatabaseCon`) + +Order matters: +1. `checkpointers.erase(checkpointer_->id())` — future hook invocations now no-op and self-deregister. +2. Take `weak_ptr wk(checkpointer_)`, then `checkpointer_.reset()`. 3. Busy-poll `wk.use_count() != 0` with 100 ms sleeps until all in-flight job lambdas release their `shared_ptr`. -The 100 ms poll is deliberate (rare event, simpler than a condvar). Without this wait, reopening the same SQLite file immediately after destruction can fail because the old checkpoint job still holds the WAL lock. +The 100 ms poll is deliberate (rare event; simpler than a condvar). Without this wait, reopening the same SQLite file immediately after destruction can fail because the old checkpoint job may still hold the WAL lock. + +### `setupCheckpointing()` — deferred wiring + +- Separated from constructors so checkpointing is opt-in. +- Constructors accepting `CheckpointerSetup` open the DB first, then call `setupCheckpointing(JobQueue*, ServiceRegistry&)`. +- Null `JobQueue*` throws `std::logic_error` (programming error, not runtime). +- The checkpointer must be inserted into `CheckpointersCollection` **before** `setupCheckpointing` returns, because the WAL hook is armed inside the `WALCheckpointer` constructor and writes can fire it immediately. ### Checkpoint job behavior -- Triggered by `sqlite3_wal_hook` after every WAL write; module-level `checkpointPageCount = 1000` mirrors SQLite's auto-checkpoint threshold. -- `schedule()` uses a `running_` bool under a mutex to ensure single in-flight job; if `JobQueue` rejects the job, `running_` is reset. -- The enqueued lambda captures `std::weak_ptr` so a destroyed `DatabaseCon` causes the job to exit without touching the session. -- `checkpoint()` calls `sqlite3_wal_checkpoint_v2` with `SQLITE_CHECKPOINT_PASSIVE`. `SQLITE_LOCKED` is logged at trace (expected under reader contention); other errors are warnings. -- Net effect: routes checkpoint work off the writer thread onto `jtWAL`. SQLite would otherwise do this synchronously on whichever thread crossed the page threshold. -### setupCheckpointing -- Separated from `DatabaseCon` constructors so checkpointing is opt-in. -- Constructors taking a `CheckpointerSetup` open the DB first, then call `setupCheckpointing(JobQueue*, ServiceRegistry&)`. -- Null `JobQueue*` throws `std::logic_error` (programming error, not runtime). -- The checkpointer must be inserted into `CheckpointersCollection` **before** returning from setup, because the WAL hook is armed inside the `WALCheckpointer` constructor and writes can fire it immediately. +- Triggered by `sqlite3_wal_hook` after every WAL write; `static checkpointPageCount = 1000` mirrors SQLite's auto-checkpoint threshold. +- `schedule()` uses `running_` bool under mutex to enforce single in-flight job; if `JobQueue` rejects the job, `running_` is reset. +- Enqueued lambda captures `std::weak_ptr`; destroyed `DatabaseCon` causes the job to exit without touching the session. +- `checkpoint()` calls `sqlite3_wal_checkpoint_v2` with `SQLITE_CHECKPOINT_PASSIVE`. `SQLITE_LOCKED` logged at trace (expected under reader contention); other errors logged as warnings. `running_` reset under mutex after each attempt. +- Net effect: routes checkpoint work off the writer thread onto `jtWAL`. Without this, SQLite does it synchronously on whichever thread crosses the page threshold. -## SOCI Adapter Notes +## SOCI Adapter Notes (`SociDB.cpp`) -- `getConnection(session&)` (`SociDB.cpp`) recovers the raw `sqlite3*` via `dynamic_cast`. This is the only intentional break in the SOCI abstraction; needed for WAL hooks and `sqlite3_db_status`. +- `DBConfig` is two-phase: parse params, open later. `detail::getSociInit` and `detail::getSociSqliteInit` resolve backend + path; the `.sqlite` vs `.db` extension fork lives in `getSociInit`. `getSociSqliteInit` throws `std::runtime_error` if the database name is empty. +- Two free-function `open()` overloads: config-based (delegates through `DBConfig`) and explicit-string (enforces same "sqlite only" constraint). Both paths call `s.open(soci::sqlite3, connectionString)`. +- `getConnection(session&)` recovers the raw `sqlite3*` via `dynamic_cast` — the only intentional break in the SOCI abstraction. Throws `std::logic_error` if the cast fails. Required for WAL hooks and `sqlite3_db_status`. - `getKBUsedAll()` → `sqlite3_memory_used()` (process-global). `getKBUsedDB()` → `SQLITE_DBSTATUS_CACHE_USED` (per-connection). -- Four `convert()` overloads bridge `soci::blob` and `std::vector` / `std::string`. Empty blobs require `blob.trim(0)` rather than `blob.write(nullptr, 0)`. +- Four `convert()` overloads bridge `soci::blob` ↔ `std::vector` / `std::string`. Empty blobs require `blob.trim(0)` rather than `blob.write(nullptr, 0)`. - `SociDB.cpp` opens with `#pragma clang diagnostic ignored "-Wdeprecated"` because SOCI headers use deprecated constructs; scoped to this TU only. -- `DBConfig` is two-phase: parse params, open later. `detail::getSociInit` and `detail::getSociSqliteInit` resolve backend + path; the `.sqlite` vs `.db` extension fork lives in `getSociInit`. + +## Common Bug Patterns + +- No schema migration system; `CREATE TABLE IF NOT EXISTS` silently preserves old schemas with missing columns. New columns on existing deployments require manual `ALTER TABLE` or explicit documentation that the column may be absent. +- `page_size` must be power of 2 between 512–65536; invalid values cause init failure. +- Online deletion coordinates between NodeStore rotation and SQL table pruning; race conditions here lose history. +- Empty database name passed to `detail::getSociSqliteInit` throws — no silent fallback. +- A `WALCheckpointer` registered with `sqlite3_wal_hook` can outlive its `DatabaseCon` if a checkpoint job is in flight; teardown must wait for the job to drain (see Destructor sequence above). +- Opening a new `DatabaseCon` to the same file immediately after destroying the old one can fail if the destructor busy-poll is skipped or shortened — the old checkpoint job may still hold the WAL lock. ## Key Patterns ### Schema Evolution Caveat ```cpp -// WARNING: no migration system — old databases keep old schemas -// CREATE TABLE IF NOT EXISTS silently skips if table exists with old columns -// New columns on existing tables require manual ALTER TABLE or -// documentation that the column is optional and may be absent +// No migration system — old databases keep old schemas. +// CREATE TABLE IF NOT EXISTS silently skips if table exists with old columns. +// New columns require manual ALTER TABLE or must be treated as optional/absent. +// PeerFinder is the exception: it has a SchemaVersion table. ``` ### Disk Space Guard ```cpp -// REQUIRED on write paths: < 512MB triggers fatal to prevent corruption +// Required on all write paths. if (freeDiskSpace < minDiskFree) Throw("Not enough disk space for database write"); ``` ### WAL Hook Cookie ```cpp -// Always pass an integer ID, never `this`. The DatabaseCon may be -// destroyed while a hook invocation is mid-flight on a writer thread. +// Always pass an integer ID, never `this`. +// DatabaseCon may be destroyed while a hook invocation is mid-flight on a writer thread. sqlite3_wal_hook(conn, &walHookCallback, reinterpret_cast(checkpointer->id())); ``` +### Penetrating the SOCI Abstraction +```cpp +// getConnection() is the only intentional SOCI abstraction break. +// Required for sqlite3_wal_hook and sqlite3_db_status APIs. +auto* be = dynamic_cast(s.get_backend()); +if (!be || !be->conn_) throw std::logic_error("Not a sqlite3 session"); +sqlite3* conn = be->conn_; +``` + ## Key Files -- `src/xrpld/app/rdb/backend/detail/SQLiteDatabase.cpp` — main implementation -- `src/xrpld/app/main/DBInit.h` — schema definitions -- `src/xrpld/core/detail/DatabaseCon.cpp` — kept for historical reference; lifecycle now in `libxrpl` -- `src/libxrpl/rdb/DatabaseCon.cpp` — connection lifecycle, `CheckpointersCollection`, destructor drain -- `src/libxrpl/rdb/SociDB.cpp` — SOCI/SQLite adapter, `WALCheckpointer`, blob conversion, memory stats -- `src/xrpld/app/rdb/backend/detail/Node.cpp` — ledger/tx operations -- `src/xrpld/app/rdb/detail/State.cpp` — deletion state tracking +| File | Purpose | +|------|---------| +| `src/libxrpl/rdb/SociDB.cpp` | SOCI/SQLite adapter, `WALCheckpointer`, blob conversion, memory stats | +| `src/libxrpl/rdb/DatabaseCon.cpp` | Connection lifecycle, `CheckpointersCollection`, destructor drain | +| `src/xrpld/app/main/DBInit.h` | Schema definitions (CREATE TABLE statements) | +| `src/xrpld/app/rdb/backend/detail/SQLiteDatabase.cpp` | Main `SQLiteDatabase` implementation | +| `src/xrpld/app/rdb/backend/detail/Node.cpp` | Ledger/tx read-write operations | +| `src/xrpld/app/rdb/detail/State.cpp` | Deletion state tracking | +| `src/xrpld/core/detail/DatabaseCon.cpp` | Legacy reference; lifecycle now in `libxrpl` | diff --git a/docs/skills/transactors.md b/docs/skills/transactors.md index 9ed34024b4..c8ddd9e340 100644 --- a/docs/skills/transactors.md +++ b/docs/skills/transactors.md @@ -1,39 +1,57 @@ # Transactors -Transaction processing pipeline: preflight (static validation) -> preclaim (ledger state checks) -> doApply (state mutation). Base class `Transactor` in `src/libxrpl/tx/`. +Transaction processing pipeline: preflight (static validation) → preclaim (ledger state checks) → doApply (state mutation). Base class `Transactor` in `src/libxrpl/tx/`. Every transaction type inherits from it; only `doApply()` is virtual — all other dispatch is compile-time. ## Pipeline Architecture ### Three Phases -1. **`preflight`** — stateless, no ledger access. Validates fields, flags, signatures (cached via HashRouter). Cheap, parallelizable. Returns `NotTEC`. Caller-cacheable. -2. **`preclaim`** — read-only `ReadView` access. Checks account exists, fee sufficient, sequence valid. Returns `TER`. Sets `likelyToClaimFee` for relay decisions. -3. **`doApply`** — mutable `ApplyView` access. Only runs if `preclaim` returned `tesSUCCESS` and `likelyToClaimFee` is true. +1. **`preflight`** — stateless, no ledger access. Validates fields, flags, signatures (cached via HashRouter). Cheap, parallelizable. Returns `NotTEC`. Results carry a `TxConsequences` summary used by the transaction queue. +2. **`preclaim`** — read-only `ReadView` access. Checks account exists, fee sufficient, sequence valid, signature valid. Returns `TER`. Sets `likelyToClaimFee` for relay decisions. +3. **`doApply`** — mutable `ApplyView` access. Only runs if preclaim returned `tesSUCCESS` and `likelyToClaimFee` is true. -`apply()` in `apply.cpp` composes all three; templated on a preflight callable so the same `preclaim`+`doApply` machinery serves normal and batch-inner transactions. `applyTransaction()` adds `tapRETRY` semantics and dispatches to `applyBatchTransactions()` after a successful `ttBATCH`. +`apply()` in `apply.cpp` composes all three. It accepts a preflight callable so the same `preclaim`+`doApply` machinery serves normal and batch-inner transactions. `applyTransaction()` adds `tapRETRY` semantics and dispatches to `applyBatchTransactions()` after a successful `ttBATCH`. + +**Important preclaim security invariant** (documented in `applySteps.cpp`): every check through and including `checkSign` must return `NotTEC` (not a `tec` code). A `tec` before signature verification would charge a fee without authentication — a critical security property. + +### Layered Preflight: `preflight0` → `preflight1` → `T::preflight` → `preflight2` → `T::preflightSigValidated` + +`Transactor::invokePreflight` calls (in order): `T::checkExtraFeatures`, `preflight1(ctx, T::getFlagsMask(ctx))`, `T::preflight`, `preflight2`, `T::preflightSigValidated`. Each is a static method; derived classes participate via name hiding — never virtual. + +- **`preflight0`** (called from `preflight1`): gates on `sfNetworkID` presence/absence, zero-hash tx ID, valid flag bits, and pseudo-tx/batch-inner exclusivity. +- **`preflight1`**: account is non-zero, `sfFee` is non-negative native XRP, signing key format valid, tickets and `sfAccountTxnID` are mutually exclusive. +- **`preflight2`**: simulation mode (`tapDRY_RUN`), cryptographic signature check via hash router. Skipped entirely for `tfInnerBatchTxn` (outer batch authorizes). + +**Rule**: derived `preflight` runs *between* `preflight1` and `preflight2`. Never call `preflight1`/`preflight2` directly. ### Compile-time Polymorphism (Name Hiding, Not Virtual) -`Transactor::invokePreflight` calls `T::checkExtraFeatures`, `preflight1`, `T::getFlagsMask`, `T::preflight`, `preflight2`, `T::preflightSigValidated` by name. The static methods on derived classes participate via name hiding — derived classes MUST NOT define `invokePreflight` themselves, nor call `preflight1`/`preflight2` directly. Only `doApply()` is virtual. - -`with_txn_type()` in `applySteps.cpp` uses an X-macro over `transactions.macro` to convert runtime `TxType` to a compile-time template parameter via a switch dispatch — no virtual dispatch, no included transactor headers (forbidden in `applySteps.cpp`). +`with_txn_type()` in `applySteps.cpp` uses an X-macro over `transactions.macro` to convert runtime `TxType` to a compile-time template parameter via a switch dispatch — no virtual dispatch, no transactor headers included in `applySteps.cpp` (explicitly forbidden). ### `ConsequencesFactoryType` Each transactor declares `static constexpr ConsequencesFactoryType ConsequencesFactory{...}`: - **`Normal`** — standard fee/sequence consequences. Most transactors. -- **`Blocker`** — queues block later transactions from same account. Examples: `SetRegularKey`, `AccountDelete`, `SignerListSet`, `XChainAddClaimAttestation`, `XChainClaim` (any attestation may trigger immediate fund movement). -- **`Custom`** — derived class implements `makeTxConsequences(PreflightContext const&)`. Examples: `Payment` (XRP spend), `OfferCreate` (XRP TakerGets), `XChainCommit`, `TicketCreate` (multi-sequence), `AccountSet` (conditional blocker on auth/master flags), `LoanSet` (counterparty signers). +- **`Blocker`** — queues block later transactions from same account. Examples: `SetRegularKey`, `AccountDelete`, `SignerListSet`, `XChainAddClaimAttestation`. +- **`Custom`** — derived class implements `makeTxConsequences(PreflightContext const&)`. Examples: `Payment` (XRP spend via `sfSendMax`), `OfferCreate` (XRP TakerGets), `TicketCreate` (multi-sequence), `AccountSet` (conditional blocker on auth/master flags), `LoanSet` (counterparty signers). -The `consequences_helper` dispatch in `applySteps.cpp` uses C++20 `requires` clauses to pick the right factory at compile time. +C++20 `requires` clauses in `applySteps.cpp` pick the factory at compile time. + +### Numeric Precision Guards + +`with_txn_type()` installs RAII guards before dispatch: +- When `featureSingleAssetVault` or `featureLendingProtocol` is active: `CurrentTransactionRulesGuard` (thread-local rules access) + `NumberSO` (floating-point-style number arithmetic per `fixUniversalNumber`). +- Otherwise: `NumberMantissaScaleGuard` (legacy small-mantissa mode). + +Ideally these would apply everywhere from the start; they were retrofitted into `with_txn_type` for `preflight`/`preclaim` when vault/lending features needed correct numeric rules in read-only phases. ## Key Invariants -- Pipeline is strict: preflight runs WITHOUT ledger state, preclaim runs WITH read-only view, doApply runs with mutable view -- `preflight` validates all fields exist and are well-formed; this is the ONLY place to reject malformed transactions cheaply -- Fee is always deducted on `tecCLAIM`; `payFee` runs before `doApply` -- Sequence/ticket consumption happens in `consumeSeqProxy`; must succeed before any state changes -- Invariant checkers run after `doApply`; they can veto the transaction post-execution +- Pipeline is strict: preflight runs WITHOUT ledger state, preclaim WITH read-only view, doApply with mutable view. +- `preflight` validates all fields exist and are well-formed; this is the ONLY place to reject malformed transactions cheaply. +- Fee is always deducted on `tecCLAIM`; `payFee` runs before `doApply`. +- Sequence/ticket consumption happens in `consumeSeqProxy`; must succeed before any state changes. +- Invariant checkers run after `doApply`; they can veto the transaction post-execution. - Amendment gating belongs in `checkExtraFeatures`, NOT in `preflight`. The framework guards on the central permission registry first. - `tem*`/`tef*`/`tel*` results: fee NOT charged, transaction not included. `tec*` results: fee charged, transaction included. @@ -41,7 +59,7 @@ The `consequences_helper` dispatch in `applySteps.cpp` uses C++20 `requires` cla **`doApply` mutations are NOT committed until `ctx_.apply()` is called at the end of `operator()`.** All peek/insert/update/erase during `doApply` go into an `ApplyContext` view (`view_`) layered on top of `base_`. Whether that view gets flushed to `base_` depends entirely on the TER that `doApply` returns. -`ApplyContext::discard()` ([src/libxrpl/tx/ApplyContext.cpp](src/libxrpl/tx/ApplyContext.cpp)) replaces `view_` with a fresh view on `base_` — **every doApply mutation is thrown away**: +`ApplyContext::discard()` replaces `view_` with a fresh view on `base_` — **every doApply mutation is thrown away**: ```cpp void ApplyContext::discard() { view_.emplace(&base_, flags_); } ``` @@ -61,24 +79,31 @@ void ApplyContext::discard() { view_.emplace(&base_, flags_); } ### What this means -- **A `tec*` return from doApply acts as a full-transaction rollback.** You do NOT need to order mutations defensively. If a helper called late in doApply returns `tec*`, everything mutated earlier in the same doApply is discarded via `discard()`. -- **Orphan-state bugs of the form "we mutated X then returned tec* so X is now in an inconsistent state" are not possible at the transactor boundary.** -- **The real failure mode is within `doApply` itself**: stale SLE pointers, missing `view().update(sle)` after mutation, mutating values read by value instead of peek. These are in-memory bugs, not state-commit bugs. -- **Sandboxes inside `doApply` add nesting, not safety.** `PaymentSandbox` / nested `ApplyView` are useful when you need to conditionally commit a subset of changes *within* a single doApply (e.g., apply offers but revert if the net outcome fails). They are not needed to protect against doApply's own `tec*` return. +- **A `tec*` return from doApply acts as a full-transaction rollback.** You do NOT need to order mutations defensively. If a helper called late in doApply returns `tec*`, everything mutated earlier is discarded. +- **Orphan-state bugs "we mutated X then returned tec* so X is now inconsistent" are not possible at the transactor boundary.** +- **The real failure mode**: stale SLE pointers, missing `view().update(sle)` after mutation, mutating values read by value. These are in-memory bugs, not state-commit bugs. +- **Sandboxes inside `doApply` add nesting, not safety.** `PaymentSandbox`/nested `ApplyView` are for conditionally committing a *subset* of changes within a single doApply. - **Only `ctx_.apply(result)` publishes to `base_`**; a doApply that returns early, throws, or crashes never reaches that call. +### `reset()` Fee Clamping + +`reset()` discards all ledger mutations via `ctx_.discard()` then re-deducts the fee, clamping if necessary: +```cpp +if (fee > balance) fee = balance; +``` +This ensures a failing transaction can still claim its fee even when the account is over-committed. + ### Verifying a suspected orphan-state bug -Before claiming "directory removed but SLE not erased because tec\*": 1. Read the caller of `doApply` — confirm the TER path (`operator()` in Transactor.cpp). 2. Check whether `discard()` is reached via `reset()` or the `tapFAIL_HARD` branch. -3. If both paths call `discard()`, the mutations cannot persist on tec\*. -4. Look instead for: missing `view().update(sle)` after mutation, stale SLE pointers, or genuine non-atomic side effects (e.g., hash router flags, which are NOT in the ApplyContext view). +3. If both paths call `discard()`, the mutations cannot persist on tec*. +4. Look instead for: missing `view().update(sle)`, stale SLE pointers, or genuine non-atomic side effects (e.g., hash router flags — NOT in ApplyContext view). ## Apply Loop Details (Transactor::operator()()) 1. RAII guards: `NumberSO`, `CurrentTransactionRulesGuard` (for `fixUniversalNumber`, `featureSingleAssetVault`, `featureLendingProtocol`) -2. Debug builds: serialize/re-parse round-trip catches serdes mismatches +2. Debug builds: serialize/re-parse round-trip catches serdes mismatches; `trapTransaction()` provides a named breakpoint for replaying specific transactions 3. `apply()` runs `preCompute()` → captures `preFeeBalance_` → `consumeSeqProxy()` → `payFee()` → updates `sfAccountTxnID` → `doApply()` 4. Enforces `tecOVERSIZE` if metadata exceeds `oversizeMetaDataCap` 5. Special `tec` codes (`tecOVERSIZE`, `tecKILLED`, `tecINCOMPLETE`, `tecEXPIRED`) trigger context-diff visitation then targeted cleanup: `removeUnfundedOffers`, `removeExpiredNFTokenOffers`, `removeDeletedTrustLines`, `removeDeletedMPTs`, `removeExpiredCredentials` @@ -96,6 +121,8 @@ Before claiming "directory removed but SLE not erased because tec\*": - Using `view().update()` on a stale SLE pointer after another mutation - Computing reserve against `view().peek(account)->getFieldAmount(sfBalance)` AFTER fee deduction instead of `preFeeBalance_` - Missing `associateAsset(*sle, asset)` call at end of `doApply` for SLEs with `STNumber` or `STTakesAsset` fields (lending/vault transactors) +- preclaim `Rules` race: if ledger advanced between preflight and preclaim, `applySteps.cpp` silently re-runs preflight with new rules before constructing `PreclaimContext` +- Calling `ter*` codes before signature verification in preclaim (see security invariant above) ## Transactor Template @@ -189,6 +216,7 @@ TER doApply() override { Variants: - `PaymentSandbox` — for `flow()` calls (used by `Payment`, `CheckCash`, `OfferCreate` crossing). Required because `flow()` uses deferred-credit accounting. +- `RippleCalc::rippleCalculate()` wraps its own `PaymentSandbox` inside the caller's sandbox (double-sandbox pattern) for exception safety — if `flow()` throws, the caller's sandbox remains unmodified. - Dual sandbox in `OfferCreate`: `sb` (main) + `sbCancel` (offer cleanup); commit one or the other based on `tfFillOrKill` outcome. - Nested sandboxes: `applyBatchTransactions` uses `wholeBatchView` (over outer view) + `perTxBatchView` (per inner tx). @@ -218,7 +246,7 @@ Deleting an owned object: Reserve cost is usually 1 unit per object, but: - `AccountDelete`, `LedgerStateFix`, `AMMCreate` charge a full reserve via `calculateOwnerReserveFee` instead of base fee -- Two-object structures (`Vault`, `LoanBroker`) charge +2 for object + pseudo-account +- Two-object structures (`Vault`, `LoanBroker`) charge +2 for object + pseudo-account (incremented before reserve check so check reflects true post-creation state) - `SignerListSet` post-amendment uses `lsfOneOwnerCount` flag (1 unit regardless of N signers); pre-amendment charges 2+N - `OracleSet` uses tiered count: 1 unit for ≤5 price pairs, 2 units for more @@ -234,9 +262,11 @@ Synthetic `AccountRoot` SLEs with disabled master key, used to hold protocol-man Pseudo-account guard rules: - `ValidPseudoAccounts` invariant: exactly one discriminator field, sequence never changes, required flags (`lsfDisableMaster | lsfDefaultRipple | lsfDepositAuth`), no `sfRegularKey` +- For pseudo-accounts, initial sequence must be 0 and flags must be exactly `lsfDisableMaster | lsfDefaultRipple | lsfDepositAuth` - Many transactors explicitly reject pseudo-account destinations (`tecPSEUDO_ACCOUNT`): `Payment` direct, `CheckCreate`, `PaymentChannelCreate`, `VaultCreate` (asset issuer), `Clawback` (holder) - `MPTokenAuthorize` issuer-path skips pseudo-account holders (they are implicitly always authorized) - Pseudo-accounts cannot sign — when `featureLendingProtocol` active, `checkSign` rejects with `tefBAD_AUTH` +- Anti-nesting: AMM preclaim detects LP-token-issuer pseudo-accounts via `sfAMMID` on the issuer's `AccountRoot` and rejects using them as AMM assets ### `associateAsset` Convention @@ -251,25 +281,29 @@ Validates the optional `sfDelegate` field. If absent, normal account signing app 2. Try full transaction-type permission via `checkTxPermission()` (uses `TxType + 1` encoding) 3. Fall back to granular permission via `loadGranularPermission()` + per-transactor logic -### Granular Permissions +### Encoding Convention -Permission values store both forms in single `uint32_t`: -- Transaction types: `TxType + 1` (always ≤ `UINT16_MAX`, since `+1` shift avoids ambiguous zero) +Permission values store both forms in a single `uint32_t`: +- Transaction types: `TxType + 1` (always ≤ `UINT16_MAX`; `+1` avoids ambiguous zero since `ttPAYMENT == 0`) - Granular permissions: values `> UINT16_MAX`, enumerated in `permissions.macro` +`Permission` singleton asserts this separation at construction time. + +### Granular Permissions + `DelegateUtils.cpp` provides: -- `checkTxPermission()` — linear scan for `TxType + 1` match -- `loadGranularPermission()` — populates per-tx-type granular set via `Permission::getInstance().getGranularTxType()` reverse-map +- `checkTxPermission()` — linear scan for `TxType + 1` match; returns `terNO_DELEGATE_PERMISSION` on null delegate or no match +- `loadGranularPermission()` — populates per-tx-type granular set via `Permission::getInstance().getGranularTxType()` reverse-map; returns silently with empty set on null delegate Examples of granular permissions: -- `Payment` direct only: `PaymentMint` (issuer source), `PaymentBurn` (issuer destination) — blocked if `sfPaths` or asset conversion +- `Payment` direct only: `PaymentMint` (issuer source), `PaymentBurn` (issuer destination) — blocked if `sfPaths` present or asset conversion - `AccountSet`: field-level grants per metadata field (`AccountDomainSet`, `AccountTransferRateSet`, etc.); flag changes blocked entirely - `TrustSet`: `TrustlineAuthorize`, `TrustlineFreeze`, `TrustlineUnfreeze`; cannot create new lines, cannot change limit - `MPTokenIssuanceSet`: `MPTokenIssuanceLock`, `MPTokenIssuanceUnlock` ## Permission Model & Cross-Transactor Static Interfaces -Several transactors expose `static deleteSLE`/`removeFromLedger`/equivalent methods on `ApplyView` so other transactors (especially `AccountDelete`) can clean up owned objects without constructing a fake transaction: +Several transactors expose static deletion/creation methods on `ApplyView` so other transactors (especially `AccountDelete`) can clean up owned objects without constructing a fake transaction: - `DepositPreauth::removeFromLedger(ApplyView&, uint256, Journal)` - `DIDDelete::deleteSLE(ApplyView&, SLE, AccountID, Journal)` @@ -280,7 +314,13 @@ Several transactors expose `static deleteSLE`/`removeFromLedger`/equivalent meth - `AMMWithdraw::withdraw`/`equalWithdrawTokens` — used by `AMMClawback`, `AMMDelete` - `LoanManage::unimpairLoan/impairLoan/defaultLoan` — used by `LoanPay` -`AccountDelete` uses a `nonObligationDeleter()` switch over `LedgerEntryType` returning a `DeleterFuncPtr`. `nullptr` means "obligation, cannot delete". Recognized as non-obligations: offers, signer lists, tickets, deposit preauth, NFT offers, DIDs, oracles, credentials, delegates. +`AccountDelete` uses a `nonObligationDeleter()` switch over `LedgerEntryType` returning a `DeleterFuncPtr`. `nullptr` means "obligation, cannot delete". The same switch is used in both `preclaim` (to detect blockers) and `doApply` (to invoke deletions), keeping type classification in sync. Deletable types: offers, signer lists, tickets, deposit preauth, NFT offers, DIDs, oracles, credentials, delegates. + +### AccountDelete-specific preclaim rules + +- NFT obligations: `sfMintedNFTokens != sfBurnedNFTokens` → `tecHAS_OBLIGATIONS`; authorized-minting replay guard: `FirstNFTokenSequence + MintedNFTokens + 255` must not exceed current ledger sequence +- Sequence freshness: account seq must be ≥ 256 below current ledger index (`tecTOO_SOON`) — prevents replay after resurrection +- Owner directory: more than `maxDeletableDirEntries` (1000) deletable items → `tefTOO_BIG` ## Signature Verification @@ -314,10 +354,14 @@ After every successful or fee-claiming transaction, every checker in the `Invari Uses `std::index_sequence` + fold expression for variadic visit. Critically, `finalize()` results are collected into a `std::array` then checked with `std::all_of` — NOT short-circuited with `&&` — so every failing invariant logs its own diagnostic. +Invariant checkers run even on failed (`tec*`) transactions — bugs or exploits could cause a failed transaction to mutate ledger state unexpectedly. + ### `failInvariantCheck` Escalation - First failure → `tecINVARIANT_FAILED` (committed to ledger, fee charged) -- Repeated failure during retry → `tefINVARIANT_FAILED` (not included in ledger) +- Repeated failure during retry (recognized because incoming result is already `tecINVARIANT_FAILED` or `tefINVARIANT_FAILED`) → `tefINVARIANT_FAILED` (not included in ledger) + +Rationale: if even the minimal fee-charge path breaks invariants, no ledger entry of any kind should be created. ### The `enforce` Pattern (Soft Rollout) @@ -334,7 +378,7 @@ This lets invariants ship before activation: violations log unconditionally (vis ### Privilege System (`InvariantCheckPrivilege.h`) -`Privilege` bitmask enum + `hasPrivilege(STTx, Privilege)` (implemented via `transactions.macro` X-macro). Used by checkers to know what each transaction type may legitimately do: +`Privilege` bitmask enum + `hasPrivilege(STTx, Privilege)` (implemented via `transactions.macro` X-macro). Used by checkers to know what each transaction type may legitimately do. `must` vs. `may` variants let invariants enforce cardinality (e.g., `AccountDelete` *must* delete exactly one account root; `AMMWithdraw` *may* delete one). | Privilege | Granted to (examples) | |---|---| @@ -354,16 +398,16 @@ This lets invariants ship before activation: violations log unconditionally (vis | Checker | What it enforces | |---|---| | `TransactionFeeCheck` | Fee non-negative, < INITIAL_XRP, ≤ sfFee | -| `XRPNotCreated` | Net XRP delta across accounts/paychans/escrows = -fee | +| `XRPNotCreated` | Net XRP delta across accounts/paychans/escrows = -fee (pay channels tracked as `sfAmount - sfBalance`) | | `XRPBalanceChecks` | Every account balance is native XRP in [0, INITIAL_XRP] | | `NoBadOffers` | No negative-amount, no XRP-for-XRP offers | -| `NoZeroEscrow` | Escrow/MPT amounts within bounds; MPT locked ≤ outstanding | +| `NoZeroEscrow` | Escrow/MPT amounts within bounds; MPT locked ≤ outstanding; also validates `ltMPTOKEN_ISSUANCE` and `ltMPTOKEN` entries | | `AccountRootsNotDeleted` | Account deletion cardinality matches `must`/`may` privilege | -| `AccountRootsDeletedClean` | Deleted account had zero balance + zero owner count + no orphaned objects (trust lines, escrows, offers, NFT pages, paychans, pseudo-account linked objects) | +| `AccountRootsDeletedClean` | Deleted account had zero balance + zero owner count + no orphaned objects; uses `before` SLE for pseudo-account linked object keys (fields may be cleared during deletion) | | `ValidNewAccountRoot` | New accounts only from `createAcct`/`createPseudoAcct`; correct initial seq + flags | -| `ValidPseudoAccounts` | Exactly one discriminator, sequence unchanged, required flags, no regular key | +| `ValidPseudoAccounts` | Exactly one discriminator, sequence unchanged, required flags, no regular key; errors accumulated in `vector` and all logged before returning | | `ValidClawback` | At most one trust line/MPT modified, holder balance non-negative | -| `NoModifiedUnmodifiableFields` | `sfLedgerEntryType`/`sfLedgerIndex` immutable; loan/broker origination fields immutable | +| `NoModifiedUnmodifiableFields` | `sfLedgerEntryType`/`sfLedgerIndex` immutable; loan/broker origination fields immutable; gated on `featureLendingProtocol` | | `LedgerEntryTypesMatch` | Modified entries don't change type; new entries are recognized types | | `NoXRPTrustLines` | No trust line uses XRP as currency | | `NoDeepFreezeTrustLinesWithoutFreeze` | DeepFreeze flag requires regular Freeze flag | @@ -375,9 +419,9 @@ This lets invariants ship before activation: violations log unconditionally (vis | `ValidAMM` | Per-tx-type rules: create exact `sqrt(A*B)`, deposit/withdraw constant-product invariant `sqrt(x*y) ≥ LPSupply`, vote/bid leave pool unchanged | | `ValidPermissionedDomain` | AcceptedCredentials non-empty, ≤ max size, unique, sorted | | `ValidPermissionedDEX` | Domain-scoped tx only touches offers/dirs with matching domain; hybrid offers structurally valid | -| `ValidVault` | Per-tx-type rules: deposit/withdraw asset/share conservation, immutable fields unchanged, loss only via loan ops | +| `ValidVault` | Per-tx-type rules: deposit/withdraw asset/share conservation, immutable fields unchanged, loss only via loan ops; XRP vault fee compensation for depositor/withdrawer balance check | | `ValidLoan` | Payment completion bidirectional (paymentRemaining=0 ↔ all outstanding=0), `lsfLoanOverpayment` immutable, non-negative fees, positive `sfPeriodicPayment` | -| `ValidLoanBroker` | Sequence monotonic, non-negative cover/debt, vault exists, cover ≤ pseudo-account balance (== under `fixSecurity3_1_3` except at delete) | +| `ValidLoanBroker` | Sequence monotonic, non-negative cover/debt, vault exists, cover ≤ pseudo-account balance (== under `fixSecurity3_1_3` except at delete); no amendment gate (objects can't exist unless amendment is active) | ## doApply Order Convention (Cleanup) @@ -396,7 +440,7 @@ Erasing first would lose the page index needed for `dirRemove`. Many transactors - `tecKILLED`: order/loan time-window expired or sequence overflow (`LoanSet` arithmetic overflow check) - `tecEXPIRED`: legitimately expired object; some transactors (e.g., `NFTokenAcceptOffer` under `fixExpiredNFTokenOfferRemoval`) clean up before returning this - `tecINSUFFICIENT_RESERVE`: reserve check failed against `preFeeBalance_` -- `tecINTERNAL` / `tefBAD_LEDGER`: ledger corruption sentinels. Often marked `LCOV_EXCL` because preclaim should have prevented them +- `tecINTERNAL` / `tefBAD_LEDGER`: ledger corruption sentinels. Often marked `LCOV_EXCL` because preclaim should have prevented them. `RippleCalc` converts exceptions to `tecINTERNAL` rather than rethrowing (deterministic fallback all validators agree on) - `terNO_AMM`, `terNO_DELEGATE_PERMISSION`, `terNO_ACCOUNT`, `terNO_LINE`: retryable failures ## Hash Router Caching @@ -406,16 +450,20 @@ Some expensive operations cache results in the `HashRouter` using private flag b - **Signature verification** (`apply.cpp` `checkValidity`): `SF_SIGBAD`, `SF_SIGGOOD`, `SF_LOCALBAD`, `SF_LOCALGOOD` (PRIVATE1–PRIVATE4) - **Crypto-condition validation** (`EscrowFinish::preflightSigValidated`): `SF_CF_VALID`, `SF_CF_INVALID` (PRIVATE5–PRIVATE6) -The `forceValidity()` API can promote cached state but cannot downgrade (never sets `SF_SIGBAD`) — used to mark locally-submitted transactions as pre-verified. +The `forceValidity()` API can promote cached state (using `[[fallthrough]]`) but cannot downgrade (never sets `SF_SIGBAD`) — used to mark locally-submitted transactions as pre-verified. **Use with extreme care**: bypassing signature verification in the cache affects every subsequent `checkValidity` call on the same hash until cache expiry. + +Validity enum → P2P semantics: `SigBad` = don't forward; `SigGoodOnly` = relay but don't apply; `Valid` = relay and apply. ## Batch Transactions `Batch` (in `system/Batch.cpp`) bundles 2-8 inner transactions with one of four execution policies (mutually exclusive, enforced via `std::popcount`): -- `tfAllOrNothing`: any failure aborts, full rollback -- `tfUntilFailure`: stop at first failure, keep prior successes +- `tfAllOrNothing`: any failure aborts, full rollback (`applyBatchTransactions` returns false) +- `tfUntilFailure`: stop at first failure, keep prior successes (returns false if no inner tx was ever applied) - `tfOnlyOne`: stop at first success - `tfIndependent`: run all, commit successes +`Batch::doApply()` returns `tesSUCCESS` and does nothing — `applyBatchTransactions()` in `apply.cpp` is called separately by `applyTransaction()` after the outer apply succeeds, executing inner txs in a nested `wholeBatchView`/`perTxBatchView` sandbox structure. + **Critical for new transactors:** Update `disabledTxTypes` in `Batch.cpp` if your type cannot run inside a batch. Currently disabled: all `ttVAULT_*` and `ttLOAN_*` types (multi-step state machines whose invariants are difficult to reason about under batch atomicity). Inner transaction rules (enforced in `Batch::preflight`): @@ -423,14 +471,17 @@ Inner transaction rules (enforced in `Batch::preflight`): - Empty `sfSigningPubKey`, no `sfTxnSignature`, no `sfSigners` - Fee = 0 XRP - Exactly one of `sfSequence` (nonzero) or `sfTicketSequence` -- For `tfAllOrNothing`/`tfUntilFailure`: no duplicate sequence/ticket values across same-account inner txs +- For `tfAllOrNothing`/`tfUntilFailure`: no duplicate sequence/ticket values across same-account inner txs (relaxed for `tfIndependent`/`tfOnlyOne`) +- Each inner tx has `xrpl::preflight` called on it with `tapBATCH` and `parentBatchId`; no nested `ttBATCH` -`Batch::preflightSigValidated` reconciles `sfBatchSigners` against the set of inner-tx accounts that differ from outer account (plus any `sfCounterparty` accounts). The outer account is explicitly excluded from `sfBatchSigners`. - -`Batch::doApply()` returns `tesSUCCESS` and does nothing — `applyBatchTransactions()` in `apply.cpp` is called separately by `applyTransaction()` after the outer apply succeeds, executing inner txs in a nested `wholeBatchView`/`perTxBatchView` sandbox structure. +`Batch::preflightSigValidated` reconciles `sfBatchSigners` bidirectionally: each signer removed from `requiredSigners` as matched; any signer not in `requiredSigners` → `temBAD_SIGNER`; outer account explicitly excluded. Then `ctx.tx.checkBatchSign(ctx.rules)` verifies the cryptographic batch signature payload. `Batch::calculateBaseFee` = `baseFee + Σ(inner tx fees) + numSigners × baseFee`. Overflow guards everywhere (marked `LCOV_EXCL`). +**`fixBatchInnerSigs` amendment**: corrects a bug in the original Batch implementation where inner-batch transactions could be assigned `SF_SIGGOOD` cache entries (implying valid signatures on unsigned objects). After the fix, inner-batch transactions follow the `neverValid` path. + +All Batch log messages use `BatchTrace[]` prefix for correlation. + ## Key Files - `src/libxrpl/tx/Transactor.cpp` - base class, three-phase pipeline, fee calculation, signature dispatch @@ -448,12 +499,21 @@ Inner transaction rules (enforced in `Batch::preflight`): `Payment`, `OfferCreate` (crossing), and `CheckCash` (IOU/MPT) all route through `flow()` in `Flow.cpp` → `StrandFlow.h`. Key concepts: - A **strand** is a `std::vector>`; each `Step` is one hop (`DirectStepI`, `BookStepXX`, `XRPEndpointStep`, `MPTEndpointStep`) +- `flow()` is templated on `(TIn, TOut)` pairs for the three asset types (6 combinations). `Flow.cpp` is the façade that resolves runtime `STAmount`/`Asset` values into compile-time template parameters via `std::visit`, then hands off to `StrandFlow.h`. - Two-pass execution: reverse pass (compute required input for desired output) then forward pass (compute output for actual input) - Limiting step detection: if reverse pass cannot satisfy desired output, that step is identified as the bottleneck and used as the anchor for forward pass - Multi-strand flow uses `ActiveStrands` priority queue sorted by `qualityUpperBound`; one strand consumed per outer iteration (probe-and-push) - Safety limits: `MaxOffersToConsume` = 1000 per book step, `maxTries` = 1000 outer iterations, `maxOffersToConsider` = 1500 cumulative, `AMMContext::MaxIterations` = 30 - `PaymentSandbox` (not regular `Sandbox`) is required because `flow()` uses deferred-credit accounting -- AMM offers are synthesized by `AMMLiquidity` to look like CLOB offers to `BookStep`; single-path uses `changeSpotPriceQuality`, multi-path uses Fibonacci-scaled offer sizes +- AMM offers are synthesized by `AMMLiquidity` to look like CLOB offers to `BookStep`; single-path uses `changeSpotPriceQuality`, multi-path uses Fibonacci-scaled offer sizes; `AMMContext` tracks whether multi-path is active (disables quality optimization) +- `RippleCalc::rippleCalculate()` creates a nested `PaymentSandbox` inside the caller's `PaymentSandbox` (exception safety); `flow()` applies its internal sandbox to `flowSB` only on success +- `ter*` retry codes from `RippleCalc` are converted to `tecPATH_DRY` in `Payment::doApply` (forces fee charge, prevents path-spam) +- `sfDeliverMin` + `tfPartialPayment`: if actual delivery < `sfDeliverMin` → `tecPATH_PARTIAL`; `ctx_.deliver()` records actual delivered amount for metadata (critical for partial payment detection downstream) +- `std::optional domainID` threads through `toStrands()` for permissioned payment domains + +### `sendMax` semantics in `RippleCalc` + +`sendMax` is `nullopt` when sending the same IOU that the destination receives with sender as issuer (no separate spending cap needed). Otherwise set to `saMaxAmountReq`. `limitQuality` is only constructed when `pInputs->limitQuality && saMaxAmountReq > beast::zero`. ## Asset Type Dispatch Pattern @@ -473,3 +533,28 @@ template <> TER preclaimHelper(...); ``` Used by `Clawback`, `Escrow*`, `Vault*`, `AMM*Withdraw/Deposit`, `LoanBrokerCoverClawback`. Each specialization handles asset-type-specific permission flags (`lsfAllowTrustLineClawback`/`lsfNoFreeze` vs `lsfMPTCanClawback`), authorization (`StrongAuth` vs `WeakAuth`), and freeze checks (`tecFROZEN` vs `tecLOCKED`). + +## Lending Protocol (XLS-66) + +`LendingHelpers.cpp` is the numerical core. Key concepts: + +- **Amortization math**: `loanPeriodicPayment()` uses standard `r(1+r)^n / ((1+r)^n - 1)` factor (Eq. 6–7 in XLS-66 Eq. Glossary). Zero-interest path uses equal principal slices (no division by zero). +- **Theoretical vs. rounded state**: `LoanProperties` holds both; `computeTheoreticalLoanState()` computes at full precision; `constructRoundedLoanState()` reflects actual ledger values. Rounding errors are carried forward during re-amortization. +- **Payment types**: regular, late (penalty via `loanLatePaymentInterest`), full/early-closure, overpayment (triggers re-amortization via `tryOverpayment()`). +- **`checkLoanGuards()`** enforces 4 precision invariants at creation/re-amortization: measurable interest, positive first-payment principal, non-zero rounded payment, payment count math. All return `tecPRECISION_LOSS`. +- **Template proxy pattern**: `doPayment` runs against real SLE (via `ValueProxy`) or simulation values — same code path for both. +- `loanMakePayment()` dispatches to the correct payment type and runs up to `loanMaximumPaymentsPerTransaction` installments per call. + +## Vault Architecture + +Six vault transactors: `VaultCreate`, `VaultDeposit`, `VaultWithdraw`, `VaultSet`, `VaultDelete`, `VaultClawback`. Key creation invariants (see `VaultCreate.cpp`): + +- `sfWithdrawalPolicy` currently only accepts `vaultStrategyFirstComeFirstServe` (= 1) +- `sfDomainID` is only valid when `tfVaultPrivate` is set +- `sfScale` restricted: meaningless for XRP/MPT assets; bounded above by `vaultMaximumIOUScale` (18) +- Vault pseudo-account asset issuer cannot be another pseudo-account (`tecWRONG_ASSET`) — those assets have no clawback path +- `adjustOwnerCount` increments by **2** (vault + pseudo-account) before reserve check +- MPT share issuance flags derived from transaction flags: tradeable unless `tfVaultShareNonTransferable`; `lsfMPTRequireAuth` added for private vaults +- `associateAsset` is the final call in `doApply` + +`ValidVault` invariant uses a delta-map (`uint256 → Number`) with sign conventions per entry type (+1 for share issuance outstanding amount, -1 for asset balances). Entries captured even at zero delta for accounting completeness. Fee compensation applied for XRP vault balance deltas (skipped for delegated transactions).