Rule D (dashboard PromQL labels must exist in L1) flagged `__name__` once the
phase-7 system-*.json dashboards started using `sum by (le, __name__)`.
`__name__` is the Prometheus reserved label for the metric name itself — a
builtin, not a span attribute. Add it to the builtin allowlist and cover it
with a test. (Earlier dashboards only used `__name__` inside `{__name__=~...}`
matchers, which the label regex did not extract, so this surfaced only now.)
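A minimal sketch of the label check described above. All names here (`BUILTIN_LABELS`, `grouping_labels`, `unknown_labels`) are hypothetical and not taken from check_otel_naming.py; treating `le` as a builtin is also an assumption.

```python
import re

# Hypothetical allowlist: Prometheus-side labels that are not span attributes.
# `__name__` is the one named in this change; `le` is assumed for illustration.
BUILTIN_LABELS = {"le", "__name__"}

# Captures the label list of a `by (...)` or `without (...)` grouping clause.
GROUPING_RE = re.compile(r"\b(?:by|without)\s*\(([^)]*)\)")


def grouping_labels(expr: str) -> set[str]:
    """Return every label named in a by/without clause of a PromQL expression."""
    labels: set[str] = set()
    for clause in GROUPING_RE.findall(expr):
        labels.update(part.strip() for part in clause.split(",") if part.strip())
    return labels


def unknown_labels(expr: str, known_keys: set[str]) -> set[str]:
    """Labels that are neither known span-attribute keys nor builtins."""
    return grouping_labels(expr) - known_keys - BUILTIN_LABELS
```

Under this sketch, a `{__name__=~...}` matcher yields no labels at all, which matches the note that the earlier dashboards never triggered the rule.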
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Bring phase-2 forward into phase 3 (transaction tracing). Phase 3 introduces
TxSpanNames.h, TxQSpanNames.h, and TxApplySpanNames.h.
Conflict resolution:
- TxQ.cpp: kept phase-3's txq_span-based instrumentation (phase-2 had none).
Dropped the orphaned `NumberSO{... fixUniversalNumber}` line — develop's
#5962 (Retire fixUniversalNumber) removed that symbol repo-wide; the
conflict block had carried one stale copy that would not compile.
- 05/08/OpenTelemetryPlan.md: dropped the deleted 04-code-samples / POC_taskList
references (carried from phase-2), kept phase-3's new secure-OTel.md doc rows,
section, and Mermaid node/edge/style. Converted the config code block to
prose and merged the secure-OTel hardening pointer with the
authoritative-config prose.
- Phase3_taskList.md: removed the "dotted keys for readability" note that came
from phase-2 — phase 3 already uses the underscore keys.
Reviewed by code-review agents: telemetry instrumentation intact, naming check
green (47 keys across 7 *SpanNames.h headers), no conflict markers.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Bring the hardened OTel naming check forward from phase-1c: unconditional
Rule F, test-file exemption, and the Rule H in-place-constant warning. The
check passes clean on phase 2 (24 keys across 4 *SpanNames.h headers including
PathFind; the SpanGuardFactory.cpp test is correctly exempt from Rule F).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Three robustness fixes to check_otel_naming.py, all on phase-1c where the
script lives:
- Rule F now runs UNCONDITIONALLY. It is a purely syntactic check on the
call-sites and does not need the L1 key set, so code that calls
SpanGuard::span/setAttribute directly without ever defining a *SpanNames.h
is still caught (previously it was silently skipped when no header existed).
- Exempt test files from Rule F (tests pass arbitrary literal keys to exercise
the API). The call-site matcher now requires a SpanGuard/`.`/`->` receiver,
so std::span and bare declarations no longer produce false positives.
- Add Rule H (warning, non-fatal): a namespace-qualified constant used at a
telemetry call-site but not defined in any *SpanNames.h is flagged, catching
constants defined in-place instead of in the proper header. Bare locals and
std:: names are not flagged, to avoid noise.
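A rough sketch of the receiver-gated call-site match for Rule F. The regex, the function names, and the test-file path convention are all assumptions for illustration; the real script may match differently.

```python
import re

# Matches span(...) / setAttribute(...) only when preceded by a receiver
# (SpanGuard::, `.`, or `->`) and followed by a string-literal first argument.
CALL_RE = re.compile(r'(?:SpanGuard::|\.|->)(span|setAttribute)\s*\(\s*"([^"]*)"')


def literal_key_calls(source: str) -> list[tuple[str, str]]:
    """Return (method, literal) pairs for call-sites that pass a string literal."""
    return CALL_RE.findall(source)


def is_test_file(path: str) -> bool:
    """Assumed convention: test sources live under a tests/ directory."""
    return "/tests/" in path.replace("\\", "/")


def rule_f_violations(path: str, source: str) -> list[tuple[str, str]]:
    """Test files are exempt; every other file reports its literal-key calls."""
    return [] if is_test_file(path) else literal_key_calls(source)
```

Because the receiver is required, `std::span<int> s(buf);` and a bare `span("x")` declaration do not match, while `guard.setAttribute("tx_type", v)` does.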
SpanGuard.h / Telemetry.h @code examples updated to reference constants that
exist on this branch. README documents the new behavior.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The script and its README live on phase-1c (where check_otel_naming.py was
introduced). The test-file Rule-F exemption was mistakenly applied here on
phase-2; revert to phase-1c's version verbatim. The exemption and further
script improvements will land on phase-1c and merge forward, keeping the
script's logic on the branch that owns it.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Bring the naming convention, code-sample cleanup, and CI naming check into
phase 2 (RPC tracing). Phase 2 introduces PathFindSpanNames.h.
Conflict resolution:
- 04-code-samples.md, POC_taskList.md: deletion wins.
- 02-design-decisions.md: took the convention-applied tables, but kept phase-2's
accurate PathFinding summary row (pathfind_fast/search_level/num_paths/...,
matching the implemented PathFindSpanNames.h).
- 05/08: took the code-block-free prose; kept phase-2's Phase2-5_taskList.md
index rows (dropping only the deleted POC row). Fixed stale setup_Telemetry/
make_Telemetry doc references to the code-correct setupTelemetry/makeTelemetry.
- Telemetry.h auto-merged to the constant-based @code examples.
check_otel_naming.py change: exempt test files from Rule F (tests pass
arbitrary literal keys to exercise the API). The check passes clean on the
merged tree (24 keys across 4 *SpanNames.h headers, including PathFind).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Add Rule G to check_otel_naming.py: every span-attribute key must be
lower_snake_case (^[a-z][a-z0-9_]*$ per dot-separated segment). This catches
camelCase, UPPERCASE, and spaces in keys, which the structural (dotted) and
source (literal) rules did not. Document it in the script README and
CONTRIBUTING.md.
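The Rule G test can be sketched as follows, using the pattern quoted above. The function name is hypothetical and the script's own implementation may differ.

```python
import re

# Pattern from the rule text, applied to each dot-separated segment.
SEGMENT_RE = re.compile(r"^[a-z][a-z0-9_]*$")


def is_lower_snake_case(key: str) -> bool:
    """True when every dot-separated segment of `key` is lower_snake_case."""
    # fullmatch avoids `$` accepting a trailing newline.
    return all(SEGMENT_RE.fullmatch(segment) for segment in key.split("."))
```

Checking per segment keeps Rule G independent of the dotted-key rule: `rpc.command` passes here even though the structural rule would flag the dot.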
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Add check_otel_naming.py and wire it into on-pr.yml so every PR validates
that span-attribute names stay consistent across the code, collector, Tempo,
dashboards, and docs.
- The valid key set is derived dynamically from the *SpanNames.h constants and
the resource attributes the code registers in Telemetry.cpp — no hardcoded
allowlist to drift.
- Each rule is presence-gated: it runs only when the file it needs is in the
tree, so the check is correct whether telemetry changes land in one PR or
several (the collector/Tempo/dashboard/runbook layers arrive in later phases).
- Rule A flags dotted span-attribute keys; Rule F flags string-literal
attribute keys and span-name arguments (values may be runtime data).
- stdlib-only, mirroring the levelization check (bare `python`, no pip step).
- Telemetry.h / SpanGuard.h @code examples now use *SpanNames.h constants so
the strict literal check passes.
- CONTRIBUTING.md documents the check and how to run it locally.
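One way to picture the presence gating is the sketch below. The `Rule` type, the glob patterns, and `run_rules` are illustrative assumptions, not the script's actual structure.

```python
from pathlib import Path
from typing import Callable, NamedTuple


class Rule(NamedTuple):
    name: str                                # e.g. "A", "D", "F"
    needs: str                               # glob, relative to the repo root
    check: Callable[[list[Path]], list[str]]  # returns error messages


def run_rules(root: Path, rules: list[Rule]) -> list[str]:
    """Run each rule only when at least one file it needs is in the tree."""
    errors: list[str] = []
    for rule in rules:
        files = sorted(root.glob(rule.needs))
        if not files:
            continue  # gated off: the layer this rule inspects has not landed yet
        errors.extend(f"{rule.name}: {msg}" for msg in rule.check(files))
    return errors
```

With this shape, a tree that has no dashboards simply skips the dashboard rule rather than failing, which is the behavior the gating is meant to give across phases.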
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Bring the phase-1a/1b naming-convention and code-sample cleanup into 1c.
Conflict resolution:
- 04-code-samples.md, POC_taskList.md: deletion wins.
- OpenTelemetryPlan docs (01/02/03/05): took the convention-applied,
code-block-free versions; verified no attribute category, table row, or
section header was lost (the differences were dotted->underscore renames).
- Telemetry.h: kept 1c's RpcSpanNames.h constant-based example
(rpc_span::attr::command) over the string literal.
- 31 non-telemetry files are clean develop carry-forward (identical to 1b).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Bring the span attribute naming convention (phase 1a) into phase 1b.
Conflict resolution kept phase-1b's SpanGuard-based workflow and applied
the underscore naming convention to all non-code-sample text:
- Converted prose, tables, Mermaid labels, and TraceQL/PromQL query
references across the plan docs to the underscore form.
- Converted the two @code attribute-key examples in Telemetry.h
(command, tx_type).
- Left the code-sample files (04-code-samples.md, POC_taskList.md) and
03-implementation-strategy.md code blocks at the phase-1b version; the
code-sample docs are slated for removal on phase-1a.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The tx.transactor span covered only the apply stage; preflight and
preclaim had no telemetry, so a transaction that hard-failed those
stages produced no apply-pipeline span and per-stage latency/failure
was invisible.
Add tx.preflight and tx.preclaim spans in applySteps.cpp via a
makeStageSpan() helper using SpanGuard::hashSpan, so all three stages
share a deterministic trace_id derived from txID[0:16] even though they
run sequentially and often on different threads. Each span carries stage,
tx_type, and ter_result; exceptions are recorded as tefEXCEPTION before
the public wrappers map them. The type lookup is guarded behind the
span-active check so it costs nothing when tracing is off.
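To illustrate why the three stage spans land in one trace, here is a sketch of a deterministic trace_id. It assumes the first 16 bytes of the 32-byte transaction ID are used directly as the 128-bit trace_id; what SpanGuard::hashSpan actually does beyond that slice is not shown here, and the function names are hypothetical.

```python
def trace_id_from_tx_id(tx_id: bytes) -> bytes:
    """Assumed derivation: the first 16 bytes of the transaction ID."""
    if len(tx_id) != 32:
        raise ValueError("expected a 32-byte transaction ID")
    return tx_id[:16]


def trace_id_hex(tx_id: bytes) -> str:
    """Render the trace_id as 32 lowercase hex characters."""
    return trace_id_from_tx_id(tx_id).hex()
```

Since the result depends only on the transaction ID, preflight, preclaim, and apply can each compute it independently, on any thread, without passing a context object between stages.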
Add a stage="apply" attribute to the tx.transactor span and move its
three hardcoded attribute strings to a new library-safe header
include/xrpl/tx/detail/TxApplySpanNames.h, which mirrors the daemon-side
TxSpanNames.h strings so the collector spanmetrics connector aggregates
both span sets under one dimension set.
A constants-contract test pins the span-name, attribute-key, and
stage-value strings; span content stays covered by the docker
integration test, like the rest of the telemetry suite.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Wraps Transactor::operator() with a span that captures tx_type,
ter_result, and applied. This is the universal dispatch point — every
transaction flows through it, giving per-type latency breakdown.
Adds libxrpl.tx > xrpl.telemetry levelization dependency.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>