Conan was building all 33 packages from source because it couldn't
find compatible pre-built binaries on the remote. The optimized
compatibility mode improves binary matching across configurations.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Conan was previously installed by prepare-runner or a separate step.
Since we're not using prepare-runner on native runners, install it
via pip alongside other Python dependencies.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The print-env action uses $CC which is not set on native ubuntu-latest
(only in container builds). Add continue-on-error so this diagnostic
step doesn't block the pipeline.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The XRPLF/actions/prepare-runner action hardcodes /root/.ccache and
/root/.conan2 for Linux, assuming container execution as root. This
workflow runs natively on ubuntu-latest as the runner user.
Replace prepare-runner with inline apt-get install of ccache + ninja,
and use CMake compiler launchers for ccache instead. Keep all other
main CI patterns: pinned actions, get-nproc, env-based secrets,
CCACHE_SLOPPINESS, print-env step.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Match reusable-build-test-config.yml exactly:
- Use XRPLF/actions/prepare-runner for system-level ccache setup
- Use XRPLF/actions/get-nproc for dynamic parallelism
- Remove redundant Conan package cache (remote is the cache)
- Remove explicit CMake compiler launchers (prepare-runner handles it)
- Add CCACHE_SLOPPINESS and print-env step
- Pin action versions to commit SHAs
- Move secrets and github.event.inputs to env blocks
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Without this, changes to the workflow file itself don't trigger the
telemetry validation pipeline on push.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The previous suggestion commits introduced duplicate keys and missing
YAML structure. This commit fixes:
- Duplicate env: block → single block with ccache vars
- Missing jobs: key
- Duplicate apt-get install → single line with ccache
- Missing 'Install Python dependencies' step name
- Duplicate 'uses: setup-conan'
- Missing 'uses: actions/cache@v4' and 'with:' for cache step
- Duplicate 'Configure CMake' step name
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Rewrite the build steps to mirror the main CI (reusable-build-test-config):
- Use build-deps action (conan install with --options:host='&:xrpld=True')
so the generated toolchain sets xrpld=ON and telemetry=True automatically
- Separate configure and build steps matching main CI pattern
- Run cmake from within build/ dir pointing to .. (same as main CI)
- Remove manual -Dtelemetry=ON -Dxrpld=ON (come from toolchain)
- Remove Docker DinD service (ubuntu-latest has Docker pre-installed)
- Install ninja-build system package for the Ninja generator
- Increase timeout to 90 min for full build + validation
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The telemetry-validation workflow used --output-folder=build with conan
install, which overrides the conanfile.py layout() and breaks include
paths for Conan-managed protobuf (runtime_version.h not found). Aligned
the build steps with the main CI's reusable-build-test-config.yml: drop
--output-folder, add Ninja generator, and explicit build_type setting.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Rename EOJSON heredoc delimiter to EOF_JSON to avoid cspell unknown word
- Add conan installation step (pip3 install conan) to telemetry-validation workflow
- Use shared setup-conan action for proper Conan profile/remote configuration
- Align build commands with reusable-build-test-config.yml conventions
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
OTel's spin_lock_mutex.h defines _WINSOCKAPI_ and includes <windows.h>,
which poisons the include state for boost/asio/detail/socket_types.hpp.
Pre-include the boost/asio socket types header on MSVC to get winsock2.h
in before the OTel headers interfere.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove unused `metric_api` namespace alias (misc-unused-alias-decls)
- Add NOLINT(bugprone-empty-catch) to 5 observable gauge callback catches
that intentionally swallow exceptions when services aren't ready yet
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When telemetry=ON, XRPL_ENABLE_TELEMETRY is globally defined, causing
MetricsRegistry.cpp to compile its full OTel path which references
xrpld symbols (LedgerMaster, TxQ, OpenLedger) that cannot be linked
into the standalone GTest binary. Guard the test with #ifndef and only
add MetricsRegistry.cpp as a source when telemetry is OFF.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Move test from src/test/telemetry/ (Beast unit_test::suite) to
src/tests/libxrpl/telemetry/ (GTest TEST_F). The test exercises the
no-op/disabled path only, which compiles without XRPL_ENABLE_TELEMETRY
and has no xrpld link dependencies beyond MetricsRegistry.cpp itself.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Added missing #include <xrpld/app/ledger/OpenLedger.h> for
app.openLedger().current() calls in observable gauge callbacks.
- Added opentelemetry::context::Context{} as third argument to
Histogram::Record() calls — the initializer_list overload requires
an explicit Context parameter in the installed OTel C++ SDK version.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Regenerate loops.txt and ordering.txt to account for the bidirectional
dependency between xrpld.app and xrpld.telemetry introduced in Phase 9.
MetricsRegistry.cpp reads metrics from xrpld.app services (LedgerMaster,
TxQ, AcceptedLedger) while Application.cpp wires MetricsRegistry into
the app lifecycle — a pattern consistent with existing accepted loops
(overlay, peerfinder, rpc, shamap).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add $node template variable (exported_instance) to rippled-fee-market,
rippled-job-queue, and rippled-rpc-perf dashboards enabling multi-node
filtering. Add $job_type variable to job-queue and $method variable to
rpc-perf dashboards. Inject exported_instance=~"$node" filter into all
PromQL queries across these dashboards including rate(), histogram_quantile(),
topk(), and sum() expressions. Also add the instance filter to Phase 9
panels (NodeStore, Cache, CountedObjects) in system-node-health dashboard.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The trace context injection block is compiled out when
XRPL_ENABLE_TELEMETRY is not defined (coverage builds). codecov still
counts preprocessor-excluded lines as uncovered in the source diff.
Wrap with LCOV_EXCL_START/STOP.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The OTel SDK's TraceId::ToLowerBase16 and SpanId::ToLowerBase16 expect
opentelemetry::nostd::span<char, N> rather than raw char arrays. Also
corrected array sizes from 33/17 to 32/16 (no null terminator needed
since we use output.append(buf, N)).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add opentelemetry/trace/context.h to Log.cpp so that
opentelemetry::trace::GetSpan() resolves correctly.
Add 'logql' to cspell dictionary to silence unknown-word warnings.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Upgrade Loki from 2.9.0 to 3.4.2 which supports native OTLP ingestion.
Replace removed `loki` exporter with `otlphttp/loki` pointed at Loki's
/otlp endpoint. The `loki` exporter was dropped in otel-collector-contrib
v0.147.0.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Task 8.6: Add Log-Trace Correlation section to telemetry-runbook.md
with LogQL examples, verification steps, and troubleshooting guidance.
Update 09-data-collection-reference.md section 5a from "Future" to
actual implementation docs covering log format, ingestion pipeline,
Grafana correlation config, and Loki backend. Add Phase 8 log
correlation test section and troubleshooting to TESTING.md.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Task 8.1: Inject trace_id/span_id into Logs::format() when an active
OTel span exists, guarded by #ifdef XRPL_ENABLE_TELEMETRY. Uses
thread-local GetSpan()/GetContext() with <10ns overhead per call.
Task 8.2: Add Grafana Loki service (grafana/loki:2.9.0) to the Docker
Compose stack with port 3100 exposed. Add Loki as a dependency for
otel-collector and grafana services.
Task 8.3: Add filelog receiver to OTel Collector config to tail rippled
debug.log files with regex_parser extracting timestamp, partition,
severity, trace_id, span_id, and message fields. Add loki exporter and
logs pipeline. Mount rippled log directory into collector container.
Task 8.4: Add tracesToLogs config in Tempo datasource provisioning
pointing to Loki with filterByTraceID enabled, enabling one-click
trace-to-log navigation in Grafana.
Task 8.5: Add check_log_correlation() function to integration-test.sh
that greps debug.log files for trace_id pattern and cross-checks the
trace_id exists in Jaeger.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds Grafana Loki data source with derivedFields config linking
trace_id values in log lines to Tempo traces. This enables one-click
log-to-trace navigation in Grafana Explore.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The else-if branch for server=="otel" in CollectorManager.cpp is never
reached in unit tests (no test configures [insight] with server=otel).
Mark it with LCOV_EXCL_START/STOP to exclude from patch coverage.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The beast::Journal stream accessors (info, warn, etc.) are methods that
return a Stream object. They must be called with () to test if the log
level is active.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The label_values() PromQL function requires a metric name as the first
argument. Without it, Prometheus returns raw label hashes instead of
readable node names like "validator-0:6006".
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- CreateInt64Counter -> CreateUInt64Counter (no int64 counter in API)
- Counter<int64_t> member -> Counter<uint64_t> with static_cast in Add()
- Add async_instruments.h include for ObservableInstrument definition
- Replace JLOG() macro with beast::Journal if(info)/info() pattern
- MeterProviderFactory::Create(reader,resource) -> Create(views,resource)
+ AddMetricReader() (no 2-arg reader overload exists)
- HistogramAggregationConfig: std::list<double> -> std::vector<double>
using boundaries_ member instead of aggregate initialization
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add $node template variable to all 5 system-* Grafana dashboards
with exported_instance=~"$node" filter on all PromQL queries
- Add instanceId parameter to OTelCollector::New() factory to set
service.instance.id resource attribute on metrics (matches trace
exporter behavior for human-friendly node names in Grafana)
- CollectorManager reads service_instance_id from [insight] config
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace StatsD UDP metric transport with native OpenTelemetry Metrics SDK
export via OTLP/HTTP behind the existing beast::insight::Collector interface.
- Task 7.1: Link opentelemetry-cpp to beast module in CMake when telemetry=ON
- Task 7.2: New OTelCollector class mapping beast::insight instruments to OTel
SDK (Counter, ObservableGauge, Histogram, Counter<uint64>) with OTLP/HTTP
export via PeriodicMetricReader at 1s intervals
- Task 7.3: Add server=otel branch to CollectorManager with endpoint config
- Task 7.4: Update otel-collector-config.yaml to use OTLP receiver for metrics
pipeline (StatsD receiver commented out for backward compat)
- Task 7.5: Metric names preserved via dot-to-underscore formatting matching
StatsD->Prometheus conventions
- Task 7.6: Rename Grafana dashboards from statsd-* to system-*, update titles
and UIDs from "StatsD" to "System Metrics"
- Task 7.7: Update integration test to use server=otel, verify OTLP metrics
- Task 7.8: Update runbook, TESTING.md, config reference, and data collection
reference docs
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add the close time span and its 6 attributes to the Phase 4 consensus
span table and attribute table in 09-data-collection-reference.md.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>