Commit Graph

13862 Commits

Author SHA1 Message Date
Nicholas Dudfield
83f6bc64e1 fix: restore ninja -v flag for compile command visibility
Removed in b24e4647b under the naive assumption that we were "past the debugging phase".

Reality: debugging is never done; verbose output is invaluable and costs nothing.
Keep the -v flag permanently.
2025-10-31 13:09:53 +07:00
Nicholas Dudfield
be6fad9692 fix: revert to PID-based temp files (was working before)
Reverts the unnecessary mktemp change from 638cb0afe that broke cache saving.

What happened:
- Original delta code used $$ (PID) for temp files: DELTA_TARBALL="/tmp/...-$$.tar.zst"
- This creates a STRING, not a file - zstd creates the file when writing
- When removing deltas (638cb0afe), I unnecessarily changed to mktemp for "better practice"
- mktemp CREATES an empty file - zstd refuses to overwrite it
- Result: "already exists; not overwritten" error
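A minimal Python sketch of the difference, assuming zstd's refusal to overwrite behaves like an exclusive create (`open(..., "xb")`). The helper `write_no_clobber` and the `cache-archive-<pid>` file name are hypothetical, not taken from the cache actions.

```python
import os
import tempfile


def write_no_clobber(path: str, data: bytes) -> bool:
    """Create `path` exclusively, like zstd declining to overwrite an existing output.

    Returns False instead of overwriting when the file already exists.
    """
    try:
        with open(path, "xb") as out:  # "x" = fail if the file already exists
            out.write(data)
        return True
    except FileExistsError:
        return False


# PID-based naming: only builds a path string; nothing is created on disk.
pid_path = os.path.join(tempfile.gettempdir(), f"cache-archive-{os.getpid()}.tar.zst")
if os.path.exists(pid_path):
    os.remove(pid_path)  # drop a leftover from an earlier run of this sketch
pid_ok = write_no_clobber(pid_path, b"archive bytes")

# mktemp-style naming: the (empty) file exists as soon as the name is handed out.
fd, mk_path = tempfile.mkstemp(suffix=".tar.zst")
os.close(fd)
mk_ok = write_no_clobber(mk_path, b"archive bytes")

print(pid_ok, mk_ok)  # True False

os.remove(pid_path)
os.remove(mk_path)
```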

Why it seemed to work:
- Immutability check skipped save for existing caches
- Upload code path never executed during testing
- Bug only appeared when actually trying to create new cache

The fix:
- Revert to PID-based naming ($$) that was working
- Don't fix what isn't broken

Applies to both save and restore actions for consistency.
2025-10-31 12:10:13 +07:00
Nicholas Dudfield
b24e4647ba fix: configure ccache after cache restore to prevent stale config
Same ordering bug as the Conan profile (Session 8): the ccache config was
created in the workflow BEFORE the cache restore, so the cached ccache.conf
overwrote the fresh configuration.

Changes:
- Build action: Add ccache config inputs (max_size, hash_dir, compiler_check)
- Build action: Configure ccache AFTER cache restore (overwrites cached config)
- Build action: Add "Show ccache config before build" step (debugging aid)
- Build action: Remove debug steps (past debugging phase)
- Build action: Remove ninja -v flag (past debugging phase)
- Nix workflow: Remove "Configure ccache" step (now handled in build action)
- macOS workflow: Remove "Configure ccache" step (now handled in build action)
- macOS workflow: Add missing stdlib and AWS credentials to build step
- Delete unused xahau-configure-ccache action (logic moved to build action)

Flow now matches Conan pattern:
1. Restore cache (includes potentially stale config)
2. Configure ccache (overwrites with fresh config: 2G max, hash_dir=true, compiler_check=content)
3. Show config (verification)
4. Build

This ensures fresh ccache configuration for each job, preventing issues
from cached config files with different settings.
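Sketch of why the order matters, in Python, assuming the restore step unpacks an old ccache.conf into the same directory. `restore_cache`, `configure_ccache` and the stale values (5G, mtime) are hypothetical stand-ins for the real action steps.

```python
import tempfile
from pathlib import Path

# Fresh values from the flow above, written as "key = value" lines like ccache.conf.
FRESH = {"max_size": "2G", "hash_dir": "true", "compiler_check": "content"}


def restore_cache(cache_dir: Path) -> None:
    """Hypothetical restore step: the archive brings back its old ccache.conf."""
    (cache_dir / "ccache.conf").write_text("max_size = 5G\ncompiler_check = mtime\n")


def configure_ccache(cache_dir: Path, settings: dict) -> None:
    """Write a fresh ccache.conf, replacing whatever is currently on disk."""
    lines = [f"{key} = {value}" for key, value in settings.items()]
    (cache_dir / "ccache.conf").write_text("\n".join(lines) + "\n")


def read_config(cache_dir: Path) -> dict:
    config = {}
    for line in (cache_dir / "ccache.conf").read_text().splitlines():
        key, _, value = line.partition("=")
        config[key.strip()] = value.strip()
    return config


with tempfile.TemporaryDirectory() as tmp:
    cache_dir = Path(tmp)

    configure_ccache(cache_dir, FRESH)  # old order: configure, then restore
    restore_cache(cache_dir)
    old_order = read_config(cache_dir)

    restore_cache(cache_dir)            # new order: restore, then configure
    configure_ccache(cache_dir, FRESH)
    new_order = read_config(cache_dir)

print(old_order["max_size"], new_order["max_size"])  # 5G 2G
```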
2025-10-31 11:03:12 +07:00
Nicholas Dudfield
638cb0afe5 refactor: remove OverlayFS delta caching entirely
THE GREAT CULLING: Remove all OverlayFS and delta caching logic.

After extensive investigation and testing, we determined that OverlayFS
file-level layering is fundamentally incompatible with ccache's access
patterns:

- ccache opens files with O_RDWR → kernel must provide writable file handle
- OverlayFS must copy files to upper layer immediately (can't wait)
- Even with metacopy=on, metadata-only files still appear in upper layer
- Result: ~366MB deltas instead of tiny incremental diffs

The fundamental constraint: cannot have all three of:
1. Read-only lower layer (for base sharing)
2. Writable file handles (for O_RDWR)
3. Minimal deltas (for efficient caching)

Changes:
- Removed all OverlayFS mounting/unmounting logic
- Removed workspace and registry tracking
- Removed delta creation and restoration
- Removed use-deltas parameter
- Simplified to direct tar/extract workflow

Before: 726 lines across cache actions
After:  321 lines (~56% reduction)

Benefits:
- Simpler architecture (direct tar/extract)
- More maintainable (less code, less complexity)
- More reliable (fewer moving parts)
- Same performance (base-only was already used)
- Clear path forward (restic/borg for future optimization)

Current state works great:
- Build times: 20-30 min → 2-5 min (80% improvement)
- Cache sizes: ~323-609 MB per branch (with zst compression)
- S3 costs: acceptable for current volume

If bandwidth costs become problematic, migrate to restic/borg for
chunk-level deduplication (completely different architecture).
2025-10-31 10:30:31 +07:00
Nicholas Dudfield
bd384e6bc1 Revert "feat: enable metacopy=on to test metadata-only copy-up"
This reverts commit 4c546e5d91.
2025-10-31 09:51:27 +07:00
Nicholas Dudfield
4c546e5d91 feat: enable metacopy=on to test metadata-only copy-up
Mount OverlayFS with the metacopy=on option (kernel 4.19+, supported on ubuntu-22.04).
Intended to avoid full file copy-up when files are opened with O_RDWR but not modified.

Expected behavior:
- ccache opens cache files with write access
- OverlayFS creates metadata-only entry in upper layer
- Full copy-up only happens if data is actually written
- Should dramatically reduce delta sizes from ~324 MB to a few KB

Re-enabled use-deltas for ccache to test this optimization.
Conan remains base-only (hash-based keys mean exact match most of the time).

If successful, deltas should be tiny for cache hit scenarios.
2025-10-31 09:36:09 +07:00
Nicholas Dudfield
28727b3f86 perf: disable delta caching (use base-only mode)
OverlayFS operates at FILE LEVEL:
- File modified by 8 bytes → Copy entire 500 KB file to delta
- ccache updates LRU metadata (8 bytes) on every cache hit
- 667 cache hits = 667 files copied = 324 MB delta (nearly full cache)

What we need is BYTE LEVEL deltas:
- File modified by 8 bytes → Delta is 8 bytes
- 100% cache hit scenario: 10 KB delta (not 324 MB)
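The arithmetic above as a quick Python check, treating the 500 KB example entry as 500 KiB. Only the numbers from this message are used; real entry sizes vary.

```python
KIB = 1024
MIB = 1024 * KIB

cache_hits = 667        # cache hits in the run described above
file_size = 500 * KIB   # example size of one cache entry
touched_bytes = 8       # LRU metadata update per hit

# File-level layering (OverlayFS copy-up): every touched file is copied whole.
file_level = cache_hits * file_size

# Byte-level diffing: only the changed bytes would need to be stored.
byte_level = cache_hits * touched_bytes

print(f"file-level delta: {file_level / MIB:.1f} MiB")  # file-level delta: 325.7 MiB
print(f"byte-level delta: {byte_level} bytes")         # byte-level delta: 5336 bytes
```

The file-level estimate is in line with the observed 324 MB delta; the byte-level figure ignores any archive or metadata overhead.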

OverlayFS is the wrong abstraction for ccache. Binary diff tools (rsync,
xdelta3, borg) would work better, but require significant rework.

For now: Disable deltas, use base-only caching
- Base cache restore still provides massive speed boost (2-5 min vs 20-30)
- Simpler, more reliable
- Saves S3 bandwidth (no 324 MB delta uploads)

Future: Research binary diff strategies (see Gemini prompt in session notes)
2025-10-31 08:11:47 +07:00
Nicholas Dudfield
a4f96a435a fix: don't override ccache's default cache_dir
PROBLEM: Setting cache_dir='~/.ccache' explicitly stores the
LITERAL tilde in ccache.conf, which might not expand properly.

SOLUTION: Don't set cache_dir at all - let ccache use its default
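- Linux default: ~/.ccache when that legacy directory exists; otherwise ccache 4.x uses $XDG_CACHE_HOME/ccache or ~/.cache/ccache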
- ccache will expand the tilde correctly when using default
- Only configure max_size, hash_dir, compiler_check
- Remove CCACHE_DIR env var (not needed with default)

This should fix the issue where ccache runs but creates no cache files.
2025-10-30 17:58:08 +07:00
Nicholas Dudfield
d0f63cc2d1 debug: add ccache directory inspection after build
Add comprehensive debug output to inspect ~/.ccache contents:
- Full recursive directory listing
- Disk space check (rule out disk full)
- ccache config verification
- Directory sizes
- List actual files (distinguish config from cache data)

This will help diagnose why ccache is being invoked on every
compilation but not creating any cache files (0.0 GB).
2025-10-30 17:53:33 +07:00
Nicholas Dudfield
2433bfe277 debug: add ninja verbose output to see compile commands
Temporarily adding -v flag to ninja builds to see actual compile
commands. This will show whether ccache is actually being invoked
or if the CMAKE_C_COMPILER_LAUNCHER/CMAKE_CXX_COMPILER_LAUNCHER settings aren't taking effect.

We should see either:
  ccache /usr/bin/g++-13 -c file.cpp  (working)
  /usr/bin/g++-13 -c file.cpp         (not working)

This is temporary for debugging - remove once ccache issue resolved.
2025-10-30 17:04:00 +07:00
Nicholas Dudfield
ef40a7f351 refactor: move Conan profile creation after cache restore
PROBLEM: Profile was created before cache restore, then immediately
overwritten by cached profile. This meant we were using a potentially
stale cached profile instead of fresh configuration.

SOLUTION: Move profile creation into dependencies action after restore
- Dependencies action now takes compiler params as inputs
- Profile created AFTER cache restore (overwrites any cached profile)
- Ensures fresh profile with correct compiler settings for each job

CHANGES:
- Dependencies action: Add inputs (os, arch, compiler, compiler_version, cc, cxx)
- Dependencies action: Add 'Configure Conan' step after cache restore
- Dependencies action: Support both Linux and macOS profile generation
- Nix workflow: Remove 'Configure Conan' step, pass compiler params
- macOS workflow: Detect compiler version, pass to dependencies action
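A rough Python sketch of rendering a Conan 2 profile from the new action inputs. `render_conan_profile` is hypothetical; passing cc/cxx through `tools.build:compiler_executables` is an assumption about how the explicit Linux compilers could be wired in, and a real profile typically also sets compiler.libcxx, compiler.cppstd and build_type.

```python
def render_conan_profile(os_name, arch, compiler, compiler_version, cc=None, cxx=None):
    """Render a minimal Conan 2 profile from the matrix inputs (hypothetical helper).

    cc/cxx are optional: explicit on Linux, left to auto-detection on macOS.
    """
    lines = [
        "[settings]",
        f"os={os_name}",
        f"arch={arch}",
        f"compiler={compiler}",
        f"compiler.version={compiler_version}",
    ]
    if cc and cxx:
        executables = {"c": cc, "cpp": cxx}  # conf values are Python literals
        lines += ["[conf]", f"tools.build:compiler_executables={executables!r}"]
    return "\n".join(lines) + "\n"


linux = render_conan_profile("Linux", "x86_64", "gcc", "13", "/usr/bin/gcc-13", "/usr/bin/g++-13")
macos = render_conan_profile("Macos", "armv8", "apple-clang", "15")
print(linux)
print(macos)
```

Writing the result only after the cache restore is what keeps a cached profile from winning.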

Benefits:
- Profile always fresh and correct for current matrix config
- Cache still includes .conan.db and other important files
- Self-contained dependencies action (easier to understand)
- Works for both Linux (with explicit cc/cxx) and macOS (auto-detect)
2025-10-30 16:41:43 +07:00
Nicholas Dudfield
a4a4126bdc fix: indent heredoc content for valid YAML syntax
The heredoc content inside the run block needs proper indentation
to be valid YAML. Without indentation, the YAML parser treats the
lines as top-level keys, causing syntax errors.

The generated CMake file will have leading spaces, but CMake handles
this fine (whitespace-tolerant parser).
2025-10-30 16:17:40 +07:00
Nicholas Dudfield
0559b6c418 fix: enable ccache for main app via wrapper toolchain
PROBLEM: ccache was never being used (all builds compiling from scratch)
- CMake command-line args (-DCMAKE_C_COMPILER_LAUNCHER=ccache) were
  being overridden by Conan's conan_toolchain.cmake
- This toolchain file loads AFTER command-line args and resets settings
- Result: Empty ccache entries (~200 bytes) in both old and new cache

SOLUTION: Wrapper toolchain that overlays ccache on Conan's toolchain
- Create wrapper_toolchain.cmake that includes Conan's toolchain first
- Then sets CMAKE_C_COMPILER_LAUNCHER and CMAKE_CXX_COMPILER_LAUNCHER
- This happens AFTER Conan's toolchain loads, so settings stick
- Only affects main app build (NOT Conan dependency builds)
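A Python sketch that writes such a wrapper, assuming it sits in the same directory as conan_toolchain.cmake (hence CMAKE_CURRENT_LIST_DIR). The helper name is hypothetical; the CMake content shows the include-then-set ordering the fix relies on.

```python
import tempfile
from pathlib import Path

# CMake text for the wrapper: Conan's toolchain first, launchers afterwards.
WRAPPER_TOOLCHAIN = """\
include(${CMAKE_CURRENT_LIST_DIR}/conan_toolchain.cmake)

set(CMAKE_C_COMPILER_LAUNCHER ccache)
set(CMAKE_CXX_COMPILER_LAUNCHER ccache)
"""


def write_wrapper_toolchain(generators_dir: Path) -> Path:
    """Write wrapper_toolchain.cmake beside conan_toolchain.cmake (assumed layout)."""
    wrapper = generators_dir / "wrapper_toolchain.cmake"
    wrapper.write_text(WRAPPER_TOOLCHAIN)
    return wrapper


with tempfile.TemporaryDirectory() as tmp:
    toolchain_file = write_wrapper_toolchain(Path(tmp))
    content = toolchain_file.read_text()
    # The configure step then points CMake at the wrapper instead of Conan's file
    # (source/build directory arguments omitted here).
    configure_args = ["cmake", f"-DCMAKE_TOOLCHAIN_FILE={toolchain_file}", "-G", "Ninja"]

print(content)
```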

CHANGES:
- Build action: Create wrapper toolchain when ccache enabled
- Build action: Use CMAKE_CURRENT_LIST_DIR for correct relative path
- Build action: Remove broken CCACHE_ARGS logic (was being overridden)
- Build action: Use ${TOOLCHAIN_FILE} variable instead of hardcoded path

This approach:
- Conan dependency builds: Clean (no ccache overhead)
- Main xahaud build: Uses ccache via wrapper
- Separation: Build action controls ccache, not Conan profile
- Simple: No profile regeneration, just one wrapper file
2025-10-30 16:06:45 +07:00
Nicholas Dudfield
f8d1a6f2b4 fix: normalize paths in cache actions to fix bootstrap detection
Path comparison was failing when registry had expanded paths
(/home/runner/.ccache) but input had unexpanded paths (~/.ccache),
causing bootstrap mode to not be detected.

Now both restore and save actions consistently expand tildes to
absolute paths before writing to or reading from the mount registry.
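Minimal Python sketch of the normalization, assuming the registry is a plain list of paths; `normalize` and `was_mounted` are hypothetical names.

```python
import os


def normalize(path: str) -> str:
    """Expand a leading '~' and tidy the path so equivalent spellings compare equal."""
    return os.path.normpath(os.path.expanduser(path))


def was_mounted(target: str, registry: list) -> bool:
    """True if target appears in the (hypothetical) mount registry after normalizing."""
    wanted = normalize(target)
    return any(normalize(entry) == wanted for entry in registry)


# The save side recorded the expanded path; the restore input used the tilde form.
registry = [os.path.join(os.path.expanduser("~"), ".ccache")]

print("~/.ccache" in registry)             # False: raw string comparison misses
print(was_mounted("~/.ccache", registry))  # True once both sides are normalized
```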
2025-10-30 13:25:55 +07:00
Nicholas Dudfield
c46ede7c8f chore: bump CACHE_VERSION to 3 for fresh bootstrap
Triggering clean rebuild after S3 cache cleared and ccache simplified to single directory.
2025-10-30 11:57:02 +07:00
Nicholas Dudfield
0e2bc365ea refactor: simplify ccache to single directory with restore-keys fallback
Removed split ccache configuration (~/.ccache-main + ~/.ccache-current).

The split had an edge case: building a feature branch before main branch
cache is primed results in double ccache disk usage:
- Feature branch builds first → populates ~/.ccache-current (2G max_size)
- Main branch builds later → populates ~/.ccache-main (2G max_size)
- Next feature branch build → restores both caches (4G total)

This doubles ccache disk usage unnecessarily on GitHub-hosted runners.

New single-directory approach:
- Single ~/.ccache directory for all branches
- Branch-specific keys with restore-keys fallback to main branch
- Cache actions handle cross-branch sharing via partial-match mode
- Simpler and prevents double disk usage edge case

Changes:
- xahau-configure-ccache: Single cache_dir input, removed branch logic
- xahau-ga-build: Single restore/save step, uses restore-keys for fallback
- xahau-ga-nix: Removed is_main_branch parameter

Disk usage: 2G max (vs 4G worst case with split)
2025-10-30 11:24:14 +07:00
Nicholas Dudfield
446bc76b69 fix: remove broken cache save conditions for Conan
The check-conanfile-changes logic prevented cache saves on feature
branches unless conanfile changed, but this was broken because:
1. Cache keys don't include branch name (shared across branches)
2. S3 immutability makes this unnecessary (first-write-wins)
3. It prevented bootstrap saves on feature branches

Also removed unused safe-branch step (dead code from copy-paste)
2025-10-30 10:35:37 +07:00
Nicholas Dudfield
a0c38a4fb3 fix: disable deltas for Conan cache (base-only mode)
Conan dependencies are relatively static, so use base-only mode
instead of delta caching to avoid unnecessary complexity
2025-10-30 10:03:48 +07:00
Nicholas Dudfield
631650f7eb feat(wip): add nd-experiment-overlayfs-2025-10-29 to nix push 2025-10-30 09:56:33 +07:00
Nicholas Dudfield
0b31d8e534 feat: replace actions/cache with custom S3+OverlayFS cache
- Updated xahau-ga-dependencies to use xahau-actions-cache-restore/save
- Updated xahau-ga-build to use custom cache for ccache (4 operations)
- Added AWS credential inputs to both actions
- Main workflow now passes AWS secrets to cache actions
- Removed legacy ~/.conan path (Conan 2 uses ~/.conan2 only)
- All cache keys remain unchanged for compatibility
2025-10-30 09:55:06 +07:00
Nicholas Dudfield
ecf03f4afe test: expect state 1 [ci-clear-cache] 2025-10-30 09:50:57 +07:00
Nicholas Dudfield
b801c2837d test: expect state 3 2025-10-30 09:46:17 +07:00
Nicholas Dudfield
1474e808cb test: expect state 2 2025-10-30 09:43:26 +07:00
Nicholas Dudfield
457e633a81 feat(test): add workflow_dispatch inputs for state machine testing
Instead of requiring commits with message tags, allow manual workflow
triggering with inputs:
- state_assertion: Expected state for validation
- start_state: Force specific starting state
- clear_cache: Clear cache before running

Benefits:
- No need for empty commits to trigger tests
- Easy manual testing via GitHub Actions UI
- Still supports commit message tags for push events
- Workflow inputs take priority over commit tags
2025-10-30 09:40:26 +07:00
Nicholas Dudfield
7ea99caa19 fix(cache): trim whitespace in delta count + use s3api for tagging
Two fixes:
1. Restore: Trim whitespace from DELTA_COUNT to fix integer comparison
2. Save: Use 's3api put-object' instead of 's3 cp' to support --tagging

The aws s3 cp command doesn't support --tagging parameter.
Switched to s3api put-object which supports tagging directly.
Tags are needed for lifecycle policies (30-day eviction).
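Sketch of building the put-object call from Python, assuming the AWS CLI is on PATH. `--tagging` takes URL-query-encoded key=value pairs, so `urlencode` produces the right shape; the bucket, key and body values are placeholders.

```python
from urllib.parse import urlencode


def put_object_command(bucket: str, key: str, body: str, tags: dict) -> list:
    """Build an `aws s3api put-object` invocation that tags the object on upload."""
    return [
        "aws", "s3api", "put-object",
        "--bucket", bucket,
        "--key", key,
        "--body", body,
        "--tagging", urlencode(tags),  # e.g. "type=base"
    ]


cmd = put_object_command(
    "example-cache-bucket", "ccache/base.tar.zst", "/tmp/base.tar.zst", {"type": "base"}
)
print(" ".join(cmd))
```

Running it would be `subprocess.run(cmd, check=True)`; the tag then lets a lifecycle rule filter on `type=base` or `type=delta-archive`.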
2025-10-30 09:29:02 +07:00
Nicholas Dudfield
3e5c15c172 fix(cache): handle empty cache gracefully in [ci-clear-cache]
When [ci-clear-cache] is used but no cache exists yet, grep returns
exit code 1 which causes script failure with set -e.

Add || echo "0" to handle case where no deltas exist to delete.
2025-10-30 09:25:21 +07:00
Nicholas Dudfield
52b4fb503c feat(cache): implement [ci-clear-cache] tag + auto-detecting state machine test
Cache Clearing Feature:
- Add [ci-clear-cache] commit message tag detection in restore action
- Deletes base + all deltas when tag present
- Implicit access via github.event.head_commit.message env var
- No workflow changes needed (action handles automatically)
- Commit message only (not PR title) - one-time action

State Machine Test Workflow:
- Auto-detects state by counting state files (state0.txt, state1.txt, etc.)
- Optional [state:N] assertions validate detected == expected
- [start-state:N] forces specific state for scenario testing
- Dual validation: local cache state AND S3 objects
- 4 validation checkpoints: S3 before, local after restore, after build, S3 after save
- Self-documenting: prints next steps after each run
- Supports [ci-clear-cache] integration

Usage:
  # Auto-advance (normal)
  git commit -m 'continue testing'

  # With assertion
  git commit -m 'test delta [state:2]'

  # Clear and restart
  git commit -m 'fresh start [ci-clear-cache]'

  # Jump to scenario
  git commit -m 'test from state 3 [start-state:3]'
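A Python sketch of the tag parsing and state detection, assuming the current state equals the number of stateN.txt files restored from cache and that [start-state:N] simply overrides the detected value. Function names are hypothetical.

```python
import re
import tempfile
from pathlib import Path


def parse_tags(message: str) -> dict:
    """Extract the test-control tags from a commit message."""
    expected = re.search(r"\[state:(\d+)\]", message)
    start = re.search(r"\[start-state:(\d+)\]", message)
    return {
        "clear_cache": "[ci-clear-cache]" in message,
        "expected_state": int(expected.group(1)) if expected else None,
        "start_state": int(start.group(1)) if start else None,
    }


def detect_state(cache_dir: Path) -> int:
    """Count the stateN.txt files restored from the cache."""
    return sum(1 for p in cache_dir.iterdir() if re.fullmatch(r"state\d+\.txt", p.name))


def current_state(cache_dir: Path, tags: dict) -> int:
    """[start-state:N] wins over auto-detection; [state:N] is then checked."""
    state = tags["start_state"] if tags["start_state"] is not None else detect_state(cache_dir)
    if tags["expected_state"] is not None and tags["expected_state"] != state:
        raise AssertionError(f"expected state {tags['expected_state']}, detected {state}")
    return state


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp)
    for n in range(2):  # pretend two earlier runs left state0.txt and state1.txt
        (cache / f"state{n}.txt").write_text(f"{n}\n")
    state = current_state(cache, parse_tags("test delta [state:2]"))

print(state)  # 2
```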
2025-10-30 09:19:12 +07:00
Nicholas Dudfield
98123fa934 feat(cache): implement inline delta cleanup (keep only 1 per key)
- Add automatic cleanup after each delta upload
- Query all deltas for key, sort by LastModified
- Keep only latest (just uploaded), delete all older ones
- Matches restore logic (only uses latest delta)
- Minimal storage: 1 delta per key (~2GB) vs unbounded growth
- Simpler than keeping N: restore never needs older deltas
- Concurrency-safe (idempotent batch deletes)
- Eliminates need for separate cleanup workflow
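Python sketch of the keep-latest selection, assuming entries shaped like an S3 ListObjectsV2 `Contents` item (`Key`, `LastModified`). Key names are hypothetical; S3's DeleteObjects call accepts at most 1000 keys per request, hence the batching helper.

```python
from datetime import datetime, timezone


def deltas_to_delete(objects: list) -> list:
    """Return every delta Key except the most recently modified one."""
    ordered = sorted(objects, key=lambda obj: obj["LastModified"])
    return [obj["Key"] for obj in ordered[:-1]]


def batches(keys: list, size: int = 1000) -> list:
    """Split keys into DeleteObjects-sized chunks (max 1000 keys per request)."""
    return [keys[i:i + size] for i in range(0, len(keys), size)]


def ts(hour: int) -> datetime:
    return datetime(2025, 10, 29, hour, tzinfo=timezone.utc)


listing = [
    {"Key": "ccache-main/delta-0900.tar.zst", "LastModified": ts(9)},
    {"Key": "ccache-main/delta-1400.tar.zst", "LastModified": ts(14)},  # just uploaded
    {"Key": "ccache-main/delta-0700.tar.zst", "LastModified": ts(7)},
]

stale = deltas_to_delete(listing)
print(stale)  # ['ccache-main/delta-0700.tar.zst', 'ccache-main/delta-0900.tar.zst']
print(len(batches(stale)))  # 1
```

Deleting a key that a concurrent job already removed is harmless, which is what makes the cleanup idempotent.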

Rationale: Since restore only ever uses the single latest delta,
keeping historical deltas adds complexity without benefit. This
matches GitHub Actions semantics (one 'latest' per key).
2025-10-29 14:51:28 +07:00
Nicholas Dudfield
ce7b1c4f1d feat: add custom S3+OverlayFS cache actions with configurable delta support
Implements drop-in replacement for actions/cache using S3 backend and OverlayFS for delta caching:

- xahau-actions-cache-restore: Downloads immutable base + optional latest delta
- xahau-actions-cache-save: Saves immutable bases (bootstrap/partial-match) or timestamped deltas (exact-match)

Key features:
- Immutable bases: One static base per key (first-write-wins, GitHub Actions semantics)
- Timestamped deltas: Always-timestamped to eliminate concurrency issues
- Configurable use-deltas parameter (default true):
  - true: For symbolic keys (branch-based) - massive bandwidth savings via incremental deltas
  - false: For content-based keys (hash-based) - base-only mode, no delta complexity
- Three cache modes: bootstrap, partial-match (restore-keys), exact-match
- OverlayFS integration: Automatic delta extraction via upperdir, whiteout file support
- S3 lifecycle ready: Bases tagged 'type=base', deltas tagged 'type=delta-archive'

Decision rule for use-deltas:
- Content-based discriminator (hashFiles, commit SHA) → use-deltas: false
- Symbolic discriminator (branch name, tag, PR) → use-deltas: true

Also disables existing workflows temporarily during development.
2025-10-29 13:07:40 +07:00
Nicholas Dudfield
e062dcae58 feat(wip): comment out unused secret encryption 2025-10-29 09:05:17 +07:00
Nicholas Dudfield
a9d284fec1 feat(wip): use new key names 2025-10-29 09:02:24 +07:00
Nicholas Dudfield
065d0c3e07 feat(wip): remove currently unused workflows 2025-10-29 08:57:21 +07:00
Nicholas Dudfield
4fda40b709 test: add S3 upload to overlayfs delta test
- Upload delta tarball to S3 bucket
- Test file: hello-world-first-test.tar.gz
- Uses new secret names: XAHAUD_GITHUB_ACTIONS_CACHE_NIQ_KEY_ID/ACCESS_KEY
- Verifies upload with aws s3 ls
- Complete end-to-end test: OverlayFS → tarball → S3
2025-10-29 08:49:05 +07:00
Nicholas Dudfield
6014356d91 test: add encrypted secrets test to overlayfs workflow
- Generate random encryption key and store in GitHub Secrets via gh CLI
- Encrypt test message with GPG and commit to repo
- Decrypt in workflow using key from secrets and echo result
- Demonstrates encrypted secrets approach for SSH keys
2025-10-29 08:04:28 +07:00
Nicholas Dudfield
d790f97430 feat(wip): experiment overlayfs 2025-10-29 07:52:06 +07:00
tequ
9ed20a4f1c Refactor: SetCron to CronSet (#609) 2025.10.27-release+2405 2025-10-27 14:38:40 +10:00
tequ
89ffc1969b Add Previous fields to ltCron (#611) 2025-10-27 14:36:57 +10:00
tequ
79fdafe638 Support Cron in util_keylet Hook API (#612) 2025-10-27 14:35:01 +10:00
tequ
2a10013dfc Support 'cron' with ledger_entry RPC (#608) 2025-10-24 17:05:14 +10:00
tequ
6f148a8ac7 ExtendedHookState (#406) 2025-10-23 18:57:38 +10:00
tequ
96222baf5e Add hook header generators and CI verification workflow (#597) 2025-10-22 15:25:38 +10:00
Niq Dudfield
74477d2c13 added configurable NuDB block size support in xahaud (#601) 2025-10-22 14:15:12 +10:00
Alloy Networks
9378f1a0ad Update CONTRIBUTING.md (#599) 2025-10-21 14:20:10 +10:00
tequ
6fa6a96e3a Introduce StartTime in CronSet and improve next execution scheduling (#596) 2025-10-21 14:17:53 +10:00
RichardAH
b0fcd36bcd import_vl_keys logic fix (flap fix) (#588) 2025-10-18 16:27:05 +10:00
RichardAH
1ec31e79c9 Cron (on ledger cronjobs) (#590)
Co-authored-by: tequ <git@tequ.dev>
2025-10-17 18:45:16 +10:00
tequ
9c8b005406 fix: improve logging for transaction preflight failures in applyHook.cpp (#566) 2025-10-15 12:33:32 +10:00
tequ
687ccf4203 Remove unused variable enabled in MultiSign_test.cpp (#592) 2025-10-15 12:32:31 +10:00
Niq Dudfield
83f09fd8ab ci: add clang to build matrix [ci-nix-full-matrix] (#569) 2025-10-15 11:26:31 +10:00
tequ
15c7ad6f78 Fix Invalid Tx flags (#514) 2025-10-14 15:35:48 +10:00