Removed in b24e4647b under naive assumption we were "past debugging phase".
Reality: debugging is never done, verbose output is invaluable, costs nothing.
Keep -v flag permanently.
Reverts the unnecessary mktemp change from 638cb0afe that broke cache saving.
What happened:
- Original delta code used $$ (PID) for temp files: DELTA_TARBALL="/tmp/...-$$.tar.zst"
- This creates a STRING, not a file - zstd creates the file when writing
- When removing deltas (638cb0afe), I unnecessarily changed to mktemp for "better practice"
- mktemp CREATES an empty file - zstd refuses to overwrite it
- Result: "already exists; not overwritten" error
Why it seemed to work:
- Immutability check skipped save for existing caches
- Upload code path never executed during testing
- Bug only appeared when actually trying to create new cache
The fix:
- Revert to PID-based naming ($$) that was working
- Don't fix what isn't broken
Applies to both save and restore actions for consistency.
THE GREAT CULLING: Remove all OverlayFS and delta caching logic.
After extensive investigation and testing, we determined that OverlayFS
file-level layering is fundamentally incompatible with ccache's access
patterns:
- ccache opens files with O_RDWR → kernel must provide writable file handle
- OverlayFS must copy files to upper layer immediately (can't wait)
- Even with metacopy=on, metadata-only files still appear in upper layer
- Result: ~366MB deltas instead of tiny incremental diffs
The fundamental constraint: cannot have all three of:
1. Read-only lower layer (for base sharing)
2. Writable file handles (for O_RDWR)
3. Minimal deltas (for efficient caching)
Changes:
- Removed all OverlayFS mounting/unmounting logic
- Removed workspace and registry tracking
- Removed delta creation and restoration
- Removed use-deltas parameter
- Simplified to direct tar/extract workflow
Before: 726 lines across cache actions
After: 321 lines (-55% reduction)
Benefits:
- ✅ Simpler architecture (direct tar/extract)
- ✅ More maintainable (less code, less complexity)
- ✅ More reliable (fewer moving parts)
- ✅ Same performance (base-only was already used)
- ✅ Clear path forward (restic/borg for future optimization)
Current state works great:
- Build times: 20-30 min → 2-5 min (80% improvement)
- Cache sizes: ~323-609 MB per branch (with zst compression)
- S3 costs: acceptable for current volume
If bandwidth costs become problematic, migrate to restic/borg for
chunk-level deduplication (completely different architecture).
Mount OverlayFS with metacopy=on option (kernel 4.2+, supported on ubuntu-22.04).
This prevents full file copy-up when files are opened with O_RDWR but not modified.
Expected behavior:
- ccache opens cache files with write access
- OverlayFS creates metadata-only entry in upper layer
- Full copy-up only happens if data is actually written
- Should dramatically reduce delta sizes from ~324 MB to ~KB
Re-enabled use-deltas for ccache to test this optimization.
Conan remains base-only (hash-based keys mean exact match most of the time).
If successful, deltas should be tiny for cache hit scenarios.
OverlayFS operates at FILE LEVEL:
- File modified by 8 bytes → Copy entire 500 KB file to delta
- ccache updates LRU metadata (8 bytes) on every cache hit
- 667 cache hits = 667 files copied = 324 MB delta (nearly full cache)
What we need is BYTE LEVEL deltas:
- File modified by 8 bytes → Delta is 8 bytes
- 100% cache hit scenario: 10 KB delta (not 324 MB)
OverlayFS is the wrong abstraction for ccache. Binary diff tools (rsync,
xdelta3, borg) would work better, but require significant rework.
For now: Disable deltas, use base-only caching
- Base cache restore still provides massive speed boost (2-5 min vs 20-30)
- Simpler, more reliable
- Saves S3 bandwidth (no 324 MB delta uploads)
Future: Research binary diff strategies (see Gemini prompt in session notes)
PROBLEM: Setting cache_dir='~/.ccache' explicitly stores the
LITERAL tilde in ccache.conf, which might not expand properly.
SOLUTION: Don't set cache_dir at all - let ccache use its default
- Linux default: ~/.ccache
- ccache will expand the tilde correctly when using default
- Only configure max_size, hash_dir, compiler_check
- Remove CCACHE_DIR env var (not needed with default)
This should fix the issue where ccache runs but creates no cache files.
Add comprehensive debug output to inspect ~/.ccache contents:
- Full recursive directory listing
- Disk space check (rule out disk full)
- ccache config verification
- Directory sizes
- List actual files (distinguish config from cache data)
This will help diagnose why ccache is being invoked on every
compilation but not creating any cache files (0.0 GB).
Temporarily adding -v flag to ninja builds to see actual compile
commands. This will show whether ccache is actually being invoked
or if the CMAKE_C_COMPILER_LAUNCHER setting isn't taking effect.
We should see either:
ccache /usr/bin/g++-13 -c file.cpp (working)
/usr/bin/g++-13 -c file.cpp (not working)
This is temporary for debugging - remove once ccache issue resolved.
PROBLEM: Profile was created before cache restore, then immediately
overwritten by cached profile. This meant we were using a potentially
stale cached profile instead of fresh configuration.
SOLUTION: Move profile creation into dependencies action after restore
- Dependencies action now takes compiler params as inputs
- Profile created AFTER cache restore (overwrites any cached profile)
- Ensures fresh profile with correct compiler settings for each job
CHANGES:
- Dependencies action: Add inputs (os, arch, compiler, compiler_version, cc, cxx)
- Dependencies action: Add 'Configure Conan' step after cache restore
- Dependencies action: Support both Linux and macOS profile generation
- Nix workflow: Remove 'Configure Conan' step, pass compiler params
- macOS workflow: Detect compiler version, pass to dependencies action
Benefits:
✅ Profile always fresh and correct for current matrix config
✅ Cache still includes .conan.db and other important files
✅ Self-contained dependencies action (easier to understand)
✅ Works for both Linux (with explicit cc/cxx) and macOS (auto-detect)
The heredoc content inside the run block needs proper indentation
to be valid YAML. Without indentation, the YAML parser treats the
lines as top-level keys, causing syntax errors.
The generated CMake file will have leading spaces, but CMake handles
this fine (whitespace-tolerant parser).
PROBLEM: ccache was never being used (all builds compiling from scratch)
- CMake command-line args (-DCMAKE_C_COMPILER_LAUNCHER=ccache) were
being overridden by Conan's conan_toolchain.cmake
- This toolchain file loads AFTER command-line args and resets settings
- Result: Empty ccache entries (~200 bytes) in both old and new cache
SOLUTION: Wrapper toolchain that overlays ccache on Conan's toolchain
- Create wrapper_toolchain.cmake that includes Conan's toolchain first
- Then sets CMAKE_C_COMPILER_LAUNCHER and CMAKE_CXX_COMPILER_LAUNCHER
- This happens AFTER Conan's toolchain loads, so settings stick
- Only affects main app build (NOT Conan dependency builds)
CHANGES:
- Build action: Create wrapper toolchain when ccache enabled
- Build action: Use CMAKE_CURRENT_LIST_DIR for correct relative path
- Build action: Remove broken CCACHE_ARGS logic (was being overridden)
- Build action: Use ${TOOLCHAIN_FILE} variable instead of hardcoded path
This approach:
✅ Conan dependency builds: Clean (no ccache overhead)
✅ Main xahaud build: Uses ccache via wrapper
✅ Separation: Build action controls ccache, not Conan profile
✅ Simple: No profile regeneration, just one wrapper file
Path comparison was failing when registry had expanded paths
(/home/runner/.ccache) but input had unexpanded paths (~/.ccache),
causing bootstrap mode to not be detected.
Now both restore and save actions consistently expand tildes to
absolute paths before writing to or reading from the mount registry.
Removed split ccache configuration (~/.ccache-main + ~/.ccache-current).
The split had an edge case: building a feature branch before main branch
cache is primed results in double ccache disk usage:
- Feature branch builds first → populates ~/.ccache-current (2G max_size)
- Main branch builds later → populates ~/.ccache-main (2G max_size)
- Next feature branch build → restores both caches (4G total)
This doubles ccache disk usage unnecessarily on GitHub-hosted runners.
New single-directory approach:
- Single ~/.ccache directory for all branches
- Branch-specific keys with restore-keys fallback to main branch
- Cache actions handle cross-branch sharing via partial-match mode
- Simpler and prevents double disk usage edge case
Changes:
- xahau-configure-ccache: Single cache_dir input, removed branch logic
- xahau-ga-build: Single restore/save step, uses restore-keys for fallback
- xahau-ga-nix: Removed is_main_branch parameter
Disk usage: 2G max (vs 4G worst case with split)
The check-conanfile-changes logic prevented cache saves on feature
branches unless conanfile changed, but this was broken because:
1. Cache keys don't include branch name (shared across branches)
2. S3 immutability makes this unnecessary (first-write-wins)
3. It prevented bootstrap saves on feature branches
Also removed unused safe-branch step (dead code from copy-paste)
- Updated xahau-ga-dependencies to use xahau-actions-cache-restore/save
- Updated xahau-ga-build to use custom cache for ccache (4 operations)
- Added AWS credential inputs to both actions
- Main workflow now passes AWS secrets to cache actions
- Removed legacy ~/.conan path (Conan 2 uses ~/.conan2 only)
- All cache keys remain unchanged for compatibility
Instead of requiring commits with message tags, allow manual workflow
triggering with inputs:
- state_assertion: Expected state for validation
- start_state: Force specific starting state
- clear_cache: Clear cache before running
Benefits:
- No need for empty commits to trigger tests
- Easy manual testing via GitHub Actions UI
- Still supports commit message tags for push events
- Workflow inputs take priority over commit tags
Two fixes:
1. Restore: Trim whitespace from DELTA_COUNT to fix integer comparison
2. Save: Use 's3api put-object' instead of 's3 cp' to support --tagging
The aws s3 cp command doesn't support --tagging parameter.
Switched to s3api put-object which supports tagging directly.
Tags are needed for lifecycle policies (30-day eviction).
When [ci-clear-cache] is used but no cache exists yet, grep returns
exit code 1 which causes script failure with set -e.
Add || echo "0" to handle case where no deltas exist to delete.
Cache Clearing Feature:
- Add [ci-clear-cache] commit message tag detection in restore action
- Deletes base + all deltas when tag present
- Implicit access via github.event.head_commit.message env var
- No workflow changes needed (action handles automatically)
- Commit message only (not PR title) - one-time action
State Machine Test Workflow:
- Auto-detects state by counting state files (state0.txt, state1.txt, etc.)
- Optional [state:N] assertions validate detected == expected
- [start-state:N] forces specific state for scenario testing
- Dual validation: local cache state AND S3 objects
- 4 validation checkpoints: S3 before, local after restore, after build, S3 after save
- Self-documenting: prints next steps after each run
- Supports [ci-clear-cache] integration
Usage:
# Auto-advance (normal)
git commit -m 'continue testing'
# With assertion
git commit -m 'test delta [state:2]'
# Clear and restart
git commit -m 'fresh start [ci-clear-cache]'
# Jump to scenario
git commit -m 'test from state 3 [start-state:3]'
- Add automatic cleanup after each delta upload
- Query all deltas for key, sort by LastModified
- Keep only latest (just uploaded), delete all older ones
- Matches restore logic (only uses latest delta)
- Minimal storage: 1 delta per key (~2GB) vs unbounded growth
- Simpler than keeping N: restore never needs older deltas
- Concurrency-safe (idempotent batch deletes)
- Eliminates need for separate cleanup workflow
Rationale: Since restore only ever uses the single latest delta,
keeping historical deltas adds complexity without benefit. This
matches GitHub Actions semantics (one 'latest' per key).
- Generate random encryption key and store in GitHub Secrets via gh CLI
- Encrypt test message with GPG and commit to repo
- Decrypt in workflow using key from secrets and echo result
- Demonstrates encrypted secrets approach for SSH keys