archy/docs/CURRENT_AGENT_HANDOFF.md
2026-06-11 00:24:54 -04:00

Current Agent Handoff - Bitcoin UI Recovery And 1.8-alpha Resume

Last updated: 2026-06-10 05:33 EDT

Read This First

This is a separate handoff from docs/NEXT_TERMINAL_HANDOFF.md. That file tracks an older/broader plan. For the next agent resuming this machine-switch pause, read this file first, then read:

  • docs/RESUME.md
  • docs/1.8-alpha-improvements-tracker.md
  • docs/CONTAINER_LIFECYCLE_HANDOFF.md
  • docs/MIGRATION_STATUS_REPORT.md

Do not assume docs/NEXT_TERMINAL_HANDOFF.md is the current short-term plan.

Current Goal

Cut Archipelago 1.8-alpha, including a ready-to-test ISO image.

The release goal is not just "apps launch once"; the app/container system needs to be developer-ready and production-release ready:

  • manifests and docs must describe the real runtime contract;
  • apps must install, start, stop, restart, uninstall, reinstall, survive reboot, report truthful status, and show useful progress;
  • My Apps must preserve last-known truth during Podman/scanner backoff instead of showing false empty/no-app states;
  • Bitcoin-dependent apps must explain sync/wallet readiness instead of looking broken;
  • final validation needs focused lifecycle, broad non-destructive lifecycle, then repeated reboot checks before ISO cut/smoke test.
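The "preserve last-known truth" requirement for My Apps can be sketched as a small state holder. This is a minimal illustration only: `ScanOutcome`, `MyAppsView`, and the string status values are hypothetical names, not types from the Archipelago codebase. The idea is that a failed or backed-off scan marks the view stale but never replaces a known app list with an empty one.

```rust
use std::collections::BTreeMap;

/// Result of one scanner pass (hypothetical type, for illustration only).
pub enum ScanOutcome {
    /// The scanner answered; map of app id -> status string.
    Fresh(BTreeMap<String, String>),
    /// Podman or the scanner is in backoff and gave no answer.
    Unavailable,
}

/// Holds what My Apps should render (hypothetical type).
#[derive(Default)]
pub struct MyAppsView {
    last_known: Option<BTreeMap<String, String>>,
    stale: bool,
}

impl MyAppsView {
    /// Fold a scan result into the view without ever discarding known apps.
    pub fn apply(&mut self, outcome: ScanOutcome) {
        match outcome {
            ScanOutcome::Fresh(apps) => {
                self.last_known = Some(apps);
                self.stale = false;
            }
            // Keep the previous list; only flag it as possibly outdated.
            ScanOutcome::Unavailable => self.stale = true,
        }
    }

    /// `None` means "never scanned successfully", which is distinct from
    /// `Some(empty map)`, meaning "scanned and no apps are installed".
    pub fn apps(&self) -> Option<&BTreeMap<String, String>> {
        self.last_known.as_ref()
    }

    /// True when the most recent scan attempt did not produce an answer.
    pub fn is_stale(&self) -> bool {
        self.stale
    }
}
```

Keeping "never scanned" separate from "scanned and empty" is what lets a UI show a loading or stale indicator instead of a false no-apps state.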

Current Estimate

As of this pause:

  • Credible release candidate: roughly 87-91%.
  • Production-quality release developers will love: roughly 73-79%.
  • Calendar estimate if the remaining systemic lifecycle issues are bounded: 1-2 focused engineering days for a release candidate, then additional reboot/ISO smoke time.
  • The biggest remaining risks are not in catalog wiring. They are rootless Podman control-plane responsiveness, stale scanner state, lifecycle progress UX, and reboot validation.

Validation Host

  • Host: 192.168.1.198
  • SSH user: archipelago
  • Password used in this session: password123
  • Active Bitcoin app on this host: bitcoin-knots, not bitcoin-core
  • Keep archipelago-doctor.timer and archipelago-reconcile.timer inactive for deterministic validation unless intentionally testing them.
  • Preserve app data.
  • Avoid broad Podman store/image cleanup commands on .198.

Bitcoin UI Incident Summary

User reported the Bitcoin custom UI showing:

Bitcoin node is starting or busy syncing; retrying automatically. Detail: getblockchaininfo: Bitcoin RPC request failed ... operation timed out

After the listener was repaired, the message changed in sequence:

  • Connection refused
  • Verifying blocks...

The user then reported that the UI looked fine again.

What happened:

  • The node is a bitcoin-knots node.
  • During live debugging, the wrong alias, bitcoin-core, was started/stopped.
  • bitcoin-core and bitcoin-knots compete for the same Bitcoin RPC/P2P ports.
  • That action left the real bitcoin-knots service active but without the host 8332 rootlessport listener for a while.
  • Stopping the stray bitcoin-core.service and restarting only bitcoin-knots.service recreated listeners on 8332 and 8333.
  • After restart, bitcoind entered the normal -28 Verifying blocks... phase.
  • The user later reported the Bitcoin UI looked fine again.

Known live state observed during recovery:

  • bitcoin-knots.service: active
  • bitcoin-core.service: inactive
  • archy-bitcoin-ui.service: active
  • listeners present after repair:
    • 8332 via rootlessport
    • 8333 via rootlessport
    • 8334 via nginx/Bitcoin UI
  • bitcoin-knots logs showed active IBD around height 4137xx and progress about 0.09438.

Do not restart Bitcoin again unless there is a fresh confirmed service/listener failure. If checking status, prefer read-only probes and avoid starting the wrong variant.
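One possible read-only probe is a plain TCP connect against the expected ports, which changes no service state. The sketch below uses only the Rust standard library; the function names `port_is_listening` and `missing_listeners` are hypothetical and not part of the Archipelago codebase.

```rust
use std::net::{TcpStream, ToSocketAddrs};
use std::time::Duration;

/// Returns true if something accepts a TCP connection on host:port
/// within `timeout`. Nothing is started, stopped, or written.
pub fn port_is_listening(host: &str, port: u16, timeout: Duration) -> bool {
    let addrs = match (host, port).to_socket_addrs() {
        Ok(addrs) => addrs,
        Err(_) => return false, // unresolvable host counts as "not listening"
    };
    for addr in addrs {
        if TcpStream::connect_timeout(&addr, timeout).is_ok() {
            return true;
        }
    }
    false
}

/// Returns the subset of `ports` with no listener, in the order given.
pub fn missing_listeners(host: &str, ports: &[u16], timeout: Duration) -> Vec<u16> {
    ports
        .iter()
        .copied()
        .filter(|port| !port_is_listening(host, *port, timeout))
        .collect()
}
```

For this host, a call such as `missing_listeners("192.168.1.198", &[8332, 8333, 8334], Duration::from_secs(2))` would be expected to return an empty list when all three listeners are up, assuming the ports are reachable from where the probe runs. A successful connect only shows that something is listening; it does not say which Bitcoin variant owns the port, so it complements rather than replaces a check of the two service units.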

Source Fixes Made Locally

These local edits were made after live Bitcoin recovered. They are not deployed yet and were not fully validated before the user paused.

core/archipelago/src/bitcoin_status.rs

Changed Bitcoin status cache behavior and copy:

  • refresh interval changed from 5s to 10s;
  • transient error backoff added at 15s;
  • RPC client timeout increased from 8s to 20s;
  • error context now uses full anyhow chain with {e:#};
  • transient classifications now include common overloaded/backend states;
  • user-facing copy now distinguishes:
    • verifying blocks after restart;
    • waiting for the Bitcoin RPC listener;
    • busy and not answering RPC before the timeout;
    • generic starting or busy syncing;
  • added unit tests for the three user-visible states above.

Intent: stop collapsing distinct backend states into the same stale "starting or busy syncing" timeout message.
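The classification described above might look roughly like the following. This is a hedged sketch, not the contents of bitcoin_status.rs: the enum, function names, constant names, substring matches, and copy strings are all assumptions made for illustration. Only the timing values (10s, 15s, 20s) come from the change list.

```rust
use std::time::Duration;

// Timing values from the change list above; the constant names are hypothetical.
pub const REFRESH_INTERVAL: Duration = Duration::from_secs(10);
pub const TRANSIENT_BACKOFF: Duration = Duration::from_secs(15);
pub const RPC_TIMEOUT: Duration = Duration::from_secs(20);

/// User-visible states (hypothetical enum, mirroring the four copy variants).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BitcoinUiState {
    VerifyingBlocks,
    WaitingForListener,
    BusyTimedOut,
    StartingOrSyncing,
}

/// Classify a flattened error chain, such as the text produced by
/// formatting an anyhow error with `{e:#}`. The substrings are assumptions.
pub fn classify_rpc_error(detail: &str) -> BitcoinUiState {
    let d = detail.to_ascii_lowercase();
    if d.contains("verifying blocks") {
        BitcoinUiState::VerifyingBlocks
    } else if d.contains("connection refused") {
        BitcoinUiState::WaitingForListener
    } else if d.contains("timed out") {
        BitcoinUiState::BusyTimedOut
    } else {
        BitcoinUiState::StartingOrSyncing
    }
}

/// Illustrative copy only; the real strings live in the source file.
pub fn user_copy(state: BitcoinUiState) -> &'static str {
    match state {
        BitcoinUiState::VerifyingBlocks => "Bitcoin is verifying blocks after a restart.",
        BitcoinUiState::WaitingForListener => "Waiting for the Bitcoin RPC listener.",
        BitcoinUiState::BusyTimedOut => "Bitcoin is busy and did not answer before the timeout.",
        BitcoinUiState::StartingOrSyncing => "Bitcoin node is starting or busy syncing.",
    }
}

/// Wait longer before the next refresh after a transient error.
pub fn next_refresh_delay(last_error: Option<BitcoinUiState>) -> Duration {
    match last_error {
        Some(_) => TRANSIENT_BACKOFF,
        None => REFRESH_INTERVAL,
    }
}
```

Ordering the checks from most specific to least specific keeps the generic "starting or busy syncing" copy as a true fallback rather than the default for every failure.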

core/archipelago/src/api/rpc/package/update.rs

Narrow Bitcoin alias fix added:

  • orchestrator_update_app_id("bitcoin-knots") now remains "bitcoin-knots" instead of mapping to "bitcoin-core";
  • candidate app IDs for a Bitcoin container now prefer bitcoin-knots before bitcoin-core;
  • tests updated to lock this behavior.

Intent: bitcoin-core and bitcoin-knots can be dependency/status aliases, but must not be interchangeable lifecycle/update targets on a node that has a specific installed variant.

Important: this file already contained other uncommitted update/pull timeout changes from prior work. Do not assume every diff in this file came from this interruption.
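The alias rule can be illustrated with a short sketch. The name `orchestrator_update_app_id` appears in the notes above, but its real signature is not shown there, so the signature here is an assumption. `candidate_app_ids`, `resolve_lifecycle_target`, and the set of container names treated as Bitcoin are hypothetical.

```rust
/// Assumed signature. Each variant stays its own update target;
/// "bitcoin-knots" is no longer rewritten to "bitcoin-core".
pub fn orchestrator_update_app_id(app_id: &str) -> &str {
    match app_id {
        "bitcoin-knots" => "bitcoin-knots",
        other => other,
    }
}

/// Candidate app IDs for a container name, most preferred first.
/// The names matched here are an assumption for illustration.
pub fn candidate_app_ids(container_name: &str) -> Vec<String> {
    match container_name {
        "bitcoin" | "bitcoin-knots" | "bitcoin-core" => {
            // Prefer knots before core, per the fix described above.
            vec!["bitcoin-knots".to_string(), "bitcoin-core".to_string()]
        }
        other => vec![other.to_string()],
    }
}

/// Pick the first candidate that is actually installed, so a lifecycle
/// action never lands on a variant the node does not have.
pub fn resolve_lifecycle_target<'a>(
    candidates: &'a [String],
    installed: &[&str],
) -> Option<&'a str> {
    candidates
        .iter()
        .map(|candidate| candidate.as_str())
        .find(|candidate| installed.iter().any(|app| *app == *candidate))
}
```

Returning `None` when no candidate is installed gives the caller a chance to refuse the action instead of falling back to a default variant, which is the failure mode seen during the incident.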

Validation Status At Pause

Completed:

  • cargo fmt --manifest-path core/Cargo.toml --all passed after the local Bitcoin edits.

Attempted but not completed:

  • Targeted Cargo tests were first launched in three separate /tmp target dirs and failed when /tmp filled up (No space left on device).
  • Those temporary dirs were removed:
    • /tmp/archy-cargo-bitcoin-status
    • /tmp/archy-cargo-update-alias
    • /tmp/archy-cargo-container-candidates
  • A second run using CARGO_TARGET_DIR=.codex-tmp/cargo-bitcoin-fix was still compiling when the user paused. It was terminated for handoff.
  • No successful Rust test result exists yet for the new Bitcoin status/alias tests.

Recommended validation after resume:

    git diff --check -- core/archipelago/src/bitcoin_status.rs core/archipelago/src/api/rpc/package/update.rs docs/CURRENT_AGENT_HANDOFF.md
    CARGO_TARGET_DIR=.codex-tmp/cargo-bitcoin-fix CARGO_BUILD_JOBS=2 cargo test --manifest-path core/Cargo.toml -p archipelago bitcoin_status::tests
    CARGO_TARGET_DIR=.codex-tmp/cargo-bitcoin-fix CARGO_BUILD_JOBS=2 cargo test --manifest-path core/Cargo.toml -p archipelago update_aliases_map_to_manifest_app_ids
    CARGO_TARGET_DIR=.codex-tmp/cargo-bitcoin-fix CARGO_BUILD_JOBS=2 cargo test --manifest-path core/Cargo.toml -p archipelago container_name_candidates_cover_common_aliases

If Cargo target locking appears stale, check for real cargo/rustc workers before deleting anything. Prefer workspace-local target dirs under .codex-tmp over new cold /tmp targets.

Immediate Next Steps

  1. Confirm no lingering Cargo process:

    pgrep -af "cargo|rustc|cargo-bitcoin-fix"
    
  2. Validate the local Bitcoin source fixes listed above.

  3. If validation passes, build/deploy the backend to .198 only after confirming the user still wants deployment.

  4. Recheck live Bitcoin non-destructively:

    • bitcoin-knots.service active;
    • bitcoin-core.service inactive;
    • listeners on 8332, 8333, 8334;
    • Bitcoin UI loads on 8334;
    • /bitcoin-status returns useful copy if backend is busy.

  5. Resume release backlog:

    • rootless Podman lifecycle/control-plane responsiveness;
    • My Apps last-known-state truthfulness during scanner backoff;
    • progress UX for install/uninstall/start/stop/restart;
    • remaining tracker rows in docs/1.8-alpha-improvements-tracker.md;
    • focused lifecycle matrix on .198;
    • broad non-destructive lifecycle;
    • 3 clean reboot validations minimum, 5 preferred;
    • ISO cut and ISO smoke test.

Cautions For Next Agent

  • Do not start bitcoin-core on .198 unless intentionally migrating variants.
  • Treat bitcoin-knots as the installed Bitcoin variant.
  • Do not run broad Podman prune/store cleanup.
  • Do not revert unrelated dirty worktree changes.
  • docs/NEXT_TERMINAL_HANDOFF.md exists but is not the short-term handoff for this pause.
  • Many repo files are dirty from broader release hardening. Read diffs before attributing changes.