Commit Graph

5 Commits

Author SHA1 Message Date
archipelago
a9896aabfa test(lifecycle): add dedicated electrumx.bats suite
Same shape as bitcoin-knots.bats and lnd.bats so the 20× release-gate
exercises electrumx through the same state matrix it uses for the other
two core apps. electrumx previously had a single TCP-port check inside
required-stack.bats; this adds destructive + cascade-destructive tiers.

10 @test cases:
* read-only: presence, valid state, TCP port (50001) reachable, no
  orphan containers beyond {electrumx, archy-electrs-ui}
* destructive: stop, start, restart, TCP port recovers within 120s of
  cold restart (longer than bitcoind because electrumx replays its
  index against bitcoind on start)
* cascade: uninstall, reinstall (240s timeout for index rebuild)

With this suite, the three single-container core apps (bitcoin-knots,
lnd, electrumx) now have parity coverage. Multi-container stacks
(btcpay, mempool, fedimint) come next.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 16:11:02 -04:00
archipelago
abe07c6588 test(lifecycle): add dedicated lnd.bats suite
Mirrors bitcoin-knots.bats so the 20× release-gate run exercises lnd
through the same state matrix. lnd previously had only a single
read-only check inside required-stack.bats; this adds the destructive
and cascade-destructive tiers that match what we already test for
bitcoin-knots.

10 @test cases:
* read-only: presence, valid state, lncli getinfo, no orphan containers
* destructive (ARCHY_ALLOW_DESTRUCTIVE=1): stop, start, restart,
  RPC recovers within 90s of cold restart (longer than bitcoind
  because the wallet has to unlock first)
* cascade (ARCHY_ALLOW_CASCADE_DESTRUCTIVE=1): uninstall, reinstall

Reuses the same lncli invocation as required-stack.bats so divergence
shows up clearly if either test breaks.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 16:09:43 -04:00
archipelago
168a0d9509 test(lifecycle): add setup-teardown + run-20x harness scaffolding
Phase 4 of the v1.7.52 container excellence plan: a release-gate harness
that loops the bats suite N times in a row, with teardown between
iterations, and reports a pass/fail tally.

* setup-teardown.sh — clears /tmp/archy-rpc-session-* between runs so
  iteration N+1 doesn't reuse a logged-out cookie from iteration N.
  Idempotent; safe to run anytime. Designed to grow as we add suites
  that leave other transient state.
* run-20x.sh — wraps run.sh in a loop of ARCHY_ITERATIONS (default 20).
  Tracks per-iteration pass/fail with wall-clock timing, prints a
  results block, exits non-zero on any failure. Honors ARCHY_FAIL_FAST
  for short-circuit during dev.

Suggested release-gate command:
  ARCHY_PASSWORD=password123 ARCHY_ALLOW_DESTRUCTIVE=1 \
    tests/lifecycle/run-20x.sh

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 16:06:09 -04:00
archipelago
6e716f68b6 refactor(container): move companion UIs to systemd via Quadlet
Companion UI containers (archy-bitcoin-ui, archy-lnd-ui,
archy-electrs-ui) used to be launched as fire-and-forget tokio::spawn
blocks from install.rs. If archipelago crashed mid-spawn or the
container's cgroup was reaped, companions vanished from podman ps -a
and only a manual rm/run could bring them back (the .228 incident).

Now each companion is rendered as a Quadlet .container unit under
~/.config/containers/systemd/, daemon-reloaded, and started via
systemctl --user. systemd owns supervision from that point on:

- archipelago can crash, restart, or be uninstalled without touching
  any companion.
- Quadlet's Restart=always + RestartSec=10 handles container exits.
- A 30s reconcile tick in boot_reconciler enumerates expected
  companion units and re-installs any whose unit file or service
  vanished — defense-in-depth against external tampering.

New module layout:
- container/quadlet.rs: pure unit renderer + atomic write_if_changed
  + systemctl helpers (daemon_reload_user / enable_now / disable_remove
  / is_active). 6 unit tests, no I/O in the renderer.
- container/companion.rs: per-app companion specs, install/remove/
  reconcile, image presence (build local first, fall back to insecure
  registry only via image_uses_insecure_registry whitelist). 2 tests.

install.rs handle_package_install now ends with a single call to
companion::install_for(package_id), replacing 287 lines of spawn-and-
hope shellouts plus a ~120-line nginx auth-injector helper that worked
around per-node RPC password baking. The helper is gone too — the
pre-start hook renders the per-node nginx.conf to /var/lib/archipelago/
bitcoin-ui/nginx.conf and the Quadlet unit bind-mounts it read-only.

runtime.rs handle_package_uninstall now disables companions before
the container rm loop. Otherwise systemd's Restart=always would
respawn each companion within ~10s of removal.

Tests: 53 container tests pass, including 6 quadlet renderer tests
(host network, bridge network, capability set, atomic write idempotence)
and 2 companion specs (per-app companion lookup, build_unit shape).
boot_reconciler tests gain a #[cfg(test)] without_companion_stage()
flag so the paused-clock fixtures don't race the real systemctl I/O.

A bats regression test (companion-survives-archipelago-restart.bats,
gated on ARCHY_ALLOW_DESTRUCTIVE=1) asserts the .228 failure mode
cannot recur: every installed companion has a unit file, services
stay active across systemctl --user restart archipelago, and a
deleted unit file is recreated within one reconcile tick.

Net delta: +941 / -363, but the +941 is mostly tests (~440 lines)
and the new declarative layer; the imperative tokio::spawn block and
its nginx-auth helper are gone, removing two failure classes
(orphan companions on archipelago crash, and post-start exec races
under tightly-confined cgroups) that previously needed manual SSH
recovery.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 10:45:07 -04:00
archipelago
43de3b73b2 feat(orchestrator): complete container migration and release hardening 2026-04-28 15:00:58 -04:00