docs: STATUS.md through Step 9 (.228 hot-swap verified)
Logs Step 9 acceptance evidence, the two bugs caught and fixed during the hot-swap (parse_memory_limit IEC suffix bug in 732df1b8 and cgroup Delegate in ba83f9bc), and outlines the Step 10 plan for .116.
@@ -1,6 +1,6 @@
# RESUME HERE — Rust orchestrator migration
Updated: 2026-04-23 (Step 7 committed, moving to Step 8)
Updated: 2026-04-23 (Step 9 complete on .228, Step 10 next)

**To resume this work, SSH into the ThinkPad and run `opencode` from `~/Projects/archy/`. Or work from the laptop via the SSHFS mount at `~/mnt/archy-thinkpad/`.**

@@ -13,26 +13,50 @@ Working through the 11-step plan in [`rust-orchestrator-migration.md`](./rust-or
- [x] **Step 3** — `b6a04d31` ProdContainerOrchestrator (999 LOC), 16 tests all pass, not yet wired to main.rs
- [x] **Step 4** — `e8a59c93` ContainerOrchestrator trait, RpcHandler uses it in prod (+ `13858842` chore gitignore ._*)
- [x] **Step 5** — `fc39b04b` BootReconciler with Arc<Notify> shutdown, 4 paused-time tests pass
- [x] **Step 6** — main.rs wire-up: construct orchestrator once, load_manifests + adopt_existing + spawn BootReconciler, thread through Server::new / ApiHandler::new / RpcHandler::new, wire shutdown Notify to SIGTERM/SIGINT. Clean `cargo check -p archipelago` (6 pre-existing warnings), container tests 43/44 pass (the one failing `test_parse_image_versions` is pre-existing and unrelated — asserts `!contains_key("NOT_AN_IMAGE")` but the retain on line 106 keeps anything ending in `_IMAGE`).
- [x] **Step 7** — `069bc4a5` bitcoin-ui pre-start hook renders nginx.conf from embedded template. New `container::bitcoin_ui` module (render fn, atomic tmp+rename, idempotent byte-compare, 8 unit tests). `ProdContainerOrchestrator::run_pre_start_hooks` fires in `install_fresh` before `create_container` and in `ensure_running` (Running+Rewritten → restart; Stopped → re-render+start). bitcoin-ui Dockerfile no longer COPYs nginx conf; arrives via runtime bind-mount (safe-failure → 404 if missing, never stale auth). `apps/{bitcoin,electrs,lnd}-ui/manifest.yml` land. Integration test asserts `install("bitcoin-ui")` writes substituted config to disk. 39/39 container:: tests pass (same 1 pre-existing failure).
- [ ] **Step 8a** — Delete `archipelago-reconcile.{service,timer}` + ISO builder touchpoints. Keep `reconcile-containers.sh` + `container-specs.sh` for `update.rs` OTA path. Next up.
- [x] **Step 6** — `48f08aa3` main.rs wire-up (orchestrator construction + adopt_existing + BootReconciler spawn + shutdown Notify)
- [x] **Step 7** — `069bc4a5` bitcoin-ui pre-start hook + embedded nginx.conf template (8 unit tests + 1 integration test), 39/39 container:: tests pass
- [x] **Step 8a** — `a0707f4d` retire archipelago-reconcile.{service,timer} + ISO builder touchpoints, keep scripts for update.rs
- [x] **Step 9** — **Hot-swap on .228 verified.** All three UIs (bitcoin-ui/lnd-ui/electrs-ui) installing + serving HTTP 200. Adoption + reconciler + pre-start hook + dependency ordering all working under the prod code path. See "Step 9 evidence" below.
- [ ] **Step 8b** — Port remaining ~25 container creations from `first-boot-containers.sh` into `apps/<id>/manifest.yml`, then port `update.rs` to orchestrator (deferred, multi-day work)
- [ ] **Step 8c** — Rename `first-boot-containers.sh` → `first-boot-setup.sh`, strip container ops, keep setup. Delete `reconcile-containers.sh` + `container-specs.sh`. Add ISO lines to copy `apps/` (final one-way door, requires 8b complete)
- [ ] **Step 9** — Hot-swap + verify on .228
- [ ] **Step 10** — Hot-swap + verify on .116
- [ ] **Step 10** — Hot-swap + verify on .116 (adoption-heavy test — .116 already has all containers running)
- [ ] **Step 11** — Chaos matrix on both nodes
## Acceptance evidence (Steps 1–7)
## Step 9 evidence (.228, 2026-04-23)
`cargo test -p archipelago-container --lib` → 25/25 pass.
`cargo test -p archipelago container::` → 38/39 pass (all container:: tests; the 1 failure is pre-existing `test_parse_image_versions` — assert bug against `_IMAGE` suffix filter).
`cargo check -p archipelago` → clean, 6 warnings (dead-code on trait methods not yet exercised — expected until Step 9 hot-swap).
- Binary: `fix: parse_memory_limit accepts Ki/Mi/Gi IEC binary suffixes` (`732df1b8`) + `feat(systemd): delegate cgroup controllers` (`ba83f9bc`), built on .116, scp'd to .228 as `/usr/local/bin/archipelago`. Old binary backed up at `/usr/local/bin/archipelago.bak-pre-step9`.
- DEV_MODE override disabled (`override.conf` → `override.conf.disabled-pre-step9`).
- `/opt/archipelago/apps/{bitcoin-ui,electrs-ui,lnd-ui}/manifest.yml` populated.
- `/opt/archipelago/docker/bitcoin-ui/Dockerfile` replaced with the Step 7 version (no `COPY nginx.conf`). Old dir backed up as `bitcoin-ui.bak-pre-step9`.
- Post-start snapshot:
- `🔗 Adopted 1 existing container(s): ["electrs-ui"]` — adoption of 13h-running container worked without recreation
- `🔄 Boot reconciler started (interval: 30s)` — every 30s, all three app_ids reach `NoOp` after the initial install pass
- `bitcoin-ui nginx.conf rendered path=/var/lib/archipelago/bitcoin-ui/nginx.conf auth_hash=97af1c18` — pre-start hook fires in `install_fresh`
- `curl localhost:8334` → HTTP 200 (bitcoin-ui), `:8081` → 200 (lnd-ui), `:50002` → 200 (electrs-ui)
- OCI memory limits correctly applied: bitcoin-ui=128Mi, electrs-ui=128Mi, lnd-ui=64Mi (was emitted as 0 pre-fix)
- bitcoin-core / filebrowser / lnd / electrumx continue running untouched (prod orchestrator currently only manages apps it has manifests for; Step 8b expands that scope).

Unrelated test failures (identity_manager / session / wallet / mesh / credentials): 24 pre-existing on baseline `b6a04d31`, fluctuates to 25 on Step 4; confirmed unrelated (diff only shifted 3 fs-state tests that are independently flaky).
## Two bugs found & fixed during Step 9
1. **`parse_memory_limit` truncation bug** (`732df1b8`): lowercased "128Mi" → "128mi" → `trim_end_matches('m')` → "128i" → f64 parse fails → `None.unwrap_or(0)` → OCI `memory.limit:0` → systemd rejects MemoryMax=0 at container start. Every manifest in `apps/` uses IEC suffixes so every ProdContainerOrchestrator install was DOA. Now handles Ki/Mi/Gi/Ti + SI decimal + shorthand + raw bytes; 6 regression tests. Also changed `create_container` to OMIT the memory/cpu fields on absent/unparseable input rather than emitting 0.
2. **`archipelago.service` cgroup delegation missing** (`ba83f9bc`): not the root cause of Step 9 failures (bug #1 was), but belt-and-braces so future code paths using the `--memory` CLI arg (runtime.rs DockerRuntime) don't hit the same systemd rejection on hosts where podman needs to create libpod scopes inside a non-delegated system-slice subtree. Added `Delegate=memory pids cpu io`.
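As an illustration of the suffix handling that bug 1 calls for, here is a minimal sketch. It is not the code from `732df1b8`: the suffix table, the reading of the bare `k`/`m`/`g`/`t` shorthand as binary multiples, and the choice to return `None` for zero are all assumptions made for this example.

```rust
/// Illustrative parser: converts strings such as "128Mi", "500MB", "1g" or
/// "1048576" into a byte count. Returns None on empty or unparseable input so
/// the caller can omit the limit instead of emitting 0.
fn parse_memory_limit(input: &str) -> Option<u64> {
    // Assumed suffix table. Longer suffixes come first so that, for example,
    // "mb" is matched before the bare "b".
    const UNITS: &[(&str, u64)] = &[
        ("ki", 1 << 10), ("mi", 1 << 20), ("gi", 1 << 30), ("ti", 1 << 40),
        ("kb", 1_000), ("mb", 1_000_000), ("gb", 1_000_000_000), ("tb", 1_000_000_000_000),
        ("k", 1 << 10), ("m", 1 << 20), ("g", 1 << 30), ("t", 1 << 40),
        ("b", 1),
    ];

    let s = input.trim().to_ascii_lowercase();
    // Strip the first matching suffix; no suffix means the value is raw bytes.
    let (number, multiplier) = UNITS
        .iter()
        .find_map(|&(suffix, mult)| s.strip_suffix(suffix).map(|n| (n, mult)))
        .unwrap_or((s.as_str(), 1));

    let value: f64 = number.trim().parse().ok()?;
    // Reject NaN, infinity, zero and negatives rather than passing them on.
    if !value.is_finite() || value <= 0.0 {
        return None;
    }
    Some((value * multiplier as f64) as u64)
}
```

With this shape, a caller can match on the `Option` and leave the memory field out of the container spec when the result is `None`, which mirrors the "omit rather than emit 0" change described above.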
## Commits made this session
```
ba83f9bc feat(systemd): delegate cgroup controllers to archipelago.service
732df1b8 fix: parse_memory_limit accepts Ki/Mi/Gi IEC binary suffixes
a0707f4d refactor: retire archipelago-reconcile.{service,timer} (Step 8a)
1c81a739 docs: split Step 8 into 8a/8b/8c
6e46932f docs: STATUS.md through Step 7
069bc4a5 feat: bitcoin-ui pre-start hook (Step 7)
```
Branch is **17 commits ahead of tx1138/main** (local only — user pushes to mirrors personally).
## Uncommitted state
Clean — only leftover is `tests/` (bats harness from prior session, not in scope for this migration).
Clean. Only untracked: `tests/` (bats harness from prior session, not in scope).
## Answered design questions (no need to re-ask)
@@ -40,7 +64,7 @@ Clean — only leftover is `tests/` (bats harness from prior session, not in sco
2. BITCOIN_RPC_AUTH injection → runtime bind-mount of nginx.conf (no build-args, no envsubst)
3. Reconciler interval → 30 seconds
4. Concurrency → per-app `Mutex<()>` in a `DashMap`
5. Bash scripts → delete immediately (first-boot-containers.sh, reconcile-containers.sh, container-specs.sh, + their systemd units)
5. Bash scripts → split into 8a/8b/8c; 8a done, 8b/8c deferred
6. Step 4 extension → `ContainerOrchestrator` trait includes `install(app_id)`; the `manifest_path`-based install RPC stays dev-only
7. Step 7 bitcoin-ui template → embed via `include_str!`, render on install + every reconcile, atomic tmp+rename to `/var/lib/archipelago/bitcoin-ui/nginx.conf`, bind-mount into container. RPC user hardcoded `archipelago`, password from `/var/lib/archipelago/secrets/bitcoin-rpc-password`.
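The render-and-write behaviour in answer 7 (idempotent byte-compare, then atomic tmp+rename) can be sketched as follows. This is an assumption-laden illustration, not the `container::bitcoin_ui` module: `RenderOutcome`, `render`, `write_if_changed`, the `.tmp` suffix, and the `{{BITCOIN_RPC_AUTH}}` placeholder syntax are hypothetical names chosen for the example.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Tells the caller whether the file changed, so it can decide on a restart.
#[derive(Debug, PartialEq)]
enum RenderOutcome {
    Unchanged,
    Rewritten,
}

/// Substitute the auth value into a template (placeholder syntax is assumed).
fn render(template: &str, auth: &str) -> String {
    template.replace("{{BITCOIN_RPC_AUTH}}", auth)
}

/// Write `contents` to `dest` only if the bytes differ, via tmp file + rename.
fn write_if_changed(dest: &Path, contents: &[u8]) -> io::Result<RenderOutcome> {
    // Idempotence: skip the write when the on-disk bytes already match.
    match fs::read(dest) {
        Ok(existing) if existing.as_slice() == contents => {
            return Ok(RenderOutcome::Unchanged)
        }
        Ok(_) => {}
        Err(e) if e.kind() == io::ErrorKind::NotFound => {}
        Err(e) => return Err(e),
    }
    if let Some(parent) = dest.parent() {
        fs::create_dir_all(parent)?;
    }
    // The tmp file sits next to the destination so the rename stays on one
    // filesystem and replaces the target in a single step.
    let mut tmp = dest.as_os_str().to_owned();
    tmp.push(".tmp");
    let tmp = PathBuf::from(tmp);
    fs::write(&tmp, contents)?;
    fs::rename(&tmp, dest)?;
    Ok(RenderOutcome::Rewritten)
}
```

Returning an outcome rather than `()` is one way to support the "Running+Rewritten → restart" decision: a reconcile pass that sees `Unchanged` has nothing further to do for the config file.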
@@ -48,29 +72,31 @@ Clean — only leftover is `tests/` (bats harness from prior session, not in sco
| Host | IP | Role | Dashboard pw | Sudo pw |
|---|---|---|---|---|
| `archy` (this one) | 192.168.1.116 | **Dev ThinkPad** (Lenovo X250, Debian 13, archi-thinkpad), also runs v1.7.42-alpha | archipelago | ThisIsWeb54321@ |
| `archy228` | 192.168.1.228 | Kiosk HP ProDesk, runs v1.7.41-alpha, missing bitcoin-ui + lnd-ui | password123 | archipelago |
| `archy` | 192.168.1.116 | **Dev ThinkPad** (Lenovo X250, Debian 13). Currently running v1.7.42-alpha (DEV_MODE). Step 10 target. | archipelago | ThisIsWeb54321@ |
| `archy228` | 192.168.1.228 | Kiosk HP ProDesk. **Step 9 landing zone**: now running Rust-orchestrator binary in prod mode. | password123 | archipelago |

Both are development alpha nodes: **full destructive latitude**, no need to ask before stop/start/rebuild.
## Next action
**Step 8a — Delete the reconcile systemd timer path.** Safe, isolated, atomic.
**Step 10: Hot-swap on .116.**

Files to delete:
1. `image-recipe/configs/archipelago-reconcile.service` (14 LOC — replaced by BootReconciler)
2. `image-recipe/configs/archipelago-reconcile.timer` (14 LOC — replaced by BootReconciler)
Unlike .228 (which tested the INSTALL path for net-new UI containers), .116 tests the ADOPTION path: it already has all three UIs and all backend containers running from prior v1.7.42-alpha runs. We want to verify the new prod orchestrator adopts every existing container without recreating or restarting them.

ISO builder edits in `image-recipe/build-auto-installer-iso.sh`:
- L412-413: drop `COPY archipelago-reconcile.{service,timer}`
- L449: drop `systemctl enable archipelago-reconcile.timer`
- L542-543: drop the `cp archipelago-reconcile.{service,timer}` block
Steps:
1. Disable DEV_MODE on .116 (check whether `override.conf` exists under `/etc/systemd/system/archipelago.service.d/`)
2. Stage the already-built binary at `~/Projects/archy/core/target/release/archipelago` → `/usr/local/bin/archipelago.new`
3. Ensure `/opt/archipelago/apps/{bitcoin-ui,electrs-ui,lnd-ui}/manifest.yml` present (copy from repo)
4. Ensure `/opt/archipelago/docker/bitcoin-ui/` matches the Step-7 layout (no baked nginx.conf)
5. Snapshot: `podman ps -a --format "{{.Names}}\t{{.Status}}\t{{.CreatedAt}}"` → save to `/tmp/pre-step10-containers.txt`
6. `systemctl stop archipelago` → install binary → `systemctl start archipelago`
7. Verify in journal: every running container appears in "Adopted N existing container(s)"; no container was recreated; all HTTP smokes still 200; BootReconciler reaches NoOp on every app_id after one pass.
8. If broken → restore `.bak` binary, re-enable DEV_MODE override.
9. Commit STATUS.md update.
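The "no container was recreated" check in step 7 could be automated by diffing the step 5 snapshot against a second one taken after the restart. The sketch below assumes both snapshots use the tab-separated `Names`, `Status`, `CreatedAt` layout from step 5; `parse_snapshot` and `recreated_or_missing` are hypothetical helper names, not part of the orchestrator.

```rust
use std::collections::BTreeMap;

/// Parse `name<TAB>status<TAB>created_at` lines into a name -> created_at map.
/// Lines with fewer than three columns or an empty name are skipped.
fn parse_snapshot(text: &str) -> BTreeMap<String, String> {
    text.lines()
        .filter_map(|line| {
            let mut cols = line.split('\t');
            let name = cols.next()?.trim();
            let _status = cols.next()?; // status is expected to change, so ignore it
            let created = cols.next()?.trim();
            if name.is_empty() {
                return None;
            }
            Some((name.to_string(), created.to_string()))
        })
        .collect()
}

/// Names from the `before` snapshot whose creation time differs in `after`,
/// or which no longer exist. An empty result means nothing was recreated.
fn recreated_or_missing(before: &str, after: &str) -> Vec<String> {
    let before = parse_snapshot(before);
    let after = parse_snapshot(after);
    before
        .iter()
        .filter(|(name, created)| after.get(*name) != Some(*created))
        .map(|(name, _)| name.clone())
        .collect()
}
```

Comparing `CreatedAt` rather than `Status` is deliberate in this sketch: uptime strings change on every snapshot, while the creation timestamp only changes if the container was removed and created again.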
**Keep** `scripts/reconcile-containers.sh` + `scripts/container-specs.sh` because `core/archipelago/src/api/rpc/package/update.rs` still shells out to reconcile-containers.sh during OTA updates. Porting update.rs to `ContainerOrchestrator::upgrade()` requires manifests for every container it touches — that's Step 8b's scope.
**Risk on .116:** If adoption fails mid-flight, we'd lose the running v1.7.42 backend that I'm currently typing at. Keep a second SSH session open to the ThinkPad for emergency revert. The backup plan is `install /usr/local/bin/archipelago.bak /usr/local/bin/archipelago && systemctl restart archipelago`.

No Rust changes. Atomic single commit. Full ISO build test on .116 before commit per user ask.

**Step 8b/8c come later**: they require porting 25+ container creations from `first-boot-containers.sh` into `apps/*/manifest.yml`, which is a multi-day scope. Not tonight.

**After Step 10 we are blocked on Step 8b** (multi-day manifest ports) before Step 11 (chaos matrix).

---