feat(container): add build source to manifest schema

ContainerConfig.image is now Option<String>, mutually exclusive with a new optional ContainerConfig.build: Option<BuildConfig>. Exactly one of image or build must be present, enforced in AppManifest::validate.

Adds ResolvedSource enum (Pull | Build) and ContainerConfig::resolve + ::image_ref helpers so the orchestrator can treat pull and build uniformly.

All 26 existing pull-only manifests continue to parse unchanged (covered by existing_pull_only_manifests_still_parse test).

Call sites updated: podman_client, runtime::DockerRuntime, dev_orchestrator. Dev orchestrator errors out cleanly on Build sources until Step 2 lands build_image support on the runtime trait.

Step 1 of docs/rust-orchestrator-migration.md. 10 new unit tests, all pass.

Also includes: docs/rust-orchestrator-migration.md (design spec) and docs/STATUS.md resume section for the next session.
191
docs/STATUS.md
Normal file
@@ -0,0 +1,191 @@
# RESUME HERE — Rust orchestrator migration

Updated: 2026-04-22 (late session, pivoted from laptop to ThinkPad)

**To resume this work, SSH into the ThinkPad and run `opencode` from `~/Projects/archy/`.**

## Where we are

Working through the 11-step plan in [`rust-orchestrator-migration.md`](./rust-orchestrator-migration.md).

- [x] **Step 1** — `ContainerConfig` schema extended with `build:` (mutually exclusive with `image:`), new `ResolvedSource` enum, `resolve()` method, 10 new tests
- [x] **Step 2** — `ContainerRuntime` trait gained `image_exists` + `build_image` on all three impls (PodmanRuntime, DockerRuntime, AutoRuntime), 4 new argv-construction tests
- [ ] **Step 3** — `ProdContainerOrchestrator` (next up)
- [ ] Steps 4-11 — see design doc
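
The Step 1 schema change can be sketched roughly as follows. This is a minimal illustration, not the code in `core/container/src/manifest.rs`: the `BuildConfig` fields (`context`, `tag`), the `String` error type, and the exact signatures of `resolve` and `image_ref` are assumptions made for the example. In the real crate the exclusivity check is enforced in `AppManifest::validate`; here it is folded into `resolve` to keep the sketch short.

```rust
use std::path::PathBuf;

/// Hypothetical build source. The real `BuildConfig` fields are not shown
/// in this document; `context` and `tag` are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub struct BuildConfig {
    pub context: PathBuf,
    pub tag: String,
}

/// Exactly one of `image` or `build` is expected to be set.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct ContainerConfig {
    pub image: Option<String>,
    pub build: Option<BuildConfig>,
}

/// Uniform view over the two sources, so callers match on one enum.
#[derive(Debug, PartialEq)]
pub enum ResolvedSource<'a> {
    Pull(&'a str),
    Build(&'a BuildConfig),
}

impl ContainerConfig {
    /// Returns the single configured source, or an error if the manifest
    /// sets both fields or neither.
    pub fn resolve(&self) -> Result<ResolvedSource<'_>, String> {
        match (self.image.as_deref(), self.build.as_ref()) {
            (Some(image), None) => Ok(ResolvedSource::Pull(image)),
            (None, Some(build)) => Ok(ResolvedSource::Build(build)),
            (Some(_), Some(_)) => Err("`image` and `build` are mutually exclusive".to_string()),
            (None, None) => Err("one of `image` or `build` is required".to_string()),
        }
    }

    /// Image reference to run: the pulled image, or the tag the build produces.
    pub fn image_ref(&self) -> Result<&str, String> {
        Ok(match self.resolve()? {
            ResolvedSource::Pull(image) => image,
            ResolvedSource::Build(build) => build.tag.as_str(),
        })
    }
}
```

With this shape, a pull-only manifest keeps working unchanged because `build` simply defaults to `None`, and a caller that cannot build yet can return an error from its `ResolvedSource::Build` arm.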

## Acceptance evidence

`cargo test -p archipelago-container --lib` passes 25/25 on the ThinkPad (cargo 1.95.0).

## Uncommitted state

The 6 modified files in `git status` ARE the Step 1+2 work:

```
core/archipelago/src/container/dev_orchestrator.rs
core/container/src/dependency_resolver.rs
core/container/src/lib.rs
core/container/src/manifest.rs
core/container/src/podman_client.rs
core/container/src/runtime.rs
```

Plus `docs/rust-orchestrator-migration.md` (the design spec, untracked).
Plus `tests/` (bats harness, uncommitted leftover from prior session).

## Answered design questions (no need to re-ask)

1. UI container naming → `archy-<app_id>` for UIs only; existing bitcoin-knots/lnd/electrumx keep bare names
2. BITCOIN_RPC_AUTH injection → runtime bind-mount of nginx.conf (no build-args, no envsubst)
3. Reconciler interval → 30 seconds
4. Concurrency → per-app `Mutex<()>` in a `DashMap`
5. Bash scripts → delete immediately (first-boot-containers.sh, reconcile-containers.sh, container-specs.sh, + their systemd units)
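
Answers 1, 3 and 4 above can be illustrated with a small sketch. It is an assumption-laden stand-in rather than the planned implementation: to stay standard-library only it replaces the `DashMap` with a `Mutex<HashMap<..>>`, and the names `AppLocks`, `lock_for`, `ui_container_name` and `RECONCILE_INTERVAL` are hypothetical.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::time::Duration;

/// Reconciler interval from answer 3.
pub const RECONCILE_INTERVAL: Duration = Duration::from_secs(30);

/// Per-app lock registry. Stand-in for the `DashMap` named in answer 4:
/// a plain `Mutex<HashMap>` keeps this sketch dependency-free.
#[derive(Default)]
pub struct AppLocks {
    inner: Mutex<HashMap<String, Arc<Mutex<()>>>>,
}

impl AppLocks {
    /// Returns the lock for `app_id`, creating it on first use.
    /// The registry lock is held only long enough to fetch the handle.
    pub fn lock_for(&self, app_id: &str) -> Arc<Mutex<()>> {
        let mut map = self.inner.lock().unwrap();
        map.entry(app_id.to_string()).or_default().clone()
    }
}

/// Naming rule from answer 1: only UI containers get the `archy-` prefix.
pub fn ui_container_name(app_id: &str) -> String {
    format!("archy-{app_id}")
}
```

A caller would hold `lock_for(app_id).lock()` for the duration of one app's install or reconcile pass, so work on the same app is serialized while different apps proceed in parallel.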

## Context: which host is what

| Host | IP | Role | Dashboard pw | Sudo pw |
|---|---|---|---|---|
| `archy` (this one) | 192.168.1.116 | **Dev ThinkPad** (Lenovo X250, Debian 13, archi-thinkpad), also runs v1.7.42-alpha | archipelago | ThisIsWeb54321@ |
| `archy228` | 192.168.1.228 | Kiosk HP ProDesk, runs v1.7.41-alpha, missing bitcoin-ui + lnd-ui | password123 | archipelago |

Both are development alpha nodes — **full destructive latitude**, no need to ask before stop/start/rebuild.

## Next action

Step 3: create `core/archipelago/src/container/prod_orchestrator.rs` (new file, ~400 LOC). See the "Step 3" section of the design doc for the full public surface + acceptance criteria. Write it, add unit tests against a `MockRuntime`, and verify that `cargo test -p archipelago` builds.

---
# Archipelago — Current State, Plan, and Releases

Updated: 2026-04-22

This is the "pick this up tomorrow" page. One-stop summary of where we are, what the plan is, and what's shipped. Detailed plan lives in [`bulletproof-containers.md`](./bulletproof-containers.md).

---

## Current state

### Fleet status

All four Gitea mirrors are synced to v1.7.40-alpha:

| Mirror | Host | Status |
|---|---|---|
| tx1138 | https://git.tx1138.com | ✅ v1.7.40-alpha live |
| gitea-local | http://localhost:3000 | ✅ v1.7.40-alpha live |
| .160 | http://23.182.128.160:3000 | ✅ v1.7.40-alpha live (Gitea recovered via `podman system renumber` — see below) |
| .168 | http://146.59.87.168:3000 | ✅ v1.7.40-alpha live |

Fleet test nodes:

| Node | Version | State |
|---|---|---|
| .103 (dev) | 1.7.40 | running, being developed against |
| .116 (this box) | 1.7.40 | healed manually via `systemd-run chmod 755 /opt/archipelago/web-ui` after v1.7.38/39 bug |
| .198 | 1.7.39 → 1.7.40-alpha | healed manually |
| .228 (primary test) | 1.7.40-alpha | healed manually; bitcoin-core + lnd + electrumx running; UI companions currently missing; bitcoin.conf rpcauth patched live |
| .249 (ISO test) | unreachable today | |
| .253 | 1.7.39 → 1.7.40-alpha | healed manually |

### Known open issues (drives the plan below)

1. **UI companion containers disappear** on .228 after daemon restarts — no auto-recreate (fix planned for v1.7.45, Quadlet migration)
2. **bitcoin.conf rpcauth drifts** from canonical secret → ElectrumX "Daemon connection problem" (fix planned for v1.7.43, `reconcile::derived`)
3. **`host.containers.internal`** resolves to LAN gateway inside containers on some versions (fix planned for v1.7.42, containers.conf)
4. **Podman state DB loss** requires manual recovery (fix planned for v1.7.44, startup self-heal)
5. **LND "Connect Wallet" info** vanishing after crashes — symptom of the same drift class as #2
6. **ElectrumX not syncing** on .228 — downstream of #2; will resolve when bitcoin.conf is reconciled
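
The drift class behind issues 2, 5 and 6 lends itself to a "render, compare, rewrite" check. The sketch below shows the idea under stated assumptions: the signature of `render_bitcoin_conf`, the two keys it renders, and the `ConfAction` and `plan_conf` names are all hypothetical, since this page only names the function and describes it as a pure function over the canonical secret.

```rust
/// Hypothetical renderer: a pure function from the canonical `rpcauth`
/// value to the full file contents. The real signature and the set of
/// rendered keys are not shown in this document.
pub fn render_bitcoin_conf(canonical_rpcauth: &str) -> String {
    format!("server=1\nrpcauth={canonical_rpcauth}\n")
}

/// What the reconciler should do with the file on disk.
#[derive(Debug, PartialEq)]
pub enum ConfAction {
    /// On-disk contents already match the rendered output.
    InSync,
    /// File is missing or has drifted; write these contents.
    Rewrite(String),
}

/// Compares the on-disk contents (if any) with the freshly rendered
/// config and decides whether a rewrite is needed. No I/O happens here,
/// which keeps the decision easy to unit test.
pub fn plan_conf(on_disk: Option<&str>, canonical_rpcauth: &str) -> ConfAction {
    let desired = render_bitcoin_conf(canonical_rpcauth);
    match on_disk {
        Some(current) if current == desired => ConfAction::InSync,
        _ => ConfAction::Rewrite(desired),
    }
}
```

Because the decision is level-triggered (it looks only at current versus desired state), running it on every reconcile pass is harmless when nothing has drifted.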

### Recent field incident (2026-04-22)

- Shipped v1.7.38 + v1.7.39, both broke nginx fleet-wide because the frontend tarball's root dir was `drwx------` (700). Every node that OTA'd got 500 errors on every page.
- Root-cause fix shipped in v1.7.40 (`create-release-manifest.sh` chmod + pre-ship assertion that `tar tvzf | head -1` shows `drwxr-xr-x`).
- .160 Gitea was down all day (502) because its rootless podman's `libpod/bolt_state.db` had vanished. Recovered via clearing `/run/user/$UID/{containers,libpod,podman}` + `podman system renumber`.
- Full failure-mode audit is in [`bulletproof-containers.md`](./bulletproof-containers.md).

---
## Plan

We're shipping a level-triggered **reconciler + Quadlet** architecture over six incremental releases. Each release closes one failure mode. See [`bulletproof-containers.md`](./bulletproof-containers.md) for the full design, code layout, test harness, chaos matrix, sources.

### Release roadmap

| Release | Closes | What lands | Status |
|---|---|---|---|
| **v1.7.41** | FM5 (bad OTA nginx 500) | Post-OTA auto-rollback. New binary probes `https://127.0.0.1/` on boot; if non-200 within 90s, restores `web-ui.bak` + calls `rollback_update()` + restarts | **in flight — deploying to .228 for test** |
| **v1.7.42** | FM4 (`host.containers.internal` wrong) | `/etc/containers/containers.conf` w/ `host_containers_internal_ip = 10.89.0.1`; every container gets `--add-host=host.archipelago:10.89.0.1` | pending |
| **v1.7.43** | FM2 (config drift) | `reconcile::derived::render_bitcoin_conf` — pure fn over canonical secret, rewrites on drift. Same for `lnd.conf` | pending |
| **v1.7.44** | FM6 (podman state loss) | Startup probe detects broken podman state, auto-recovers via `/run/user/$UID/*` clear + `system renumber` | pending |
| **v1.7.45** | FM1 + FM3 (companion orphans) | `archy-bitcoin-ui` → Quadlet `.container` unit in `/etc/containers/systemd/`. systemd (not archipelago) owns it | pending |
| **v1.7.46** | — | `archy-lnd-ui` → Quadlet | pending |
| **v1.7.47** | — | `archy-electrs-ui` → Quadlet | pending |
| **v1.7.48+** | all (full daemon refactor) | `core/archipelago/src/reconcile/` module replaces imperative `install.rs` container management. Main app containers become Quadlet too | pending |

The test harness (bats + Goss + Chaos Toolkit + vmtest) lands as a scaffold in v1.7.41; the first lifecycle tests block v1.7.45, and the full matrix blocks the beta tag.

---
## Release history

### [v1.7.41-alpha](/releases/v1.7.41-alpha/) — IN FLIGHT — 2026-04-22
**Post-OTA auto-rollback.** After an update lands, the node probes its own web UI through nginx — if the frontend isn't answering cleanly within 90 seconds, the node automatically rolls back to the previous version and restarts. A bad release can no longer leave the fleet stranded on an unreachable node.

Changes:
- `core/archipelago/src/update.rs`: `PendingVerification` struct, write marker before service restart, `verify_pending_update()` on new binary boot — probes `https://127.0.0.1/`, on fail restores `web-ui.bak` + calls `rollback_update()` + `systemctl restart archipelago`
- `core/archipelago/src/main.rs`: startup task invokes verifier concurrently with server
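
The probe-then-rollback decision described above can be sketched as a small loop. This is an illustration only, not the code in `update.rs`: the function name `probe_until_healthy`, the `Verdict` enum, and the injected `probe` and `wait` closures are assumptions, chosen so the logic can be tested without a network or a real clock. The closure is expected to return the HTTP status from `https://127.0.0.1/`, or `None` if the request failed.

```rust
use std::time::Duration;

/// Outcome of the post-update verification window.
#[derive(Debug, PartialEq)]
pub enum Verdict {
    /// A probe returned HTTP 200 inside the window; keep the new version.
    Healthy,
    /// No 200 before the window closed; the caller should roll back.
    RollBack,
}

/// Polls `probe` until it reports HTTP 200 or `window` has elapsed.
/// `wait` is called between attempts (in production, a sleep). Elapsed
/// time is tracked from the requested waits rather than a wall clock,
/// which is an approximation that keeps the function deterministic.
pub fn probe_until_healthy(
    mut probe: impl FnMut() -> Option<u16>,
    mut wait: impl FnMut(Duration),
    window: Duration,
    interval: Duration,
) -> Verdict {
    // Guard against a zero interval, which would otherwise never advance.
    let interval = interval.max(Duration::from_millis(1));
    let mut waited = Duration::ZERO;
    loop {
        if probe() == Some(200) {
            return Verdict::Healthy;
        }
        if waited >= window {
            return Verdict::RollBack;
        }
        wait(interval);
        waited += interval;
    }
}
```

On `Verdict::RollBack` the caller would perform the steps listed under Changes (restore `web-ui.bak`, call `rollback_update()`, restart the service). A production caller would pass `std::thread::sleep` (or an async equivalent) as `wait` and a 90-second `window`.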

### [v1.7.40-alpha](https://git.tx1138.com/lfg2025/archy/raw/branch/main/releases/v1.7.40-alpha/) — 2026-04-22
**Proper fix for the 500 error.** Fixed the v1.7.38/39 tarball-perms bug at its source — staging dir is now explicitly `chmod 755` before tar; `--mode=u=rwX,go=rX` normalizes archive perms; pre-ship assertion aborts release if `tar tvzf | head -1` isn't `drwxr-xr-x`.

Changes:
- `scripts/create-release-manifest.sh`: pre-tar chmod + tar --mode flag + post-tar verify
- Everything from .38 + .39 still in place (onboarding auto-heal, silent logins, app purge, AIUI in tarball)
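
The pre-ship assertion itself lives in a shell script, but the check it performs is simple enough to express as a pure function. The following is a hypothetical Rust rendering of that check, assuming the caller has already captured the verbose archive listing as text; the function name and error messages are made up for the example.

```rust
/// Checks that the first entry of a verbose tar listing (the output of
/// `tar tvzf <archive>`) is a directory with mode `drwxr-xr-x`.
/// Returns an error describing the mismatch otherwise.
pub fn check_root_dir_perms(listing: &str) -> Result<(), String> {
    // Equivalent of `| head -1`: only the first entry matters.
    let first = listing.lines().next().ok_or("empty archive listing")?;
    // The mode string is the first whitespace-separated field.
    let mode = first.split_whitespace().next().unwrap_or("");
    if mode == "drwxr-xr-x" {
        Ok(())
    } else {
        Err(format!("root dir mode is `{mode}`, expected `drwxr-xr-x`"))
    }
}
```

A listing whose first entry starts with `drwx------` (the v1.7.38/39 failure) is rejected, which is the condition that should abort the release.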

### [v1.7.39-alpha](https://git.tx1138.com/lfg2025/archy/raw/branch/main/releases/v1.7.39-alpha/) — 2026-04-22
**Hotfix attempt** for v1.7.38's nginx 500 (didn't fully work — still shipped broken tarball perms). Added startup self-heal chmod in `main.rs` and post-extract chmod in `update.rs` OTA applier.

### [v1.7.38-alpha](https://git.tx1138.com/lfg2025/archy/raw/branch/main/releases/v1.7.38-alpha/) — 2026-04-22
**Onboarding auto-heal + silent logins + App Store trim.**

Changes:
- `auth.rs`: `is_onboarding_complete()` auto-heals from `setup_complete` + `password_hash` (prevents clear-cache → onboarding wizard bug)
- `useOnboarding`: tri-state — backend-unreachable no longer defaults to `/onboarding/intro`
- Login sounds gated by `isFirstInstallPhase()` — silent after onboarding, typing sounds unaffected
- Removed FIPS app, Nostr Relay, Nostr VPN, Routstr, Penpot from catalog + Rust + docker + icons
- Deleted 15 image versions from tx1138, .168, gitea-local registries
- AIUI baked into release tarball via `demo/aiui/`
- `prebuild` hook syncs `app-catalog/catalog.json` → `public/catalog.json`

(Shipped with tarball-perms bug; fleet had to be healed before v1.7.40.)

### [v1.7.37-alpha](https://git.tx1138.com/lfg2025/archy/raw/branch/main/releases/v1.7.37-alpha/) — 2026-04-22
**Bitcoin Core install fixes + dynamic node UI + full-archive default.**

- Bitcoin Core passes explicit `-rpcbind/-rpcallowip/etc.` CLI args so vanilla image exposes RPC
- Split `bitcoin-core` from `bitcoin-knots` in backend `AppMetadata`
- bitcoin-ui auto-detects Core vs. Knots from subversion, swaps branding at runtime
- Storage (Full Archive · X GB / Pruned) indicator on dashboard
- Node Settings modal shows real values (network, storage, txindex, ZMQ, RPC port)
- Pull fallback to `docker.io` when no mirror carries the image
- Removed `prune=550` hardcode — full archive default

---
## Key docs

- [`bulletproof-containers.md`](./bulletproof-containers.md) — full reconcile architecture, code layout, test matrix, chaos scenarios, sources
- [`BETA-RELEASE-CHECKLIST.md`](./BETA-RELEASE-CHECKLIST.md) — existing beta checklist
- [`BETA-ISSUES-20260328.md`](./BETA-ISSUES-20260328.md) — prior beta-blocker tracking
- [`hotfix-process.md`](./hotfix-process.md) — release workflow
- [`architecture.md`](./architecture.md) — system architecture overview

---
## How to resume

1. Check fleet mirrors are all live: `curl -sS https://git.tx1138.com/lfg2025/archy/raw/branch/main/releases/manifest.json | jq .version`
2. Read [`bulletproof-containers.md`](./bulletproof-containers.md) for the current plan
3. Check task list (`/list` or via Claude Code) for the in-flight release
4. Latest in-flight work: v1.7.41 deploying to .228 for test; will ship to all 4 mirrors once verified