RESUME — Rust orchestrator migration, Step 8b

Last updated: 2026-04-23 (evening, post-architecture-audit)

Read this first if you're a fresh OpenCode session resuming work. Paste the "Resume prompt" below verbatim.


Resume prompt (paste this into a new opencode session)

We are mid-migration: docs/rust-orchestrator-migration.md + docs/bulletproof-containers.md are the plan, Steps 1-7 + 8a are shipped on main, Step 8b is next. Read docs/RESUME.md + docs/STEP-8B-PORT-AUDIT.md in full. Do NOT run any container mutations or edit scripts/container-specs.sh, scripts/first-boot-containers.sh, or scripts/reconcile-containers.sh; those are dead code scheduled for deletion in Step 8c. Work happens in core/container/src/manifest.rs, core/archipelago/src/container/prod_orchestrator.rs, and apps/<id>/manifest.yml. Summarize back to me what you understand the current state to be, then wait for approval before touching anything.


Standing directive from the user

Please get back to a well architected, minimal as possible, perfect working container architecture. If we've gone off track and the system is getting complex rather than elegant and perfect best containers ever then we need to review all the current state of the system and get back to making the best container system ever and according to our projects goals. We will be working on this until it's perfect.

Interpretation (validated with the user): resume the Rust orchestrator migration. Stop patching bash scripts. The bash scripts were supposed to be deleted three months' worth of commits ago, and we drifted into maintaining them by accident.

Latest user comment (must be followed)

please continue, please state my last comment in the resume doc and first before making this plan to adhere to

Adherence rule for this session:

  • Before proposing or executing a plan, first record the user's latest directive in docs/RESUME.md.
  • Keep work aligned to Step 8 migration goals and avoid off-scope drift.

Most recent directive:

And we need to get every container working on .116 and tested before we release

Release gate update:

  • .116 must have all required containers healthy and tested before release is allowed.
  • Treat runtime stabilization on .116 as immediate priority while continuing Step 8 migration work.

Where we actually are

Shipped (Steps 1-7 + 8a)

Commits on main (unpushed to origin/tx1138 until release gate; user-visible history):

| Step | Commit | What |
|------|--------|------|
| 1 | (schema in place from earlier commits) | ContainerConfig.image and ContainerConfig.build: mutually exclusive pull-or-build source |
| 2 | 34af4d9d | ContainerRuntime trait gains image_exists + build_image; PodmanRuntime impl |
| 3 | b6a04d31 | ProdContainerOrchestrator with build-or-pull + adoption + reconcile |
| 4 | e8a59c93 | ContainerOrchestrator trait; RpcHandler uses it in prod |
| 5 | fc39b04b | BootReconciler: periodic reconcile loop |
| 6 | 48f08aa3 | Wire both into main.rs |
| 7 | 069bc4a5 | bitcoin-ui pre-start hook renders nginx.conf from embedded template (the pattern for "derived config" at apply time) |
| 8a | a0707f4d, 1c81a739 | Retire archipelago-reconcile systemd timer; split Step 8 into 8a/8b/8c |
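
For orientation, the Step 1 "mutually exclusive pull-or-build source" rule can be sketched as below. This is a minimal sketch, not the shipped code: BuildSpec, ImageSource and image_source are hypothetical names, and the real ContainerConfig may model the two fields differently.

```rust
/// Hypothetical stand-in for the manifest's build settings.
#[derive(Debug, Clone, PartialEq)]
pub struct BuildSpec {
    pub context: String,
}

/// Where an image comes from: exactly one of pull or build.
#[derive(Debug, Clone, PartialEq)]
pub enum ImageSource {
    Pull(String),
    Build(BuildSpec),
}

/// Collapse the two optional manifest fields into one source.
/// Setting both fields, or neither, is rejected.
pub fn image_source(
    image: Option<String>,
    build: Option<BuildSpec>,
) -> Result<ImageSource, String> {
    match (image, build) {
        (Some(i), None) => Ok(ImageSource::Pull(i)),
        (None, Some(b)) => Ok(ImageSource::Build(b)),
        (Some(_), Some(_)) => Err("image and build are mutually exclusive".to_string()),
        (None, None) => Err("one of image or build is required".to_string()),
    }
}
```

Turning the pair of options into an enum once, at validation time, means the orchestrator never has to re-check the invariant when it decides whether to pull or build.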

Three apps/*/manifest.yml files are genuinely ported and running under the Rust orchestrator on .116 + .228: bitcoin-ui, electrs-ui, lnd-ui (Step 7).

Where we drifted (the session that produced the previous RESUME.md)

On 2026-04-23 a fedimint outage on .116 pulled a session into patching scripts/reconcile-containers.sh, scripts/container-specs.sh, scripts/first-boot-containers.sh — files that Step 8c is scheduled to delete. Five bugs deep, the user halted the session. That cluster of bugs is a symptom of running two incompatible codepaths in parallel (bash first-boot/reconcile + Rust BootReconciler), which is exactly the condition Step 8c fixes by deleting the bash half.

Discard decision (out of scope): the uncommitted bash edits on .116 (listed in the previous RESUME.md's "Uncommitted script changes" section) will not be committed. The fedimint mDNS-URLs fix, the filebrowser custom-args fix, and the bcrypt-escape fix all land as changes to apps/<id>/manifest.yml + the Rust orchestrator in Steps 8b.0-8b.3. See docs/STEP-8B-PORT-AUDIT.md for the exact mapping.

Current container state on .116

Running but drifted. See the "Current container state" section in the previous RESUME.md. Decision (approved by user): accept .116 is limping until 8b.3 lands. Do not run scripts/reconcile-containers.sh or any mutations; all rescues go through the Rust orchestrator or wait for the manifest port.

.228 is happier — it's already adopted by the Rust orchestrator for the three UI apps.


Next step — Step 8b.0

Concretely: schema extensions to core/container/src/manifest.rs + unit tests. No orchestrator changes, no manifest changes, no container mutations.

Fields to add (justified in docs/STEP-8B-PORT-AUDIT.md § Schema gaps):

  • container.network: Option<String> — podman --network value ("archy-net", "host", or None = isolated default).
  • container.custom_args: Vec<String> — appended to the container command.
  • container.entrypoint: Option<Vec<String>> — override.
  • container.derived_env: Vec<{key, template}> — template strings resolved against HostFacts { host_ip, host_mdns, disk_gb } at apply time.
  • container.secret_env: Vec<{key, secret_file}> — read from /var/lib/archipelago/secrets/<file> at apply time.
  • container.data_uid: Option<String>, formatted "NNNNN:NNNNN", applied via chown -R before container create.
  • Volume.volume_type: "tmpfs" + Volume.tmpfs_options: String — OR a new container.tmpfs: Vec<{target, options}>. Pick one at implementation time.
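
The derived_env resolution described in the list above might look like the following. This is a sketch under stated assumptions: the `{{name}}` placeholder syntax, the HostFacts field types, and the names DerivedEnv and resolve_template are all hypothetical; only resolve_env and the three host facts come from this document.

```rust
/// Host facts named in the field list; the field types are assumptions.
#[derive(Debug, Clone)]
pub struct HostFacts {
    pub host_ip: String,
    pub host_mdns: String,
    pub disk_gb: u64,
}

/// One derived_env entry: an env key plus a template string.
#[derive(Debug, Clone)]
pub struct DerivedEnv {
    pub key: String,
    pub template: String,
}

/// Resolve `{{name}}` placeholders (assumed syntax) against the host facts.
/// An unknown or unterminated placeholder is an error.
pub fn resolve_template(template: &str, facts: &HostFacts) -> Result<String, String> {
    let mut out = String::new();
    let mut rest = template;
    while let Some(start) = rest.find("{{") {
        // Copy the literal text before the placeholder.
        out.push_str(&rest[..start]);
        let after = &rest[start + 2..];
        let end = after
            .find("}}")
            .ok_or_else(|| format!("unterminated placeholder in {:?}", template))?;
        match after[..end].trim() {
            "host_ip" => out.push_str(&facts.host_ip),
            "host_mdns" => out.push_str(&facts.host_mdns),
            "disk_gb" => out.push_str(&facts.disk_gb.to_string()),
            other => return Err(format!("unknown host fact: {}", other)),
        }
        rest = &after[end + 2..];
    }
    out.push_str(rest);
    Ok(out)
}

/// Resolve every derived_env entry into (key, value) pairs, in manifest order.
pub fn resolve_env(
    entries: &[DerivedEnv],
    facts: &HostFacts,
) -> Result<Vec<(String, String)>, String> {
    entries
        .iter()
        .map(|e| resolve_template(&e.template, facts).map(|v| (e.key.clone(), v)))
        .collect()
}
```

Because the unknown-placeholder branch is the same code path validate() would need, the "templates referencing unknown host facts" check can reuse resolve_template against dummy facts instead of duplicating the parser.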

Tests (block the commit until green):

  • Every existing apps/*/manifest.yml still parses (parse_every_real_manifest test).
  • Each new field parses correctly with sensible defaults.
  • validate() rejects: empty custom_args elements, empty entrypoint elements, duplicate derived_env keys, derived_env templates referencing unknown host facts, secret_env with .. or / in secret_file (path-traversal guard).
  • resolve_env(HostFacts) returns expected strings for each supported placeholder.
  • resolve_secret_env(SecretsProvider) returns expected strings; missing secret file is a hard error.
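
A minimal sketch of the validation rules and secret resolution listed above, assuming a SecretsProvider that exposes a single read method. The trait shape, InMemorySecrets, and the check_* helper names are hypothetical; the real validate() may be one method on ContainerConfig.

```rust
use std::collections::{HashMap, HashSet};

/// One secret_env entry: env key plus a file name under the secrets directory.
#[derive(Debug, Clone)]
pub struct SecretEnv {
    pub key: String,
    pub secret_file: String,
}

/// Hypothetical provider interface; the real SecretsProvider may differ.
pub trait SecretsProvider {
    /// Returns the secret's contents, or None if the file does not exist.
    fn read(&self, file: &str) -> Option<String>;
}

/// In-memory provider for unit tests, keyed by file name.
pub struct InMemorySecrets(pub HashMap<String, String>);

impl SecretsProvider for InMemorySecrets {
    fn read(&self, file: &str) -> Option<String> {
        self.0.get(file).cloned()
    }
}

/// Path-traversal guard: secret_file must be a bare file name.
pub fn check_secret_file(file: &str) -> Result<(), String> {
    if file.is_empty() || file.contains("..") || file.contains('/') {
        return Err(format!("invalid secret_file: {:?}", file));
    }
    Ok(())
}

/// Rejects empty elements in custom_args or entrypoint.
pub fn check_no_empty(field: &str, items: &[String]) -> Result<(), String> {
    match items.iter().position(|s| s.is_empty()) {
        Some(i) => Err(format!("{}[{}] is empty", field, i)),
        None => Ok(()),
    }
}

/// Rejects duplicate derived_env keys.
pub fn check_unique_keys(keys: &[&str]) -> Result<(), String> {
    let mut seen = HashSet::new();
    for key in keys {
        if !seen.insert(*key) {
            return Err(format!("duplicate derived_env key: {}", key));
        }
    }
    Ok(())
}

/// Resolves every secret_env entry; a missing secret file is a hard error.
pub fn resolve_secret_env(
    entries: &[SecretEnv],
    secrets: &dyn SecretsProvider,
) -> Result<Vec<(String, String)>, String> {
    let mut out = Vec::with_capacity(entries.len());
    for e in entries {
        check_secret_file(&e.secret_file)?;
        let value = secrets
            .read(&e.secret_file)
            .ok_or_else(|| format!("missing secret file: {}", e.secret_file))?;
        out.push((e.key.clone(), value));
    }
    Ok(out)
}
```

Keeping the provider behind a trait lets the unit tests run without touching /var/lib/archipelago/secrets/; a file-backed implementation would join the validated file name onto that directory.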

This is the smallest useful commit and unblocks every port in 8b.1+.


Project ground rules (standing)

  • archy SSH alias = .116. archy228 = .228. Do not swap.
  • SSHFS at /Users/dorian/mnt/archy-thinkpad/ = archy:Projects/archy/.
  • .116 sudo password: ThisIsWeb54321@ — works passwordless in-session via sudo -nS after first use.
  • .228 has NOPASSWD.
  • Git commits on .116 MUST use git commit -F /tmp/tmp-msg.txt over ssh archy — SSHFS git commit hangs.
  • Never push except current release (granted: gitea-local + gitea-vps2).
  • No em-dashes. Conventional Commits.
  • No altcoin mentions, Bitcoin-only.

Order of work for the next session:

  1. Read this file + docs/STEP-8B-PORT-AUDIT.md + the "Open decisions" section of the audit.
  2. Answer the four open decisions (or confirm the recommended defaults).
  3. Implement 8b.0 commit 1: add network, custom_args, entrypoint, derived_env, secret_env, data_uid fields to ContainerConfig + validation + unit tests. Backwards-compat: every existing apps/*/manifest.yml must still parse.
  4. Run cargo test -p archipelago-container, commit once green, then stop.

Do not touch scripts/*.sh. Do not run reconcile-containers.sh. Do not live-test on .116 or .228 until the schema + orchestrator pieces in 8b.0 + 8b.1 are both in.


Recent release (out of scope, for grep context)

v1.7.43-alpha shipped yesterday: tarball-only OTA, async install/uninstall/update lifecycle, install UX polish, .23 VPS retirement. Manifest at gitea-local + gitea-vps2. .228 on the new binary. See docs/STATUS.md for the full rundown.

Earlier session notes (container rescue on .116, "never fails" directive, env-drift detector experiment) are obsolete — superseded by this file. The directive ("never fails") is honored by the Step 8 migration itself: a declarative manifest regenerated on every reconcile tick can't bake stale IPs into consensus data because the env comes from derived/secret sources that are re-resolved every apply.
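
The "re-resolved every apply" argument can be illustrated with a small sketch. It assumes the reconciler compares the env a container was created with against the env freshly resolved on the current tick; Action, desired_env, plan, and the APP_PUBLIC_URL key are hypothetical names, not the BootReconciler's actual API.

```rust
use std::collections::BTreeMap;

/// What the reconciler decides for one container on one tick.
#[derive(Debug, PartialEq)]
pub enum Action {
    Keep,
    Recreate,
}

/// Env re-resolved from the manifest on this tick.
/// The key name and URL shape are hypothetical.
pub fn desired_env(host_ip: &str) -> BTreeMap<String, String> {
    let mut env = BTreeMap::new();
    env.insert(
        "APP_PUBLIC_URL".to_string(),
        format!("http://{}:8080", host_ip),
    );
    env
}

/// Compare the env the running container was created with against
/// the env resolved now; any difference means the container is stale.
pub fn plan(
    running: &BTreeMap<String, String>,
    desired: &BTreeMap<String, String>,
) -> Action {
    if running == desired {
        Action::Keep
    } else {
        Action::Recreate
    }
}
```

Under this model a host IP change shows up as a plain env mismatch on the next tick, so a stale address cannot survive longer than one reconcile interval.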