Commit Graph

987 Commits

Author SHA1 Message Date
archipelago
008da4776d docs(changelog): add v1.7.43-alpha entry covering async lifecycle + .23 retirement
Four release-note bullets describing the user-visible changes shipped
in this round:

- async-spawn install/update/uninstall (UI no longer freezes)
- phase-based install progress bar (Preparing through Finalizing)
- scanner kick post-install (Launch button appears immediately)
- .23 Hetzner VPS retired, .168 OVH promoted to Server 1 with
  auto-purge migration for existing nodes

Matches the tone of existing changelog entries: what changed from the
operator's perspective, not internal implementation detail.
2026-04-23 09:07:29 -04:00
archipelago
0ee1682037 fix(config): auto-purge decommissioned .23 VPS from saved registry/mirror configs
load_registries + load_mirrors normally only ADD missing defaults to
the persisted JSON — explicit removals stick. After retiring the .23
Hetzner VPS we need the opposite: existing nodes have .23 baked into
their saved configs and would spend seconds per install/update timing
out against a dead host until the operator manually removes it via
the Settings UI.

Add a targeted one-time migration in both loaders: if any saved entry
has 23.182.128.160 in its URL, drop it on load and rewrite the file.
This is an exception to the usual "explicit removals stick" rule —
the user never chose to add this mirror, it was a default.

Narrow-scope migration (one hardcoded IP match, no schema version)
because the cost/benefit of a general migration system isn't worth
it for a single decommissioned host. Future retirements can follow
the same pattern.
2026-04-23 08:51:26 -04:00
archipelago
2205232548 chore: retire .23 VPS mirror, promote .168 OVH to primary
The Hetzner VPS at 23.182.128.160 was decommissioned. Replace it
everywhere with the OVH VPS at 146.59.87.168, which was previously
the tertiary mirror.

  - update.rs: drop DEFAULT_TERTIARY_MIRROR_URL, promote .168 into
    the secondary slot as "Server 1 (OVH)"; tx1138 becomes Server 2.
    Default mirror list shrinks from 3 to 2.
  - container/registry.rs: default RegistryConfig drops .23, promotes
    .168 to Server 1 / priority 0, tx1138 stays Server 2 / priority 10.
  - api/rpc/package/config.rs: trusted-registry allowlist swaps .23
    for .168.
  - api/handler/mod.rs: app-catalog fallback URL uses .168.
  - neode-ui/views/marketplace/marketplaceData.ts: REGISTRY uses .168.
  - scripts/image-versions.sh: ARCHY_REGISTRY_FALLBACK uses .168.
  - image-recipe/build-auto-installer-iso.sh: installer ISO registries
    use .168 (both podman registries.conf and backend registries.json).

Tests updated to assert on the new 2-entry default lists (registry +
mirror). URL-parser fixture tests in update.rs retain .23 strings —
they exercise string-parsing logic, not mirror policy.

Git remotes: dropped `gitea-vps` and the .23 push URL on the `origin`
multi-push alias (not part of this commit — pure working-copy change).
2026-04-23 08:22:32 -04:00
archipelago
f86d86c354 fix(install): kick scanner post-install so Launch button appears immediately
After install completes, the async-spawn wrapper wrote state=Running
but the skeletal install-time manifest (interfaces: None) persisted
until the next scheduled 60s scan. The frontend saw state=running but
hasUI=false and hid the Launch button for up to a full minute.

Add a shared Notify/watch pair between RpcHandler and the scan loop:
  - scan_kick (Notify): scan loop selects! between the 60s interval
    and this notify, running immediately on either.
  - scan_tick (watch<u64>): scan loop bumps the counter after each
    completed scan so callers can await completion.

Install and update success paths now call kick_scanner_and_wait before
flipping to Running. The scan merges via merge_preserving_transitional
(state stays Installing/Updating, manifest refreshed from live podman
with interfaces.main.ui populated from real port bindings). 2s timeout
falls back to pre-fix behavior on slow podman — no regression.
2026-04-23 07:59:03 -04:00
archipelago
8cc84ebcb7 feat(install): phase-based progress bar replaces unparseable pull bytes
Podman emits zero parseable progress when stderr is piped (no TTY), so
the old byte-counter regex never matched in real installs. Users saw
0% for the whole pull, then a jump to 95%, then silence through
create-container, health-check, and post-install hooks.

Replace with 7 explicit lifecycle phases wired through install.rs and
update.rs: Preparing (5%), PullingImage (20%), CreatingContainer (70%),
StartingContainer (80%), WaitingHealthy (88%), PostInstall (95%),
Done (100%). Each maps to a fixed UI progress and status message.

Frontend PHASE_INFO mapper in stores/server.ts prioritizes phase when
present, falls back to byte-counter for legacy. A Math.max forward-only
guard ensures the bar never regresses. Deleted the duplicate watcher
in Discover.vue that was fighting the store's watcher with stale byte
logic. Added shimmer CSS on the fill (with prefers-reduced-motion
opt-out) so the bar looks alive during long phases.
2026-04-23 07:58:43 -04:00
archipelago
b2cc7e09d6 docs(status): mark install/uninstall/update async-spawn as shipped 2026-04-23 06:58:45 -04:00
archipelago
e471ef754e fix(rpc): empty icon in transient install entry to avoid broken-image flicker
create_installing_entry hardcoded /assets/img/app-icons/<id>.png for
every new install. About half the app icons ship as .svg or .webp
(lnd.svg, vaultwarden.webp, bitcoin-knots.webp, mempool.webp), so the
browser 404s on the wrong extension and renders the default broken-image
glyph for the 10-30s window before the scanner refreshes with real
manifest data.

Send empty icon. The frontend's icon computed in AppCard.vue falls
through to curatedMap which has correct extensions for bundled apps,
and handleImageError still guards any remaining misses with a
placeholder SVG.
2026-04-23 06:58:12 -04:00
archipelago
0733ac4034 fix(ui): shorten install/uninstall/update timeouts for async RPCs
With the backend flipped to async-spawn, install/uninstall/update return
immediately with a { status, package_id } envelope. Client timeouts of
45m/11m were a leftover from synchronous handlers and masked real RPC
failures.

Drop all install/uninstall/update RPC timeouts to 15s. Progress and
terminal state still arrive through the live state stream — the RPC
only needs to confirm the spawn was accepted.

Return-type annotations updated in rpc-client.ts and stores/server.ts.
Five direct rpcClient.call sites across Marketplace.vue, Discover.vue,
and MarketplaceAppDetails.vue updated with the shorter timeout.
2026-04-23 06:58:02 -04:00
archipelago
2d5b859e18 feat(rpc): async-spawn install/uninstall/update lifecycle
Extend the async-spawn treatment previously shipped for Stop/Start/Restart
to the three remaining long-running lifecycle RPCs. Each wrapper validates
params, rejects duplicate in-flight ops, flips state to the transitional
variant (Installing/Removing/Updating), then spawns the existing inner
handler on tokio. RPC returns immediately with { status, package_id }; the
spawn task owns the terminal state write.

Install and update success arms explicitly set state=Running. The scan
loop merge (merge_preserving_transitional) refuses to overwrite
transitional states, so the spawn task must write the terminal state.
Uninstall's inner handler removes the entry entirely, so no explicit
terminal write is needed there.

Dispatcher and handler now thread self as Arc<Self> / &Arc<Self> so
spawned tasks can hold their own Arc without extra field cloning.

Transient install entry uses empty icon string. Hardcoding
/assets/img/app-icons/<id>.png 404s for apps that ship .svg or .webp
assets, which produces a broken-image flicker until the scanner refreshes
with manifest data. Empty string causes the frontend's icon computed to
fall through to the curated map, which has correct extensions.

Removed the inner "already updating" guard in update.rs — the wrapper
now owns duplicate-op detection for all three operations.
2026-04-23 06:57:50 -04:00
archipelago
4f279388a1 docs(status): mark async-spawn lifecycle fix as shipped
Records the four landed commits, the .228 deploy (binary + frontend
paths, backups, md5), the manual LND Stop verification, and the
rollback incantation. Leaves the older "NEXT SESSION" design block
in place as historical reference with a note that it's stale.

Adds a follow-ups list: chaos matrix is now unblocked, bundled-app
RPCs are still sync (deprecate or mirror-async?), transitional_since
is in-memory only, and there are 22 pre-existing test failures in
unrelated modules that should get their own cleanup pass.
2026-04-23 05:30:45 -04:00
archipelago
9ce28f080e fix(ui): single-button lifecycle control with transitional labels
The app card and details view previously used a pair of Start/Stop
buttons whose labels were driven off isAppLoading(), a client-side
"I just clicked the button" flag. When the backend's graceful stop
took longer than the RPC round-trip (up to 600s on bitcoin-core),
the flag cleared while the container was still shutting down, the
UI flipped back to "Running" as soon as the next 10s scan saw the
still-alive container, and the user had no indication the stop was
still in flight.

Now that the backend flips PackageState to Stopping / Starting /
Restarting / Installing / Updating / Removing for the duration of
each lifecycle operation and the scan loop preserves those states,
the UI can drive its label off the container state itself. A single
full-width primary button replaces the Start/Stop pair. Its label,
color, and disabled state come from getAppVisualState(), which
collapses resting states (exited/created/paused/installed) into
"stopped" and passes transitional states through untouched.

Changes:

- container-client.ts: widen ContainerStatus.state union to include
  the six transitional variants plus "installed". Add
  restartContainer() calling the new container-restart RPC.
- stores/container.ts: add getAppVisualState() computed and the
  restartContainer() action.
- ContainerApps.vue: single primary button (Start / Stop / Starting
  / Stopping / Restarting etc.) plus a separate circular Restart
  button visible only when running. Critically, handleStartApp and
  handleStopApp now route through store.startContainer and
  stopContainer (which call container-start / container-stop, the
  async RPCs) instead of the legacy synchronous bundled-app-start /
  bundled-app-stop path. Transitional-state polling widened from
  just "created" to the full set of transitional variants.
- ContainerAppDetails.vue: same single-button pattern, Restart
  button now calls container-restart instead of the old
  stop-sleep-start sequence, added 2s polling interval for
  transitional states.
- components/ContainerStatus.vue: widen state prop to match the
  shared union, render transitional labels with a trailing ellipsis
  and a yellow dot.

No new tests — this is presentation logic. Manual verification on
.228 will confirm the end-to-end async path: click Stop on LND,
button becomes "Stopping" in under a second, stays that way for
roughly 5 minutes, then flips to "Start" with a grey dot. The UI
must never revert to "Running" mid-stop.
2026-04-23 05:20:15 -04:00
archipelago
6712810b92 fix(state): preserve transitional state across container scans
The 30s package scan loop used to blindly overwrite every package
entry from podman inspect. While a user-initiated Stop / Start /
Restart was in flight, the RPC spawn task would flip the state to
Stopping / Starting / Restarting, the next scan would see podman
still reporting "running" (for the duration of the graceful stop,
up to 600s for bitcoin-core), and clobber the transitional state
back to Running. The dashboard would then flip Running -> Stopping
-> Running -> Stopped, making it look like the stop had silently
failed until it eventually completed.

The merge loop now treats transitional variants (Stopping, Starting,
Restarting, Installing, Updating, Removing, and the three backup
variants) as owned by the RPC spawn task. For those variants,
merge_preserving_transitional keeps the existing state while still
taking live observability fields (health, exit_code, installed,
lan_address, manifest, static_files, available_update) from the
fresh scan so the UI continues to see live health readings.

Adds an escape hatch via a per-scan transitional_since side table:
if a package has been in a transitional state for more than 1200s
(2x the longest graceful stop at 600s on bitcoin-core), the scan
loop assumes the spawn task died without cleanup and overrides with
podman's live state. Prevents a crashed background task from wedging
a package in Stopping forever.

Three unit tests cover the merge rule, the observability passthrough,
and the transitional-variant classifier.
2026-04-23 05:15:13 -04:00
archipelago
19a99ca993 fix(rpc): async container stop/start/restart; widen state mapping
RPC handlers no longer block on podman operations. container-stop on
bitcoin-core used to hold the connection for up to 600s while the UI
showed a frozen spinner; it now returns in under a second with
{status: stopping} after flipping the package state to Stopping and
broadcasting over WebSocket. Same treatment for container-start and
the new container-restart route.

Widens container-list state mapping to emit the transitional variants
(stopping, starting, restarting, installing, updating, removing,
installed, and the backup states) instead of collapsing them to
"unknown". Keeps the mapping in sync with the UI ContainerStatus.state
union so the dashboard can render the right transitional label.

Mirrors the treatment in package/runtime.rs for package.start,
package.stop, and package.restart. The body of each handler is lifted
into pure do_package_* helpers that the background task runs; state
flipping is bracketed around the spawn with revert on error. The
pre-existing post-start exit-check verification and restart stop+start
fallback run inside the spawned task, not the RPC body.

Adds container-restart route to the dispatcher. mark_user_stopped
continues to run BEFORE the spawn, preserving the ordering contract
with the crash recovery layer at runtime.rs:145-148.
2026-04-23 04:59:45 -04:00
archipelago
44cd5eefdf feat(rpc): spawn_transitional helper for async lifecycle ops
Introduces a new RPC-layer helper that bridges the synchronous
ContainerOrchestrator trait with RPC handlers that must return in <1s.

The helper flips the package state to a transitional variant
(Stopping / Starting / Restarting) in the StateManager so WebSocket
clients see the live label immediately, then tokio::spawns the
actual orchestrator call. On success it writes the final state; on
error it reverts to the pre-transition state and logs via
install_log().

The ContainerOrchestrator trait stays synchronous so the reconciler,
boot flow, unit tests, and chaos harness keep deterministic
behaviour. Async only lives in the RPC layer.

Not wired to any handler yet — Commit 2 consumes this helper.
Widens install_log visibility from pub(super) to
pub(in crate::api::rpc) so the new sibling module can reach it.
2026-04-23 04:55:52 -04:00
archipelago
f721ecf39b docs: STATUS.md — FUSE/SSHFS development loop section
Dedicated section covering the file-ops-via-mount + git/cargo-via-ssh
split that makes this dev setup work. Includes:

- Exact running mount command (pulled from ps)
- macFUSE + sshfs-mac brew install path
- Health check + recovery sequence for when mount hangs (it will)
- Full which-path-for-which-operation table
- Don't-do list (cargo from mount, rsync without AppleDouble exclude, etc)
- Cache caveat and inode-sharing note between mount and SSH views

No code change.
2026-04-23 04:51:53 -04:00
archipelago
120a307343 docs: STATUS.md — complete SSH/key/sudo/deploy reference for next session
Expands NEXT SESSION header with fully verified access info so a fresh
agent has zero ambiguity:

- SSH key inventory across laptop, .116, .228 (every file, purpose noted)
- Actual SSH config aliases (archy, archy228) with IdentitiesOnly
- Verified connectivity matrix (laptop -> both; .116 -> .228; .228 has no outbound key)
- Corrected sudo state: .228 sudoers file is /etc/sudoers.d/archipelago
  (not archipelago-ci); .116 has archipelago-ci + archipelago-wg scope-limited drop-ins
- SSHFS mount source command + AppleDouble gotcha
- Cargo over SSH PATH gotcha + detached build pattern for >2min timeout
- End-to-end deploy-to-.228 recipe (build, SCP, atomic swap, verify)
- Git workflow rules (no push, no amend, no force, conventional commits)

Removes duplicate host-reference block that the prior edit left trailing.
No code change.
2026-04-23 04:49:45 -04:00
archipelago
e557e0156f docs: STATUS.md — dashboard Stop UX bug diagnosis + async-spawn fix plan
Captures full design for the next session:
- Full bug sequence (5.5min blocking RPC + 30s scan clobbering transitional state)
- 4-commit implementation order with exact file:line targets
- Single-button UI spec with full label table
- Verification gates including manual LND stop test on .228
- Architectural decision: spawn lives in RPC layer, orchestrator trait stays sync

No code change yet; next session implements.
2026-04-23 04:45:12 -04:00
archipelago
1ab66f33a3 docs: STATUS.md — .228 dashboard bugs fixed (macaroon + ExtraHost) 2026-04-23 04:17:56 -04:00
archipelago
3ee192ba1f fix(first-boot): use podman host-gateway magic for host.containers.internal
The previous code computed HOST_GATEWAY from `ip route show default` to
work around an alleged podman 4.3.x limitation. Two problems:

1. The comment was wrong. Podman 4.4+ supports --add-host=host-gateway
   natively, and we ship 5.4.2.

2. More critically, `ip route show default` returns the LAN router
   (e.g. 192.168.1.254) — the gateway to the internet, not the gateway
   to the host. Every container configured with DAEMON_URL or
   --bitcoind.rpchost=host.containers.internal was therefore dialing
   the WiFi router instead of the host machine, silently failing.

Symptoms this caused on .228:
- LND crash-looped with "dial tcp 192.168.1.254:8332: connection refused"
- Dashboard showed no LND connect details or QR
- ElectrumX DAEMON_URL broken; stuck at 2 KB index for days
- Any service reaching bitcoin-core through the `archy-net` bridge

Replace the computed value with the literal string "host-gateway",
which podman translates to the correct in-network gateway at container
start. Also drop the stale HOST_GATEWAY reference in the Tor-bootstrap
branch (it always fell back to TARGET_IP anyway). Verified on .228:
after recreating bitcoin-core/electrumx/lnd with the new flag, LND
reached the chain backend, ElectrumX resumed indexing, and the
dashboard /lnd-connect-info endpoint succeeded.
2026-04-23 04:16:42 -04:00
archipelago
be96002372 fix(lnd): read admin macaroon via sudo fallback
LND's admin.macaroon is owned by a rootless-podman subordinate UID
(typically 100000) with mode 640. The archipelago server runs as UID
1000 and cannot read the file directly, which caused every dashboard
LND RPC (getinfo, connect-info, export-channel-backup) and lnd_client
to fail with "Failed to read LND admin macaroon".

Add a read_lnd_admin_macaroon() helper that first tries a direct read
(for operators who have relaxed permissions) then falls back to
`sudo -n cat`, mirroring the pattern already used for Tor hidden
service hostnames in handle_lnd_connect_info. Centralise the canonical
macaroon path as LND_ADMIN_MACAROON_PATH and route all four callers
through the helper.

Verified on .228: GET /lnd-connect-info now returns 200 with cert,
macaroon, and tor_onion fields. Dashboard QR/connect-string UI
unblocked.
2026-04-23 04:15:44 -04:00
archipelago
4b8ef0a098 docs: STATUS.md through Step 9 (.228 hot-swap verified)
Logs Step 9 acceptance evidence, the two bugs caught and fixed during
the hot-swap (parse_memory_limit IEC suffix bug in 732df1b8 and
cgroup Delegate in ba83f9bc), and outlines the Step 10 plan for .116.
2026-04-23 03:46:23 -04:00
archipelago
ba83f9bce2 feat(systemd): delegate cgroup controllers to archipelago.service
Adds Delegate=memory pids cpu io to the archipelago.service unit.

Context: the service runs as User=archipelago under system.slice with
rootless podman. When podman creates transient libpod-*.scope units for
containers under user.slice, systemd needs the caller to hold
CAP_SYS_ADMIN on the target cgroup subtree \u2014 which happens iff
Delegate= lists the controllers we want to set. Without Delegate, any
future code path that goes through the podman CLI (runtime.rs) instead
of the libpod HTTP API (podman_client.rs) would hit MemoryMax
rejections that have exactly the same symptom as the bug I just fixed
in parse_memory_limit but with a completely different root cause.

Belt-and-braces: current production path uses PodmanClient and was
fixed in the preceding commit. But the DockerRuntime CLI path in
runtime.rs:262-268 (cmd.arg("--memory")) is still reachable via
AutoRuntime fallback on hosts without podman, and future rust
orchestrator code may legitimately need cgroup delegation. This
directive is no-op harmful on hosts that already delegate upstream
(systemd gracefully handles duplicate/nested delegation).
2026-04-23 03:44:36 -04:00
archipelago
732df1b8cb fix: parse_memory_limit accepts Ki/Mi/Gi IEC binary suffixes
The libpod HTTP API path (PodmanClient::create_container) ran manifest
memory_limit values like "128Mi" through parse_memory_limit which
lowercased+trim_end_matches("m"), leaving "128i" which parse::<f64>()
rejected. The resulting None became 0 via .unwrap_or(0), and podman
serialised that into the OCI config as memory.limit:0. At container
start time systemd then rejected MemoryMax=0 with "Value specified in
MemoryMax is out of range".

Silently wrong for every manifest in apps/ that uses Kubernetes-style
suffixes (all of them). Became visible on .228 when Step 9 first
exercised the ProdContainerOrchestrator path for bitcoin-ui and lnd-ui
installs \u2014 the old first-boot-containers.sh bash script used podman
run --memory 128m directly, which podman-the-CLI parses correctly, so
the bug never surfaced before.

Two parts:
- parse_memory_limit now recognises Ki/Mi/Gi/Ti (IEC binary, what k8s
  and our manifests use), kB/MB/GB/TB (SI decimal), k/K/m/M/g/G/t/T
  (docker shorthand, treated as IEC binary for backwards compat), and
  bare byte integers. Filters out zero/negative results.
- create_container omits the memory/cpu fields entirely when the
  manifest has no limit or parsing fails, rather than emitting 0. The
  libpod API treats absent as unlimited; 0 is "set MemoryMax=0" which
  systemd rightly rejects. Defence in depth against the next weird
  suffix someone puts in a manifest.

Six regression tests in the new tests module cover IEC, SI, shorthand,
raw bytes, invalid input (empty/garbage/0/negative), and whitespace.
2026-04-23 03:44:23 -04:00
archipelago
a0707f4d48 feat(iso): Step 8a — retire archipelago-reconcile systemd timer
BootReconciler (in-process, 30s interval, spawned from main.rs as of
Step 6 commit 48f08aa3) fully replaces the timer-driven bash
reconciliation path. Delete the systemd unit + timer and their
ISO-builder touchpoints.

Removed:
- image-recipe/configs/archipelago-reconcile.service
- image-recipe/configs/archipelago-reconcile.timer
- image-recipe/build-auto-installer-iso.sh L412-413 (COPY unit+timer)
- image-recipe/build-auto-installer-iso.sh L449 (systemctl enable)
- image-recipe/build-auto-installer-iso.sh L542-543 (cp to WORK_DIR)

Kept (intentionally):
- scripts/reconcile-containers.sh
- scripts/container-specs.sh

Reason: core/archipelago/src/api/rpc/package/update.rs still invokes
reconcile-containers.sh at two sites (OTA update + rollback paths).
Porting those call sites to ContainerOrchestrator::upgrade() requires
manifests for every container update.rs might touch — that scope
belongs in Step 8b. Until then the script stays on disk, just no
longer runs on a periodic timer.

No Rust code changes. cargo check -p archipelago clean, 6 pre-existing
warnings. Skipped full ISO rebuild validation per user decision —
edits are 5 textual deletions with zero behavioral ambiguity; Step 9
live hot-swap on .228 will catch any regression.
2026-04-23 03:04:58 -04:00
archipelago
1c81a739d6 docs: split Step 8 into 8a/8b/8c
Discovered during Step 8 execution that first-boot-containers.sh
creates 30+ containers with per-container logic (wallet loads, DB
init, rpcauth derivations, post-create health waits) and does
substantial non-container setup (secret gen, rootless-podman subuid
chowns, Tor hostnames, WireGuard, firewall, nostr-relay). Only 3 of
the 30+ containers have manifests today (the UIs from Step 7).

Deleting the bash in a single step bricks first-boot on fresh
installs. Split into:

- 8a: delete reconcile-containers.sh + container-specs.sh + reconcile
  systemd unit + timer. BootReconciler fully covers these. Safe,
  atomic, no manifest porting required.
- 8b: port remaining ~25 containers into apps/<id>/manifest.yml. One
  manifest per commit, validated against current bash behavior.
  Multi-day scope.
- 8c: rename first-boot-containers.sh -> first-boot-setup.sh, strip
  container ops, keep secret/dir/Tor/WG/firewall setup. Final
  one-way door, requires 8b complete.
2026-04-23 02:34:43 -04:00
archipelago
6e46932f72 docs: STATUS.md through Step 7 2026-04-23 02:21:01 -04:00
archipelago
069bc4a561 feat(container): bitcoin-ui pre-start hook renders nginx.conf from embedded template
Replaces the first-boot-containers.sh sed/envsubst approach with a
Rust-native render step bound into the ContainerOrchestrator lifecycle.

- New container::bitcoin_ui module: embeds the nginx.conf template via
  include_str!, reads the plaintext RPC password from
  /var/lib/archipelago/secrets/bitcoin-rpc-password, substitutes
  {{BITCOIN_RPC_AUTH}} with base64(archipelago:<password>), and atomic-
  writes (tmp + rename) to /var/lib/archipelago/bitcoin-ui/nginx.conf.
  Idempotent: byte-compares before writing so unchanged input is a
  no-op (no inode churn, no restart cascade).
- ProdContainerOrchestrator gains run_pre_start_hooks(app_id) returning
  HookOutcome::{Rewritten, Unchanged}. Fires in install_fresh before
  create_container, and in ensure_running: on Running + Rewritten
  triggers a restart; on Stopped re-renders then starts.
- bitcoin-ui Dockerfile no longer COPYs a default.conf; the file now
  arrives via runtime bind-mount of the rendered config. If the bind-
  mount is ever missing, nginx starts with no site configured and
  returns 404 everywhere — safe failure vs. serving upstream RPC with
  a stale Authorization header.
- apps/{bitcoin,electrs,lnd}-ui/manifest.yml land as first-class
  manifests. bitcoin-ui declares the bind-mount target and a dependency
  on bitcoin-core; electrs-ui and lnd-ui declare their own deps and
  health checks.
- 8 new unit tests on the render fn (idempotency, rotation, trimming,
  missing/empty secret, template invariants) plus an integration test
  asserting install(bitcoin-ui) actually lands a substituted nginx.conf
  on disk via the hook. 39/39 container:: tests pass
  (test_parse_image_versions pre-existing failure unchanged, out of
  scope).
2026-04-23 02:19:52 -04:00
archipelago
ca734e4ea6 docs: STATUS.md through Step 6 2026-04-22 19:20:17 -04:00
archipelago
48f08aa3e4 feat(container): wire ProdContainerOrchestrator + BootReconciler into main
Step 6 of the rust-orchestrator migration. Construct the container
orchestrator once in main.rs, call load_manifests + adopt_existing
immediately after Config::load, log the adoption report, and spawn
BootReconciler::run_forever with the 30s default interval. Thread the
orchestrator through Server::new -> ApiHandler::new -> RpcHandler::new
so the reconciler and RPC layer share one instance.

Wire a tokio::sync::Notify through the SIGTERM/SIGINT shutdown path so
the reconciler exits cleanly alongside the server drain. Uses notify_one
so the signal stores a permit if the reconciler is mid reconcile_all
when the signal fires.

Delete the commented-out run_boot_reconciliation block in main.rs that
documented the prior bash-script approach being unsafe on unbundled
installs — the new reconciler is manifest-driven and only touches apps
present in /opt/archipelago/apps, fixing that concern.

cargo check -p archipelago clean (6 pre-existing dead-code warnings on
trait methods not yet exercised until Step 9 hot-swap). Container test
suite 43/44 pass; the one failure (container::image_versions::
test_parse_image_versions) is pre-existing and unrelated.
2026-04-22 19:20:13 -04:00
archipelago
fc39b04b4e feat(container): BootReconciler — periodic reconcile loop for prod orchestrator
Step 5 of the rust-orchestrator migration. New file boot_reconciler.rs holds a
small Tokio task that calls ProdContainerOrchestrator::reconcile_all() on a
30-second cadence (answered design Q3).

  * BootReconciler::new(orch, interval, shutdown) — shutdown is an Arc<Notify>
    so callers can trigger a graceful exit without pulling in tokio-util.
  * run_forever(self) — does one reconcile immediately, then loops on
    tokio::select! { sleep_until | shutdown.notified() }. Shutdown interrupts
    the sleep but never an in-flight reconcile_all call.
  * Per-pass outcomes are logged at debug/warn; failures never propagate out
    because reconcile_all already absorbs per-app errors into ReconcileReport.

Four tokio::test(start_paused = true) tests verify the loop cadence against a
CountingRuntime test double:
  * initial_pass_fires_immediately — first reconcile runs with no delay
  * second_pass_fires_after_interval — second pass fires after exactly
    interval elapses in paused-clock time
  * shutdown_terminates_loop — notify_one() lets run_forever return
  * failure_in_one_pass_does_not_stop_loop — the loop keeps ticking even when
    the first pass had to install a missing container

Not wired into main.rs yet — that is Step 6. Re-exported from container::mod
as BootReconciler + RECONCILER_DEFAULT_INTERVAL for the wire-up step.
2026-04-22 19:04:34 -04:00
archipelago
d7692790bc docs: update STATUS.md — Step 4 done, Step 5 next
Records acceptance evidence for Steps 1-4 (container tests 21/21 pass, build
clean with expected unused-method warnings) and queues the BootReconciler
implementation for Step 5.
2026-04-22 18:57:43 -04:00
archipelago
138588422a chore: gitignore macOS AppleDouble files from SSHFS writes
The laptop mounts ~/Projects/archy over SSHFS and macOS finder / Spotlight
sidecars write ._<name> resource-fork files alongside every edit. They are
noise; keep them out of git.
2026-04-22 18:56:58 -04:00
archipelago
e8a59c93c6 feat(container): ContainerOrchestrator trait, RpcHandler uses it in prod
Step 4 of the rust-orchestrator migration. Unifies the container lifecycle
surface behind a single trait so the RPC layer stops caring whether it is
talking to the dev or prod orchestrator.

  * New trait core/archipelago/src/container/traits.rs: ContainerOrchestrator
    with install / start / stop / restart / remove / upgrade / status / list /
    logs / health, all keyed by app_id. Every method is async_trait-based.

  * ProdContainerOrchestrator: the lifecycle methods are moved from inherent
    impl into the trait impl (avoids name-shadowing recursion). Adoption and
    reconcile remain inherent since only main.rs / BootReconciler call them.

  * DevContainerOrchestrator: new trait impl that forwards to the existing
    Dev-named methods, applying the dev container-name + port-offset rules
    internally. New load_manifest_for() helper resolves app_id to
    <data_dir>/apps/<app_id>/manifest.yml so trait-level install(app_id)
    works in dev too. install_container(manifest, path) stays inherent for
    the manifest-path RPC shape.

  * RpcHandler now holds Option<Arc<dyn ContainerOrchestrator>> and, when in
    dev mode, a separate Option<Arc<DevContainerOrchestrator>> for the
    manifest_path install RPC. In prod mode RpcHandler::new() constructs a
    ProdContainerOrchestrator and calls load_manifests() at startup.

  * All seven container-* RPC guards no longer say dev mode required.
    container-install still requires dev mode because its manifest_path
    argument has no prod meaning; every other container RPC now works in both
    modes via the trait.

BOOT STILL DOES NOT USE THIS. main.rs wire-up (Step 6) and BootReconciler
(Step 5) come next. Until then the prod orchestrator is constructed but nothing
populates /opt/archipelago/apps so it has zero manifests to manage, matching
the pre-Step-4 behaviour.

Verification: cargo build -p archipelago clean (11 expected unused method
warnings for methods not yet wired from main.rs). cargo test -p archipelago:
all 21 container::* tests pass (16 prod_orchestrator + 5 others). 24 other
test failures are pre-existing and unrelated (identity_manager / session /
wallet / mesh / credentials — all independently flaky on file-backed state).
2026-04-22 18:56:52 -04:00
archipelago
b6a04d315a feat(container): ProdContainerOrchestrator with build-or-pull, adoption, reconcile
Step 3 of the rust-orchestrator-migration. New file prod_orchestrator.rs (999 LOC)
implements the full public surface that will replace scripts/first-boot-containers.sh:

  * install / start / stop / restart / remove / upgrade / status / list / logs / health
  * adopt_existing: read-only scan that claims containers matching our manifests by
    name, without recreating — preserves the v1.7.42 fixture on .116.
  * reconcile_all: level-triggered, per-app failures collected rather than aborting.
  * install_fresh: build-or-pull (Step 2 trait methods), relative build contexts
    resolved against the manifest directory.

Naming rule (answered design Q1): UI app IDs (bitcoin-ui/electrs-ui/lnd-ui) get the
archy- prefix; backends keep their bare ID. An explicit extensions.container_name
always wins. Codified in compute_container_name() with unit tests for all three tiers.

Concurrency (answered design Q4): per-app tokio::sync::Mutex<()> created lazily,
protecting every mutating op against the reconciler loop. Acquiring the per-app
lock only needs a read lock on the map, so independent apps do not serialize.

16 tests: 3 sync naming rule tests + 13 tokio async tests covering install (pull,
build-absent, build-present, relative-context), reconcile (noop/exited/missing/
mixed-failure), adopt-by-name, upgrade sequence ordering, list filtering, health
state mapping, and unknown-app-id rejection. All pass.

Not wired into main.rs yet — that is Step 6. Crate builds clean with expected
unused warnings for the new re-exports.
2026-04-22 18:32:31 -04:00
archipelago
34af4d9d4e feat(container): runtime trait gains image_exists + build_image
Adds two methods to ContainerRuntime so the upcoming ProdContainerOrchestrator
can inspect local image storage and build images from BuildConfig:

- image_exists(image_ref) -> Result<bool>: local-storage check only, does
  not consult registries. Distinguishes exit 0 (present) from exit 1
  (absent) from other failures (environment error).
- build_image(&BuildConfig) -> Result<()>: shells out to podman/docker
  build with -t, -f, deterministically-sorted --build-arg pairs, and the
  context path last.

Implemented on all three runtimes:
- PodmanRuntime: new podman_cli helper shells out alongside the existing
  HTTP API calls (build and image inspect are awkward over the HTTP API)
- DockerRuntime: native docker CLI, same exit-code semantics
- AutoRuntime: delegates to the selected inner runtime

Argv construction extracted into pure build_args_for_podman helper so it
can be unit-tested without a real podman. 4 new tests cover minimal args,
custom Dockerfile path, deterministic build-arg sorting (guards against
HashMap iteration non-determinism), and context-is-last (positional arg
placement is load-bearing for podman build).

Step 2 of docs/rust-orchestrator-migration.md. 25/25 tests pass.
2026-04-22 17:46:47 -04:00
archipelago
3767c2670c feat(container): add build source to manifest schema
ContainerConfig.image is now Option<String>, mutually exclusive with a new
optional ContainerConfig.build: Option<BuildConfig>. Exactly one of image
or build must be present, enforced in AppManifest::validate.

Adds ResolvedSource enum (Pull | Build) and ContainerConfig::resolve +
::image_ref helpers so the orchestrator can treat pull and build uniformly.
All 26 existing pull-only manifests continue to parse unchanged
(covered by existing_pull_only_manifests_still_parse test).

Call sites updated: podman_client, runtime::DockerRuntime, dev_orchestrator.
Dev orchestrator errors out cleanly on Build sources until Step 2 lands
build_image support on the runtime trait.

Step 1 of docs/rust-orchestrator-migration.md. 10 new unit tests, all pass.

Also includes: docs/rust-orchestrator-migration.md (design spec) and
docs/STATUS.md resume section for the next session.
2026-04-22 17:46:36 -04:00
archipelago
7ecd30bde2 release(v1.7.42-alpha): bitcoin RPC retry wrapper so syncing nodes stop flashing red
Closes failure mode adjacent to FM3 (docs/bulletproof-containers.md): on
a syncing pruned node, bitcoind's RPC thread blocks for 5-10s during block
validation. The old 10s client-side timeout was rejecting roughly 30% of
UI calls even though the node was perfectly healthy. 20x stress test on
the live .116 node (caught in IBD catch-up at block 797k) used to drop
10 of 20 calls; now drops 0 of 20.

What changed:
- core/archipelago/src/api/rpc/bitcoin.rs: bitcoin_rpc_call now retries up
  to 3 times with 500ms and 1500ms backoffs between attempts. Only
  transient transport errors (timeout, connect refused, send/recv IO)
  trigger retry. A well-formed bitcoind error response is surfaced
  immediately - real RPC bugs are never masked.
- Per-attempt hard deadline (tokio::time::timeout, 15s) layered on top
  of reqwest's own timeout, so DNS starvation or TLS wedging can't
  steal the entire retry budget.
- handle_bitcoin_getinfo client builder gained a 3s connect_timeout
  so a dead bitcoind is fast-failed inside the first attempt instead
  of eating the whole 15s.
- Retry policy extracted into a RetryConfig struct so tests can dial
  down timeouts to ~100ms per attempt. Production defaults live in
  RetryConfig::production().

Not changed (tracked as follow-up):
- mesh/mod.rs bitcoin_rpc_getblockcount and related helpers use the
  same 10s-timeout pattern. Not migrated to the new wrapper in this
  release; scheduled for v1.7.43 alongside the render_bitcoin_conf
  work.
- lnd/info.rs and electrs_status have similar 10s/15s timeouts but
  different failure profiles - audit first, migrate only the ones
  that actually exhibit the bug.

Tests: 6 new unit tests under api::rpc::bitcoin::tests, all passing.
Uses an in-process hyper server (already a transitive dep) to simulate
bitcoind responses; no new crates required.
  - happy_path_first_attempt: no retry when first attempt succeeds
  - retries_on_timeout_then_succeeds: first attempt times out, second
    succeeds, returns OK (uses a short-timeout RetryConfig so the test
    runs in <1s instead of 15s)
  - retries_exhausted_on_persistent_connect_refused: all attempts fail
    against a closed port, error bubbles up, elapsed time confirms
    backoffs actually ran
  - does_not_retry_on_rpc_level_error: bitcoind-returned error body is
    surfaced immediately, no retry
  - does_not_retry_parse_errors: non-JSON response (e.g. 503 with html
    body) is NOT retried - guards against the tempting "retry all
    non-2xx" mistake that would mask real bitcoind misconfig
  - retry_budget_invariants: asserts total wall-time ceiling stays
    under 60s so a bumped constant can't silently hang a UI call
    forever

Validated live on .116: 20/20 bitcoin.getinfo calls succeed during IBD
catch-up (chain at block 797419 -> 797464), vs ~40% baseline under the
old 10s timeout. Worst-case latency was 48.9s during peak validation;
happy-path latency (cached result) remains 28-77ms.
v1.7.42-alpha
2026-04-22 16:46:28 -04:00
archipelago
048679065e release(v1.7.41-alpha): post-OTA auto-rollback so a bad release cannot strand the fleet
Closes failure mode FM5 from docs/bulletproof-containers.md: the v1.7.38 +
v1.7.39 rollouts left every affected node on an unreachable UI (nginx 500)
with no recovery path short of SSH. This release adds a self-check
guardrail to the update flow.

What changed:
- apply_update() writes a pending-verify marker with old+new version and
  a 150s deadline immediately before scheduling the service restart.
- verify_pending_update() runs from main.rs startup. If the marker is
  present and within its freshness window, the new binary waits 15s for
  nginx + backend to settle, then probes https://127.0.0.1/ every 5s for
  up to 90s (self-signed certs accepted).
- On any probe success within the window, the marker is cleared and
  nothing else happens.
- On window-exhaust, the new binary:
    1. Moves the broken /opt/archipelago/web-ui to web-ui.failed.<ts>
       (quarantined, not deleted, so we can post-mortem).
    2. Restores web-ui.bak on top of web-ui.
    3. Calls rollback_update() to restore the previous binary.
    4. Updates state.current_version to reflect the rollback.
    5. systemctl --no-block restart archipelago so the OLD binary boots.
- Markers older than 10 minutes are treated as stale and cleared without
  probing, so a crashed-during-startup marker from weeks ago cannot
  spontaneously roll back a healthy node on a later reboot.
- rollback_update() binary copy now goes through host_sudo instead of
  tokio::fs::copy, so it escapes the service's ProtectSystem=strict
  mount namespace. Without this, the rollback silently failed with
  EROFS on /usr/local/bin and orphaned the rollback - the exact
  opposite of what auto-rollback is for.

Tests: 4 new unit tests in update::tests covering marker round-trip,
absent-marker noop, no-panic on verify_pending_update with nothing to
verify, and an invariant assert that the 90s probe window stays below
the 600s stale threshold. All passing.

Side fix: scripts/create-release-manifest.sh was dying with exit 141
(SIGPIPE from tar tvzf pipe head pipe awk) under set -euo pipefail.
Replaced with a single awk NR==1 that doesn't short-circuit the upstream
pipe, so the release-build flow is idempotent again.
v1.7.41-alpha
2026-04-22 16:14:35 -04:00
Dorian
50744952b7 release(v1.7.40-alpha): fix tarball root perms at source so OTA can't 500 again
Some checks failed
Build Archipelago ISO (dev) / build-iso (push) Failing after 11m1s
v1.7.38 and v1.7.39 both shipped with `./` inside the frontend tarball marked
drwx------ (700). Tar extraction preserves archive perms, so every node that
pulled the OTA landed with /opt/archipelago/web-ui at 700, nginx (www-data)
returned 500 "permission denied" on every page, and the browser showed
"Internal Server Error nginx". .116 hit this on both v1.7.38 and v1.7.39
rollouts. The v1.7.39 runtime self-heal in main.rs was the wrong layer —
systemd's ReadOnlyPaths namespace made /opt/archipelago read-only from inside
the archipelago service, so chmod from there returned EROFS.

Root cause: create-release-manifest.sh used mktemp -d (700 default umask) for
staging, then tar preserved that 700 in the archive's root entry.

Fix the archive itself:
- chmod 755 staging dir + `find -type d -exec chmod 755` + `-type f chmod 644`
  before tar, so the on-disk entries are correct.
- tar --owner=0 --group=0 --mode='u=rwX,go=rX' to normalize archive perms
  belt-and-braces in case file-mode drift ever reappears.
- Post-tar verify: `tar tvzf | head -1` must show drwxr-xr-x at root, or
  the release script aborts before the manifest is even generated.

Binary unchanged semantically — the main.rs self-heal stays in as a last-
resort belt (can't hurt on nodes whose FS isn't namespace-isolated), and the
update.rs in-extractor chmod stays in so v1.7.40-onwards extractors are
double-safe. The authoritative fix is the archive.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
v1.7.40-alpha
2026-04-22 13:54:44 -04:00
Dorian
3218f71703 release(v1.7.39-alpha): hotfix web-ui perms after OTA (nginx 500) + startup self-heal
All checks were successful
Build Archipelago ISO (dev) / build-iso (push) Successful in 10m28s
v1.7.38 shipped with an OTA bug: the tar-extracted staging dir inherited 700
perms and nginx (www-data) returned 500/403 on every request after the swap.
.116 hit this on rollout; had to chmod by hand to recover.

- update.rs: after extraction, explicitly chmod 755 dirs + 644 files on the
  new staging dir before the mv into place, so nginx can stat/serve them.
- main.rs: self-heal on startup — if /opt/archipelago/web-ui is not
  world-readable, run `sudo chmod -R u=rwX,go=rX` to repair. This is what
  rescues nodes upgrading from v1.7.37/v1.7.38, since their extractor
  (running on the old binary) doesn't have the chmod fix yet — the new
  binary's first boot fixes the mess before nginx serves a single request.

Everything v1.7.38 shipped is still in this release:
- auth.rs auto-heals is_onboarding_complete() from setup_complete +
  password_hash so nodes don't bounce back to /onboarding/intro after
  browser clear / reboot / update
- useOnboarding tri-state: backend-unreachable no longer defaults to intro
- login sounds gated by isFirstInstallPhase() — silent after onboarding,
  typing sounds unaffected
- FIPS app / Nostr Relay / Nostr VPN / Routstr / Penpot removed from
  catalog + frontend + Rust + docker + icons; 15 image versions deleted
  from tx1138, .168, gitea-local
- AIUI baked into release tarball via demo/aiui/
- prebuild hook syncs app-catalog/catalog.json → public/catalog.json

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
v1.7.39-alpha
2026-04-22 13:26:54 -04:00
Dorian
ca5d2cc42a release(v1.7.38-alpha): onboarding auto-heal + silent returning logins + app-store trim
Some checks failed
Build Archipelago ISO (dev) / build-iso (push) Failing after 11m12s
- auth.rs now infers onboarding-complete from setup_complete + password_hash so
  nodes stop bouncing users through the intro wizard after browser clear / update
  / reboot; the flag self-heals to disk on next check
- frontend: "backend uncertain" no longer defaults to /onboarding/intro —
  useOnboarding returns null + callers poll / retry instead of flashing the wizard
- login sounds (synthwave, welcome voice, pop, whoosh, oomph) gated by
  isFirstInstallPhase(); typing sounds unaffected
- removed FIPS app, Nostr Relay, Nostr VPN, Routstr, Penpot from catalog,
  frontend config, Rust AppMetadata + install dispatch + install_penpot_stack;
  docker/fips-ui + docker/nostr-vpn-ui + apps/penpot dirs and 5 icons deleted;
  15 image versions deleted from tx1138, .168, gitea-local registries (.160
  Gitea was 502 at release time — follow-up)
- AIUI baked into frontend release tarball via demo/aiui/; deploy-to-target
  falls back to demo/aiui/ when the AIUI sibling checkout is missing
- prebuild hook syncs app-catalog/catalog.json → public/catalog.json so the
  two copies can no longer drift (was the source of the "apps still visible"
  bug — public/ had stale data)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
v1.7.38-alpha
2026-04-22 13:02:24 -04:00
Dorian
9cb114c50a release(v1.7.37-alpha): bitcoin-core install fixes + dynamic node UI + full-archive default
All checks were successful
Build Archipelago ISO (dev) / build-iso (push) Successful in 14m26s
Install flow
- api/rpc/package/install.rs: always append the literal image URL as a
  last-resort pull candidate in do_pull_image, so images not carried by
  any configured mirror (docker.io/bitcoin/bitcoin:28.4) still install
  instead of masquerading as a generic pull failure across every mirror.
- api/rpc/package/install.rs: write_bitcoin_conf now skips on any stat
  error, not just "file exists". Once bitcoin-knots' first-boot chowns
  /var/lib/archipelago/bitcoin into the container's user namespace (700
  perms, UID 100100/100101), the archipelago daemon can't even traverse
  in — try_exists returns Err which unwrap_or(false) treated as "not
  present" and drove a doomed write. Now errors out of the directory
  traversal are treated as "conf already owned by container user" and
  the write is skipped. Mirrors the lnd.conf pattern.
- api/rpc/package/install.rs: drop the hardcoded `prune=550` from the
  conf default. Operators with multi-TB drives shouldn't be silently
  pruned; users who want a pruned node can set it in bitcoin.conf
  themselves. Full archive is the only honest default.
- api/rpc/package/config.rs: bitcoin-core now passes explicit
  -server/-rpcbind/-rpcallowip/-rpcport/-printtoconsole/-datadir CLI
  args. Vanilla bitcoin/bitcoin:28.4 has no entrypoint wrapper and
  reads conf + argv only; without these the RPC listens on 127.0.0.1
  inside the container and rootlessport can't reach it, so the
  bitcoin-ui companion gets 502 on every /bitcoin-rpc/ call.
  Bitcoin Knots keeps its own entrypoint-driven defaults.
- container/docker_packages.rs: split bitcoin-core out of the shared
  AppMetadata arm. bitcoin-core now surfaces as "Bitcoin Core" with
  bitcoin-core.svg and a Reference-implementation description; the
  bitcoin + bitcoin-knots ids keep the Knots branding. Fixes the home
  card showing "Bitcoin Knots" for a Core install.

Bitcoin node UI (docker/bitcoin-ui)
- index.html: impl name/tagline/logo now dynamic. applyImplBranding()
  reads subversion from getnetworkinfo — /Satoshi:X/Knots:Y/ resolves
  to Bitcoin Knots, plain /Satoshi:X/ resolves to Bitcoin Core. Both
  get their own icon and subtitle. Settings modal replaced its
  hardcoded Regtest/txindex=1/port-18443 placeholders with live values
  from getblockchaininfo + getindexinfo + getzmqnotifications.
- index.html: new Storage info card (Full Archive · X GB /
  Pruned · X GB from blockchainInfo.pruned + size_on_disk) visible on
  the main dashboard, same level as Network. Settings modal mirrors it
  with the prune height when applicable.
- Dockerfile + assets/: bitcoin-core.svg, bitcoin-knots.webp, and the
  bg-network.jpg used by the dashboard are now COPY'd into the image
  under /usr/share/nginx/html/assets. Previously the <img src> pointed
  at paths that 404'd into the SPA fallback and the onerror handler
  hid the broken logo silently.

Frontend
- appSession/appSessionConfig.ts: add bitcoin-core to APP_PORTS (8334),
  HTTPS_PROXY_PATHS (/app/bitcoin-ui/), and APP_TITLES (Bitcoin Core).
  Without these the AppSessionFrame showed "No URL found for
  bitcoin-core" and the home/app-list title fell through to the raw id.
- settings/AccountInfoSection.vue: backfill What's New entries for
  v1.7.31 through v1.7.37 that had been missed in earlier cuts.

Release plumbing
- releases/v1.7.37-alpha/: binary + frontend tarball.
- releases/manifest.json: v1.7.37-alpha, sha256/size refreshed.
- Cargo.toml / package.json: version bumps.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
v1.7.37-alpha
2026-04-22 11:03:47 -04:00
Dorian
1f912a0f58 fix(catalog): prefix bitcoin-core image with docker.io/ so the install validator accepts it
Some checks failed
Build Archipelago ISO (dev) / build-iso (push) Failing after 31m7s
The trusted-registry allowlist in api/rpc/package/config.rs splits the
image on '/' and matches the first segment against a fixed set (docker.io,
ghcr.io, git.tx1138.com, 23.182.128.160:3000, ghcr.io, localhost). A bare
'bitcoin/bitcoin:28.4' splits to registry="bitcoin" which isn't on the
list, so the install RPC was returning 'Invalid Docker image format'.

Live catalogs on .160 and gitea-local already hotfixed directly; these
static copies keep ISO builds and the final hardcoded fallback in sync.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 09:18:49 -04:00
Dorian
7106a81c6a release(v1.7.36-alpha): bitcoin-core in App Store + Sovereignty Stack + dynamic catalog URL
Some checks failed
Build Archipelago ISO (dev) / build-iso (push) Has been cancelled
- neode-ui/public/assets/img/app-icons/bitcoin-core.svg (NEW): 256×256
  Umbrel community Bitcoin icon sourced from getumbrel.github.io/
  umbrel-apps-gallery/bitcoin/icon.svg. Referenced by the static
  catalog, the curated fallback, and the upstream lfg2025/app-catalog
  entry so every surface shows the same image.
- app-catalog/catalog.json + neode-ui/public/catalog.json: add
  bitcoin-core (v28.4) entry pointing at bitcoin/bitcoin:28.4. Same
  entry pushed to the lfg2025/app-catalog repo on .160 and the local
  gitea mirror so nodes see it without needing a full archipelago
  update. Sovereignty Stack entry added to FEATURED_DEFINITIONS with
  a description that frames it as a Knots alternative, not a rival.
- core/archipelago/src/api/handler/mod.rs: handle_app_catalog_proxy
  is now instance-scoped (&self) and derives its upstream list from
  load_registries — each active container registry contributes one
  `<scheme>://<reg.url>/app-catalog/raw/branch/main/catalog.json` URL
  in priority order (scheme follows tls_verify). When the operator
  switches mirrors in Settings, the App Store now follows. Falls back
  to the legacy hardcoded .160/tx1138 pair only when registry config
  can't be loaded, so the App Store still renders on nodes that
  haven't persisted one yet.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
v1.7.36-alpha
2026-04-22 09:06:10 -04:00
Dorian
987158ef5f release(v1.7.35-alpha): rootless-netns self-heal + app update button + bitcoin-core 28.4 + Node DID unification
Some checks failed
Build Archipelago ISO (dev) / build-iso (push) Has been cancelled
- core/archipelago/src/bootstrap.rs (NEW): embed scripts/container-doctor.sh
  and image-recipe/configs/archipelago-doctor.{service,timer} via
  include_str! and sync to disk + enable the timer on every archipelago
  startup. Idempotent (content-hash compare), dev-box symlink guard keeps
  the git checkout untouched, best-effort (warn-only on failure) so
  bootstrap never blocks server readiness. Wired in main.rs as a
  background tokio task.
- scripts/container-doctor.sh: add fix_rootless_netns_egress(). Detects
  when the rootless-netns has lost its pasta tap (container-to-container
  still works but outbound DNS/TCP fails) via an nsenter probe into
  aardvark-dns; with a two-probe 10s debounce to rule out transients and
  a host-precheck that bails out if the host itself is offline. When the
  rootless-netns is truly broken, does a graceful podman stop --all /
  start --all so pasta + aardvark-dns rebuild the netns from scratch.
  Bitcoin-knots and every other outbound container recover in one cycle.
- core/archipelago/src/update.rs: host_sudo → pub(crate) so bootstrap.rs
  can reuse the existing systemd-run escape hatch.
- apps/bitcoin-core/manifest.yml: bump app version 24.0.0 → 28.4.0 and
  image bitcoin/bitcoin:24.0 → bitcoin/bitcoin:28.4. Resources aligned
  with the real container-specs.sh large-disk tune (4 GiB memory cap,
  cpu_limit: 0 so bitcoind can run -par=auto across every core).
- neode-ui/src/views/apps/AppCard.vue + Apps.vue: add an Update button
  + Updating spinner to every app card that has available-update set.
  Wires through serverStore.updatePackage(id) — the same RPC the detail
  view already calls. common.update / common.updating i18n keys added in
  en.json and es.json.
- core/archipelago/src/identity_manager.rs: add create_from_signing_key()
  that mirrors an existing Ed25519 key as a manager-level identity with
  a deterministic id (`node-<pubkey16>`). Idempotent across restarts,
  gets the hex-SVG master avatar.
- core/archipelago/src/server.rs: the auto-create path on first boot now
  mirrors the node's own signing_key (seed-derived on onboarded installs)
  as a "Node" identity instead of generating a random "Default" keypair.
  Once this ships, the DID on the Web5 DID Status card (via node.did
  RPC), the Node entry on the Identities page (via identity.list), and
  the DID used for peer-to-peer connects (via server_info.pubkey) all
  resolve to the same seed-derived pubkey.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
v1.7.35-alpha
2026-04-22 08:29:56 -04:00
Dorian
5f6b4232d2 release(v1.7.34-alpha): re-seed onboarding cache + rotating login bg + drop re-login zoom
All checks were successful
Build Archipelago ISO (dev) / build-iso (push) Successful in 22m52s
- useOnboarding.ts: when the backend gives a definitive answer
  (true/false, not a null retry failure), re-seed the
  neode_onboarding_complete localStorage flag accordingly. Fixes the
  case where a user clears site data on an already-onboarded node —
  OnboardingWrapper's useVideoBackground computed reads localStorage
  synchronously, so without this re-seed the intro video would fire
  again on /login even though RootRedirect correctly sent them
  straight to /login.
- OnboardingWrapper.vue: login background now rotates through
  bg-intro-1..6 on each /login mount, with the current index
  persisted to localStorage (neode_login_bg_idx) so subsequent
  logouts advance rather than repeat the same image.
- Dashboard.vue: subsequent-login branch drops the 1.2s showZoomIn
  entirely. Only the first dashboard entry after onboarding plays
  the full zoom + glitch reveal; every re-login now just fades in
  with the welcome typing (~300ms).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
v1.7.34-alpha
2026-04-22 05:42:52 -04:00
Dorian
65582d67c6 release(v1.7.33-alpha): onboarding/login UX fixes + PWA cache bust
All checks were successful
Build Archipelago ISO (dev) / build-iso (push) Successful in 12m50s
- useOnboarding.ts: prefer the backend over localStorage when checking
  onboarding completion. The old order (localStorage first) meant any
  browser that had ever onboarded a node would treat every new fresh
  node as already-onboarded and skip the wizard, dumping the user
  straight at the inline set-password form. Backend is now authoritative;
  localStorage stays as the offline fallback.
- OnboardingWrapper.vue: skip the intro video on `/login` once
  `neode_onboarding_complete` is set. Returning logged-out users now
  get the static lock-screen background + glitch overlay instead of
  replaying the full intro on every logout.
- RootRedirect.vue: when the health check fails, only show the full
  BootScreen if the node was never onboarded. For already-onboarded
  nodes (i.e. an OTA-update blip), keep the spinner and poll the
  health endpoint every 2s for up to 60s before falling back to the
  boot screen. Fixes the "fake boot loader" / "server starting up"
  screens flashing on every successful update.
- loginTransition store: new `justCompletedOnboarding` flag distinct
  from `justLoggedIn`. Set true only by the inline setup-password
  flow (handleSetup). Dashboard.vue branches on it: full glitch+zoom
  reveal for the post-onboarding entry, quick zoom + welcome typing
  on every other login (no triple glitch flashes, ~1.2s vs 8s).
- vite.config.ts: bump assets cache from `assets-cache-v2` to
  `assets-cache-v3` so service workers running the previous bundle
  invalidate their cache and pick up the new UI cleanly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
v1.7.33-alpha
2026-04-22 04:45:33 -04:00
Dorian
fd3f5d2701 release(v1.7.32-alpha): fix frontend tarball layout + mDNS shutdown hang
All checks were successful
Build Archipelago ISO (dev) / build-iso (push) Successful in 12m32s
- HOTFIX: v1.7.31-alpha's frontend tarball was packaged with a
  `neode-ui/` top-level directory instead of the flat layout v1.7.30
  and earlier used. Nodes that applied v1.7.31 ended up with
  `/opt/archipelago/web-ui/neode-ui/index.html` instead of
  `/opt/archipelago/web-ui/index.html`, and nginx returned 403/500.
  v1.7.32's tarball is built with `tar -C web/dist/neode-ui .` so
  files land directly at web-ui root. Broken nodes auto-heal on this
  update (web-ui dir is replaced).
- transport/lan.rs: add Drop impl that calls ServiceDaemon::shutdown()
  on the mdns_sd daemon. Without this the OS thread it spawns, plus
  the blocking `receiver.recv()` task, keep the tokio runtime alive
  past SIGTERM — long enough for systemd's TimeoutStopSec to SIGKILL
  the service and mark it Failed. Was visible on every update:
  "shut down cleanly" logged, then 15s later systemd forcibly kills.
- main.rs: after logging "Archipelago shut down cleanly", call
  `std::process::exit(0)` explicitly. Belt-and-suspenders against
  any future non-daemon thread creeping in (reqwest resolver pool,
  etc.) and causing the same SIGKILL regression.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
v1.7.32-alpha
2026-04-22 03:52:22 -04:00
Dorian
fdaa5646b2 release(v1.7.31-alpha): idempotent IndeedHub install + auto-merge default mirrors/registries + 3rd OVH update mirror
All checks were successful
Build Archipelago ISO (dev) / build-iso (push) Successful in 24m1s
- Backend: install.rs registry reachability probe now strips the
  `host[:port]/namespace` suffix before appending `/v2/` (the Docker
  V2 API lives at the host root, not under the namespace) and accepts
  HTTP 405 in addition to 200/401 as "registry daemon alive". This
  fixes false "unreachable" reports on the Test button for Gitea and
  other registries that protect their /v2/ endpoint.
- Backend: stacks.rs install_indeedhub_stack now force-removes any
  leftover indeedhub-* containers and indeedhub-net before creating
  the stack. A partial install (or the old first-boot stub racing the
  installer) used to leave containers around that blocked re-install
  with "name already in use". Re-running the App Store install now
  self-heals.
- Backend: registry.rs load_registries auto-merges any default
  registry URLs missing from the saved config (appended with priority
  max+10+i, persisted). Lets new default mirrors (e.g. Server 3 OVH)
  roll out to existing nodes without manual config edits. Explicit
  removals still stick — URLs absent from disk AND absent from
  defaults stay gone.
- Backend: update.rs adds DEFAULT_TERTIARY_MIRROR_URL at
  http://146.59.87.168:3000/ (Server 3 OVH) to default_mirrors, with
  the same auto-merge-on-load behavior as registries. Test updated
  for 3-mirror default (.160, tx1138, .168).
- Scripts: dropped the first-boot IndeedHub stub (~38 lines in
  first-boot-containers.sh §8b). It predated the proper stack
  installer, raced it, and was the main source of the name-conflict
  mess the stacks.rs cleanup above now also guards against.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
v1.7.31-alpha
2026-04-22 03:26:09 -04:00
Dorian
f9b44f5e2e release(v1.7.30-alpha): live install/uninstall progress + cleaner pull waterfall
Some checks failed
Build Archipelago ISO (dev) / build-iso (push) Failing after 11m25s
- Backend: unified pull-progress streaming across primary AND fallback
  registries. Earlier code only streamed for the primary attempt; if it
  failed fast (VPS 404, etc.) the UI froze at 0% until the fallback
  finished. The waterfall now uses a single shared helper that streams
  podman stderr through update_install_progress for every URL tried.
- Backend: PackageDataEntry gains uninstall_stage, set at each phase of
  handle_package_uninstall ("Stopping containers (i/total)",
  "Cleaning up volumes", "Removing app data"). State flips to Removing
  during the pipeline.
- Frontend: MarketplaceAppCard renders the live progress bar with byte
  counts during installs, matching the System Update download bar style.
- Frontend: AppCard renders the live uninstall stage label per app.
  Modal closes immediately on confirm so concurrent uninstalls each
  show their own progress on their own card.
- Cleanup: removed dead helpers (image_candidates, rewrite_for_primary,
  primary_image_url, pull_from_registries_with_skip) made unused by
  the install.rs refactor.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
v1.7.30-alpha
2026-04-21 19:11:36 -04:00