fix: overhaul container lifecycle — recovery, health, uninstall, UI state

Container recovery:
- Health monitor: MAX_RESTART_ATTEMPTS 3→10, interval 60s→120s
- Dependency-aware restarts: won't restart services before their deps
- Reset dependent counters when a dependency recovers
- Handle "created" state containers (were invisible to health monitor)
- Added IndeedHub, mempool-api, mysql to tier system
- Crash recovery: podman start timeout 30s→120s with retry
- Podman client: socket timeout 5s→30s, added restart policy

UI state representation:
- Exit code 0 shows "stopped" (gray), not "crashed" (red)
- Exit code 137 shows "killed (OOM)"
- Non-zero exit shows "crashed" (red)
- Added exit_code field to PackageDataEntry

Install/uninstall fixes:
- Install returns error when container doesn't start (was silent success)
- Post-install hooks awaited instead of fire-and-forget tokio::spawn
- Uninstall: graceful rm before force, volume prune, network cleanup
- Uninstall returns error on partial failure (was 200 OK)

Config consistency:
- DB passwords read from /var/lib/archipelago/secrets/ (was hardcoded)
- Bitcoin: added ZMQ ports 28332/28333 for LND block notifications
- IndeedHub port 7777→8190 (was conflicting with strfry)
- Marketplace versions: LND 0.17.4→0.18.4, Mempool 2.5.0→3.0.0

Performance:
- Metrics collector interval 60s→300s (was duplicating health monitor)
- Podman client: proper error propagation instead of unwrap_or_default

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
Dorian
2026-03-31 07:03:57 +01:00
parent 795e74bc50
commit 1e283daf13
65 changed files with 3950 additions and 298 deletions

View File

@@ -110,6 +110,9 @@ rpcallowip=0.0.0.0/0
rpcport=8332
listen=1
printtoconsole=1
# ZMQ publishers for LND and other services that need real-time block/tx notifications
zmqpubrawblock=tcp://0.0.0.0:28332
zmqpubrawtx=tcp://0.0.0.0:28333
BTCCONF
log "Generated bitcoin.conf with rpcauth (no plaintext credentials)"
fi
@@ -305,7 +308,7 @@ if ! $DOCKER ps --format '{{.Names}}' 2>/dev/null | grep -qE 'bitcoin-knots|arch
--memory=$(mem_limit bitcoin-knots) --network archy-net \
--cap-drop ALL --cap-add CHOWN --cap-add FOWNER --cap-add SETUID --cap-add SETGID --cap-add DAC_OVERRIDE \
--security-opt no-new-privileges:true \
-p 8332:8332 -p 8333:8333 \
-p 8332:8332 -p 8333:8333 -p 28332:28332 -p 28333:28333 \
-v /var/lib/archipelago/bitcoin:/home/bitcoin/.bitcoin \
"${BITCOIN_KNOTS_IMAGE}" \
-server=1 $BTC_EXTRA_ARGS \
@@ -832,7 +835,7 @@ if ! $DOCKER ps --format '{{.Names}}' 2>/dev/null | grep -q indeedhub; then
--memory=$(mem_limit indeedhub) \
--cap-drop ALL --security-opt no-new-privileges:true \
--read-only --tmpfs /tmp:rw,noexec,nosuid,size=64m --tmpfs /app/.next/cache:rw,noexec,nosuid,size=128m \
-p 7777:7777 \
-p 8190:3000 \
-e NODE_ENV=production -e NEXT_TELEMETRY_DISABLED=1 \
"$INDEEDHUB_IMAGE" 2>>"$LOG" || true
# Fix IndeedHub for iframe: remove X-Frame-Options so it loads in Archipelago panel