fix: deploy credential sync, health checks, rootless port binding

- LND config always synced with secrets/bitcoin-rpc-password before starting (both deploy scripts); fixes 401 auth errors on all nodes
- Replace eval "$DB_PASSWORDS" with safe individual SSH reads in deploy-tailscale.sh (eliminates command injection risk)
- Add MariaDB password sync step after container start (ALTER USER)
- Add --health-cmd to all 25 containers in deploy-tailscale.sh
- FileBrowser uses --user 0:0 for rootless port 80 binding (both scripts)
- Fedimint env var fixed: FM_REL_NOTES_ACK=0_4_xyz

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@@ -31,6 +31,9 @@
## Infrastructure
- [project_bitcoin_rpc_auth.md](project_bitcoin_rpc_auth.md) - Bitcoin rpcauth, system Tor, reboot survival, container resilience

## Deploy & Container Fixes
- [project_deploy_session_2026_03_22.md](project_deploy_session_2026_03_22.md) - Fleet deploy fixes: credential mismatches, restart storms, rootless port 80, deploy script hardening

## Completed Work
- [project_mesh_198_issue.md](project_mesh_198_issue.md) - Mesh .198: 3 bugs fixed and deployed
- [project_indeedhub_arch3_fix.md](project_indeedhub_arch3_fix.md) - IndeedHub Arch 3: corrupted combined tarball fixed

64
.claude/memory/project_deploy_session_2026_03_22.md
Normal file
@@ -0,0 +1,64 @@
---
name: Deploy session 2026-03-22 findings
description: Comprehensive deploy/build fixes made overnight (container issues, image tags, script improvements, remaining work)
type: project
---

## Session Summary (2026-03-22 overnight)

Massive deploy infrastructure overhaul across all 5 nodes (.228, .198, Arch 1/2/3).

### Fixed in deploy-tailscale.sh
- **Image tags**: Bitcoin Knots `28.1` (not `v28.1`), BTCPay `1.13.7` (not `1.14.5`), SearXNG `2026.3.20-6c7e9c197`
- **Removed Immich** (3 containers) and **Penpot** (5 containers) from deploy + build
- **Fedimint**: `FM_REL_NOTES_ACK=0_4_xyz` env var (NOT `FM_SKIP_REL_NOTES_ACK` or `FM_REQ_RELEASE_NOTES_ACK_V0_4`)
- **Fedimint-gateway**: `--password` instead of `--bcrypt-password-hash` (v0.5.1 CLI change)
- **FileBrowser**: added `--cap-add NET_BIND_SERVICE` for port 80 binding (superseded in Session 2 by `--user 0:0`)
- **SearXNG**: added `/var/lib/archipelago/searxng:/etc/searxng` volume mount + caps
- **Postgres**: pinned to `postgres:15` (data was initialized with 15 and is incompatible with 16)
- **Migration**: one-time flag file `/var/lib/archipelago/.rootless-migrated`
- **Recreate-if-broken pattern**: containers that exist but are stopped get deleted and recreated
- **Arch 2 hostname**: fixed from hardcoded hostname to `$TAILSCALE_ARCH2`
- **Custom UI images**: graceful skip if not available, source extracted to repo (`docker/bitcoin-ui/`, `docker/electrs-ui/`)
- **AIUI tar xattr**: warnings silenced with `--no-xattrs` (only in deploy-tailscale.sh, NOT deploy-to-target.sh yet)
- **Nginx MIME warning**: removed `text/html` from `sub_filter_types`

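The recreate-if-broken pattern from the list above can be sketched as a small decision function. This is a minimal illustration, not the code in deploy-tailscale.sh: the function names and the `missing` sentinel are assumptions, and the status strings are the ones `podman container inspect` reports in `.State.Status`.

```shell
# Hypothetical helper: map a container status to the action a deploy would take.
# "missing" is our own sentinel for "no such container", not a podman status.
container_action() {
    case "$1" in
        running) echo keep ;;
        missing) echo create ;;
        *)       echo recreate ;;  # exited, created, stopped, ...
    esac
}

# Hypothetical wrapper: look up the status of a container by name.
# Needs podman on PATH; it is only defined here, never called at load time.
container_status() {
    podman container inspect --format '{{.State.Status}}' "$1" 2>/dev/null || echo missing
}
```

A caller would combine the two, for example `container_action "$(container_status filebrowser)"`, and on `recreate` run `podman rm -f` before the usual `podman run`.
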
### Added
- `--fleet` flag in deploy-to-target.sh: deploys .228 → .198 → Arch 1/2/3
- `--both` lock fix: releases lock before recursive `--live` call
- Container verification step (Step 26b): restarts exited containers, fixes permissions, checks Tor
- IndeedHub backend stack rebuilt on .228 (7 containers)
- IndeedHub nginx patched with direct IPs (podman DNS doesn't work with nginx resolver)

### Frontend changes
- Replaced Immich with FileBrowser on Setup homescreen (`goals.ts`, `EasyHome.vue`)
- `MEMPOOL_API_IMAGE` renamed to `MEMPOOL_BACKEND_IMAGE` in image-versions.sh
- Nextcloud downgraded from 30 to 29 (one major version upgrade at a time)

### Session 2 fixes (same day)

**Critical pattern found: Container credential mismatches**
- Deploy generates random passwords and stores them in `secrets/`. MariaDB and Postgres read the password env vars only on FIRST init of the data directory; subsequent starts ignore them. Recreating a container with a new password against existing data therefore causes auth failures and crash loops.
- 50,000+ cumulative container restarts across the fleet came from this single root cause.

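Because the env var is ignored once the data directory exists, the stored password has to be changed inside the database itself. The sketch below only builds a MariaDB `ALTER USER` statement from a user name and the current secret; the function names and the `'%'` host are illustrative assumptions, not values taken from the deploy scripts.

```shell
# Escape a value for use inside a single-quoted MariaDB string literal:
# double every backslash, then double every single quote.
sql_quote() {
    printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e "s/'/''/g"
}

# Build the statement that resets a user's password to the current secret.
# $1 = database user, $2 = password read from the secrets directory.
alter_user_sql() {
    printf "ALTER USER '%s'@'%%' IDENTIFIED BY '%s';\n" \
        "$(sql_quote "$1")" "$(sql_quote "$2")"
}
```

The resulting statement would then be run as an admin user inside the database container, for example through `podman exec`. That only helps while the admin credentials themselves still work; otherwise the data directory has to be reinitialized.
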
**Fixes applied to all nodes:**
1. LND: `lnd.conf` rpcpass synced from `secrets/bitcoin-rpc-password` (was hardcoded `archipelago123`)
2. MariaDB mempool: data dirs wiped + reinitialized (password mismatch unrecoverable)
3. BTCPay Postgres: `ALTER USER` to sync password with secrets
4. FileBrowser: `--user 0:0` instead of `--cap-add NET_BIND_SERVICE` (rootless port 80 fix)
5. Nextcloud: same `--user 0:0` fix
6. Tailscale container on .228: removed (2,685 restarts; unauthenticated, and the host already has TS)

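The LND fix in item 1 amounts to rewriting one line of `lnd.conf` from the secret file before each start. A minimal sketch, assuming the password lives on a `bitcoind.rpcpass=` line and that both paths are supplied by the caller; the function name is hypothetical.

```shell
# Rewrite the bitcoind.rpcpass line in lnd.conf so it matches the secret file.
# $1 = secret file, $2 = lnd.conf. Fails if the config has no such line.
sync_lnd_rpcpass() {
    secret_file=$1
    conf_file=$2
    pass=$(cat "$secret_file") || return 1
    tmp="$conf_file.tmp.$$"
    # The password is passed through the environment so awk does not
    # interpret backslashes, '&' or '/' in it.
    LND_PASS="$pass" awk '
        /^bitcoind\.rpcpass=/ { print "bitcoind.rpcpass=" ENVIRON["LND_PASS"]; found = 1; next }
        { print }
        END { exit (found ? 0 : 1) }
    ' "$conf_file" > "$tmp" || { rm -f "$tmp"; return 1; }
    # Overwrite in place so the original file mode and ownership are kept.
    cat "$tmp" > "$conf_file" && rm -f "$tmp"
}
```

Running this unconditionally before starting LND keeps the config from drifting when the secret is regenerated, at the cost of one extra file write per deploy.
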
**Deploy script fixes:**
- `deploy-tailscale.sh`: LND config always synced before start, `eval "$DB_PASSWORDS"` → safe individual reads, MariaDB password sync step, filebrowser `--user 0:0`
- `deploy-to-target.sh`: LND stale config check now compares passwords (not just cookie/localhost), filebrowser `--user 0:0`

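Replacing `eval` with individual reads means each secret is fetched on its own and only ever assigned, never executed. The sketch below shows one way to do that; `read_secret` and its character allow-list are assumptions for illustration, not the helper used in deploy-tailscale.sh.

```shell
# Run the given fetch command, capture its output as a single value, and
# reject anything empty or containing characters outside a conservative
# allow-list. The value is printed for assignment and never passed to eval.
read_secret() {
    value=$("$@") || return 1
    case "$value" in
        ''|*[!A-Za-z0-9._+/=-]*) return 1 ;;
    esac
    printf '%s\n' "$value"
}
```

A call would look like `DB_PASS=$(read_secret ssh "$TARGET" cat "$SECRET_PATH")`, where `TARGET` and `SECRET_PATH` are placeholders. A value containing shell metacharacters makes the read fail instead of being interpreted.
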
**Rootless port 80 rule**: Containers binding port 80 MUST use `--user 0:0`. The `NET_BIND_SERVICE` capability doesn't work in rootless mode (UID 0 → host 100000, which is unprivileged).

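The rule above can be encoded once so individual `podman run` lines do not have to remember it. This is a hedged sketch: the function name is hypothetical, and extending the rule from port 80 to every port below 1024 is an assumption based on the usual Linux privileged-port threshold, not something the session notes state.

```shell
# Print the extra `podman run` flags for a container, given the port its
# process binds inside the container. Ports below 1024 get --user 0:0;
# higher ports need nothing extra.
privileged_port_flags() {
    if [ "$1" -lt 1024 ]; then
        echo "--user 0:0"
    fi
}
```

The output is meant to be expanded unquoted in the run command, for example `podman run -d $(privileged_port_flags 80) ...`, so that the two words become separate arguments.
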
### Remaining issues for next session
- **Vaultwarden exit 101** on Arch 2: likely corrupted SQLite DB
- **PhotoPrism storage permission** on Arch 1: file creation fails despite correct ownership
- **Arch 3 resource contention**: 7.3GB RAM, load 14, 28 containers. May need to reduce container count.
- **Health checks missing** on most containers (only filebrowser/jellyfin have them)
- **Tar xattr spam** in deploy-to-target.sh (fixed in deploy-tailscale.sh only)
- **IndeedHub nginx IPs are ephemeral**: need re-patch after container restart

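For the missing health checks, one possible shape is a thin wrapper that adds podman's health flags to every `podman run`. The wrapper name, the interval, retry and timeout values, and the probe command are all illustrative assumptions; the probe also assumes the tool it calls exists inside the image.

```shell
# Start a detached container with a health check attached.
# $1 = probe command (run by podman inside the container),
# remaining arguments are passed to `podman run` unchanged.
run_with_health() {
    probe=$1
    shift
    podman run -d \
        --health-cmd "$probe" \
        --health-interval 30s \
        --health-retries 3 \
        --health-timeout 5s \
        "$@"
}
```

Usage would be along the lines of `run_with_health 'curl -f http://localhost:8080/ || exit 1' --name demo demo-image`, with the container name and image as placeholders.
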