feat: bitcoin-ui CSS fix, HTTPS proxy support, deploy script improvements

Bitcoin UI:
- Replace cdn.tailwindcss.com with locally bundled tailwind.css (CSP blocks external scripts)
- Make all asset paths relative for nginx proxy compatibility
- Add bitcoin-ui build/deploy to deploy-to-target.sh (was missing entirely)
- Use --network host (bitcoin-ui proxies Bitcoin RPC at 127.0.0.1:8332)

HTTPS mixed content fix:
- Add HTTPS_PROXY_PATHS in AppSession.vue — when parent page is HTTPS,
  iframe loads through nginx proxy instead of direct HTTP port
- Prevents browser blocking HTTP iframes inside HTTPS pages
- All Tailscale servers use HTTPS, this was breaking all app iframes

Deploy & first-boot improvements:
- first-boot-containers.sh auto-detects disk size for pruning vs txindex
- first-boot-containers.sh checks fallback source path for UI containers
- Added mempool-electrs to APP_PORTS mapping
- ElectrumX container creation in first-boot
- Podman doctor/fix/uptime skills added

Also includes: session persistence, identity management, LND transactions,
ElectrumX status UI, nostr-provider improvements, Web5 enhancements

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
Dorian
2026-03-16 12:58:35 +00:00
parent 4e54b8bd4d
commit 367b483a72
49 changed files with 6180 additions and 495 deletions

View File

@@ -0,0 +1,156 @@
---
name: podman-doctor
description: >
Comprehensive Podman container diagnostic for Archipelago. Audits all running containers,
port mappings, network connectivity, health status, restart policies, and config consistency
across all 4 layers (backend Rust, Podman runtime, Nginx proxy, frontend routing).
Use when asked to "diagnose containers", "check podman", "why is app not working",
"container health check", "port not reachable", "audit containers", "podman status",
or when any container/app is misbehaving.
allowed-tools: Bash Read Glob Grep
---
# Podman Doctor — Container Infrastructure Diagnostics
Systematic diagnostic for Archipelago's Podman container stack. Catches port conflicts, network misconfigurations, health failures, missing restart policies, and config drift across all layers.
**SSH command**: `ssh -i ~/.ssh/archipelago-deploy archipelago@192.168.1.228`
If $ARGUMENTS is provided, focus diagnosis on that specific app/container. Otherwise run full audit.
## Workflow
### Step 1: Gather Runtime State
Run these on the server:
```bash
# All containers with status, ports, networks
sudo podman ps -a --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}\t{{.Networks}}"
# Check for port conflicts on known ports
sudo ss -tlnp | grep -E ":(80|443|3000|4080|5678|8080|8081|8082|8083|8085|8096|8123|8173|8174|8175|8240|8332|8333|8334|8888|9735|10009|11434|23000|50001)\b"
```
### Step 2: Check Restart Policies
Every container MUST have `--restart unless-stopped`. This is the #1 cause of downtime after reboots.
```bash
for c in $(sudo podman ps -a --format "{{.Names}}"); do
echo -n "$c: "
sudo podman inspect "$c" --format "{{.HostConfig.RestartPolicy.Name}}"
done
```
**Red flag**: `no` or empty = container won't survive reboot.
### Step 3: Verify Port Mapping Consistency
Cross-reference these 4 layers — mismatches between ANY two cause "app not loading" bugs:
**Layer 1 — Backend Config (Rust)**: Read `core/archipelago/src/api/rpc/package.rs`, look at `get_app_config()` port mappings.
**Layer 2 — Podman Runtime**: `sudo podman ps --format "{{.Names}}: {{.Ports}}"`
**Layer 3 — Nginx Proxy**: Read these for `/app/{id}/` location blocks:
- `image-recipe/configs/nginx-archipelago.conf` (HTTP)
- `image-recipe/configs/snippets/archipelago-https-app-proxies.conf` (HTTPS)
**Layer 4 — Frontend Routing**: Read `neode-ui/src/stores/appLauncher.ts``PORT_TO_APP_ID` map.
| Symptom | Root Cause |
|---------|-----------|
| App iframe shows 502/504 | Nginx proxies to wrong port, or container not running |
| App loads wrong content | Port collision — two containers on same host port |
| Works on port but not /app/ path | Missing nginx location block |
| Frontend can't find app | PORT_TO_APP_ID missing in appLauncher.ts |
### Step 4: Network Connectivity Audit
```bash
# Networks and their containers
sudo podman network ls
sudo podman network inspect archy-net 2>/dev/null || echo "WARNING: archy-net missing!"
```
**Must be on archy-net**: bitcoin-knots, lnd, electrs, mempool, btcpay-server, nbxplorer, fedimint, fedimint-gateway, nostr-rs-relay, indeedhub, ollama, open-webui
**Must NOT be on archy-net**: grafana, nextcloud, filebrowser, vaultwarden, bitcoin-ui, lnd-ui, tailscale (host network)
### Step 5: Health Check Status
```bash
# Containers with health checks — are they passing?
for c in $(sudo podman ps --format "{{.Names}}"); do
health=$(sudo podman inspect "$c" --format "{{.State.Health.Status}}" 2>/dev/null)
if [ -n "$health" ] && [ "$health" != "<no value>" ]; then
echo "$c: $health"
fi
done
# Containers WITHOUT health checks (gap in monitoring)
for c in $(sudo podman ps --format "{{.Names}}"); do
hc=$(sudo podman inspect "$c" --format "{{.Config.Healthcheck}}" 2>/dev/null)
if [ "$hc" = "<nil>" ] || [ -z "$hc" ]; then
echo "NO HEALTHCHECK: $c"
fi
done
```
### Step 6: Resource & Failure Analysis
```bash
# Resource usage
sudo podman stats --no-stream --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}\t{{.MemPerc}}"
# Recent deaths (last 24h)
sudo podman events --filter event=died --since 24h 2>/dev/null | tail -20
# OOM kills
sudo podman ps -a --format "{{.Names}}" | while read c; do
oom=$(sudo podman inspect "$c" --format "{{.State.OOMKilled}}" 2>/dev/null)
[ "$oom" = "true" ] && echo "OOM KILLED: $c"
done
# Non-zero exits
sudo podman ps -a --filter status=exited --format "{{.Names}}\t{{.Status}}"
```
### Step 7: Systemd Integration
```bash
systemctl is-active archipelago nginx
systemctl list-units --type=service | grep -i podman
systemctl list-timers --all | grep -i -E "podman|container|archipelago"
```
### Step 8: Generate Report
Produce a structured report:
```
## Container Diagnostic Report
### Summary
- Total containers: X running, Y stopped, Z unhealthy
- Port conflicts: [list or "none"]
- Missing restart policies: [list or "none"]
- Network issues: [list or "none"]
- Health check gaps: [list]
### Critical Issues (fix immediately)
1. ...
### Warnings (fix soon)
1. ...
### Recommended Actions
1. ...
```
After diagnosis, suggest running `/podman-fix` for any issues found.
## Port Reference
See `references/port-map.md` for the canonical port assignment table across all 4 layers.

View File

@@ -0,0 +1,55 @@
# Common Podman Failure Patterns
## Container Won't Start
| Error | Cause | Fix |
|-------|-------|-----|
| `exec format error` | Binary built on wrong arch | Rebuild on the Linux server |
| `address already in use` | Port conflict | `ss -tlnp \| grep :PORT` to find offender |
| `permission denied` | Missing capability or read-only root | Check `get_app_capabilities()`, add tmpfs |
| `OCI runtime error` | Corrupt container state | `podman rm -f NAME && recreate` |
| `image not known` | Image not pulled | `podman pull IMAGE:TAG` |
| `no such network` | Network missing | `podman network create archy-net` |
## Container Starts But App Unreachable
| Symptom | Check Layer | Fix |
|---------|------------|-----|
| Direct port works, /app/ doesn't | Nginx config | Add `/app/{id}/` location block |
| Neither works | Podman ports | `podman port NAME` — verify mapping exists |
| Port mapped but refused | Container logs | App crashing internally — check logs |
| Works sometimes | Resources | Check OOM kills, CPU, disk space |
| 502 Bad Gateway | Nginx→Container | Wrong port in proxy_pass or container restarted |
## Container Keeps Dying
| Pattern | Cause | Fix |
|---------|-------|-----|
| Exits immediately (code 1) | Config error | Check `podman logs NAME` |
| Dies after minutes | OOM killed | Increase `--memory` limit |
| Dies when dep restarts | No restart policy | Add `--restart unless-stopped` |
| Crash loop | Repeated crash | Fix root cause, don't just restart |
## Network Issues
| Problem | Cause | Fix |
|---------|-------|-----|
| Can't resolve container names | Not on archy-net | Recreate with `--network=archy-net` |
| Can't reach internet | DNS missing | Add `--dns 1.1.1.1` |
| Container-to-container timeout | Different networks | Put both on same network |
## Capability Reference
| Capability | Apps That Need It | Failure Mode |
|-----------|------------------|-------------|
| CHOWN | nextcloud, homeassistant, btcpay, jellyfin, portainer | Can't chown during setup |
| SETUID/SETGID | nextcloud, homeassistant, btcpay, jellyfin | Can't switch to service user |
| DAC_OVERRIDE | nextcloud, homeassistant, btcpay | Can't access cross-UID files |
| FOWNER | bitcoin-knots, lnd, fedimint | Can't modify data dir perms |
| NET_BIND_SERVICE | nginx-proxy-manager, vaultwarden | Can't bind ports <1024 |
## Read-Only Safe Apps
Only these 8 apps can run with `--read-only`: searxng, grafana, filebrowser, electrs, nostr-rs-relay, ollama, indeedhub
All others need writable root or will fail silently.

View File

@@ -0,0 +1,71 @@
# Archipelago Canonical Port Map
All port assignments across the 4 configuration layers. When adding or debugging an app, every row must be consistent across all columns.
## Bitcoin Stack
| App | Host Port(s) | Container Port(s) | Network | Nginx Path | Frontend Map |
|-----|-------------|-------------------|---------|------------|-------------|
| bitcoin-knots | 8332, 8333 | 8332, 8333 | archy-net | /app/bitcoin-knots/ | 8332→bitcoin-knots |
| bitcoin-ui | 8334 | 80 | bridge | /app/bitcoin-ui/ | 8334→bitcoin-knots |
| electrs | 50001 | 50001 | archy-net | /app/electrs/ | 50001→electrs |
| lnd | 9735, 10009, 8080 | 9735, 10009, 8080 | archy-net | /app/lnd/ | 10009→lnd |
| lnd-ui (RTL) | 8081 | 80 | bridge | /app/lnd-ui/ | 8081→lnd |
## Lightning & Payment
| App | Host Port(s) | Container Port(s) | Network | Nginx Path | Frontend Map |
|-----|-------------|-------------------|---------|------------|-------------|
| btcpay-server | 23000 | 49392 | archy-net | /app/btcpay/ | 23000→btcpay-server |
| nbxplorer | 24444 | 32838 | archy-net | N/A (internal) | N/A |
| fedimint | 8173, 8174, 8175 | 8173, 8174, 8175 | archy-net | /app/fedimint/ | 8174→fedimint |
| fedimint-gateway | 8175 | 8175 | archy-net | /app/fedimint-gateway/ | 8175→fedimint-gateway |
## Explorer & Monitoring
| App | Host Port(s) | Container Port(s) | Network | Nginx Path | Frontend Map |
|-----|-------------|-------------------|---------|------------|-------------|
| mempool | 4080 | 8080 | archy-net | /app/mempool/ | 4080→mempool |
| grafana | 3000 | 3000 | bridge | /app/grafana/ | 3000→grafana (new tab) |
## Self-Hosted Apps
| App | Host Port(s) | Container Port(s) | Network | Nginx Path | Frontend Map |
|-----|-------------|-------------------|---------|------------|-------------|
| nextcloud | 8085 | 80 | bridge | /app/nextcloud/ | 8085→nextcloud |
| vaultwarden | 8082 | 80 | bridge | /app/vaultwarden/ | 8082→vaultwarden (new tab) |
| filebrowser | 8083 | 80 | bridge | /app/filebrowser/ | 8083→filebrowser |
| searxng | 8888 | 8080 | bridge | /app/searxng/ | 8888→searxng |
| photoprism | 2342 | 2342 | bridge | /app/photoprism/ | 2342→photoprism (new tab) |
| jellyfin | 8096 | 8096 | bridge | /app/jellyfin/ | 8096→jellyfin |
| homeassistant | 8123 | 8123 | bridge | /app/homeassistant/ | 8123→homeassistant (new tab) |
| ollama | 11434 | 11434 | archy-net | /app/ollama/ | 11434→ollama |
| open-webui | 3080 | 8080 | archy-net | /app/open-webui/ | 3080→open-webui |
## Nostr & Social
| App | Host Port(s) | Container Port(s) | Network | Nginx Path | Frontend Map |
|-----|-------------|-------------------|---------|------------|-------------|
| nostr-rs-relay | 7000 | 8080 | archy-net | /app/nostr-rs-relay/ | 7000→nostr-rs-relay |
| indeedhub | 3001 | 3000 | archy-net | /app/indeedhub/ | 3001→indeedhub |
## System
| App | Host Port(s) | Container Port(s) | Network | Nginx Path | Frontend Map |
|-----|-------------|-------------------|---------|------------|-------------|
| tailscale | 8240 | 8240 | host | /app/tailscale/ | N/A |
| nginx-proxy-manager | 81, 8443 | 81, 443 | bridge | N/A | 81→nginx-proxy-manager |
## Multi-Container Stacks
**Immich**: immich-server (2283), immich-postgres (internal 5432), immich-redis (internal 6379) — all on immich-net
**Penpot**: penpot-frontend (9001→80), penpot-backend, penpot-exporter, penpot-postgres, penpot-mailcatch — all on penpot-net
**Mempool**: mempool (4080→8080), mempool-db (internal 3306) — on archy-net
**BTCPay**: btcpay-server (23000→49392), nbxplorer (24444→32838), btcpay-postgres (internal 5432) — on archy-net
## Key Notes
- **archy-net apps** resolve each other by container name (e.g., `bitcoin-knots:8332`)
- **bridge apps** are standalone — access services via host IP/port
- **host network** (tailscale only) — shares host namespace, no port mapping
- **New tab apps**: btcpay (23000), grafana (3000), vaultwarden (8082), photoprism (2342), homeassistant (8123) — X-Frame-Options blocks iframe

View File

@@ -0,0 +1,219 @@
---
name: podman-fix
description: >
Fix Podman container issues on Archipelago — restart failed containers, repair port bindings,
fix network connectivity, add missing restart policies, and resolve config drift.
Use when asked to "fix container", "restart app", "fix port mapping", "container not working",
"app won't start", "fix podman", "repair container", "container down", or after /podman-doctor
identifies issues to fix.
allowed-tools: Bash Read Edit Write Glob Grep
---
# Podman Fix — Container Remediation
Targeted fix workflow for Podman container issues on Archipelago. Given a specific problem (from /podman-doctor or user report), diagnose the root cause and fix it.
**SSH command**: `ssh -i ~/.ssh/archipelago-deploy archipelago@192.168.1.228`
If $ARGUMENTS is provided, fix that specific app/issue. Otherwise ask what needs fixing.
## Fix Procedures
### Fix 1: Container Not Running
```bash
# Check why it stopped
sudo podman logs --tail 50 CONTAINER_NAME
sudo podman inspect CONTAINER_NAME --format "{{.State.ExitCode}} {{.State.Error}}"
# If clean exit or crash — just restart
sudo podman start CONTAINER_NAME
# If corrupt state — remove and recreate
sudo podman rm -f CONTAINER_NAME
# Then recreate using the install flow (trigger from UI or re-run creation command)
```
**If container keeps crashing**: check logs for the actual error. Common causes:
- Missing config file → check if volume mount has the config
- Wrong permissions → `chown -R` the data directory
- Dependency not ready → start dependency first, wait, then start this container
### Fix 2: Missing Restart Policy
The most common uptime killer. Fix for ALL containers at once:
```bash
# Fix a single container
sudo podman update --restart unless-stopped CONTAINER_NAME
# Fix ALL containers that have no restart policy
for c in $(sudo podman ps -a --format "{{.Names}}"); do
policy=$(sudo podman inspect "$c" --format "{{.HostConfig.RestartPolicy.Name}}")
if [ "$policy" = "no" ] || [ -z "$policy" ]; then
echo "Fixing restart policy for: $c"
sudo podman update --restart unless-stopped "$c"
fi
done
```
**Also update the Rust source** so new installs get it right:
- Check `core/archipelago/src/api/rpc/package.rs` `get_app_config()` for the app
- Ensure `--restart` flag is in the podman run args
### Fix 3: Port Mapping Issues
#### Port conflict (address already in use)
```bash
# Find what's using the port
sudo ss -tlnp | grep :PORT_NUMBER
# If it's another container, either change one's port or stop the conflicting one
sudo podman stop CONFLICTING_CONTAINER
# If it's a host process
sudo kill PID # or stop the service
```
#### Port not mapped (container running but port unreachable)
```bash
# Check current port mappings
sudo podman port CONTAINER_NAME
# Can't add ports to running container — must recreate
sudo podman stop CONTAINER_NAME
sudo podman rm CONTAINER_NAME
# Recreate with correct -p flags (use the Rust install flow or manual podman run)
```
#### Nginx proxy missing or wrong
Read and fix the nginx config:
- HTTP: `image-recipe/configs/nginx-archipelago.conf`
- HTTPS: `image-recipe/configs/snippets/archipelago-https-app-proxies.conf`
Add a location block:
```nginx
location /app/APP_ID/ {
proxy_pass http://127.0.0.1:HOST_PORT/;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection $connection_upgrade;
# Hide X-Frame-Options so it works in our iframe
proxy_hide_header X-Frame-Options;
proxy_hide_header Content-Security-Policy;
}
```
After editing nginx config, deploy and reload:
```bash
# On server
sudo nginx -t && sudo systemctl reload nginx
```
#### Frontend routing missing
Edit `neode-ui/src/stores/appLauncher.ts`:
- Add entry to `PORT_TO_APP_ID` map
- If app blocks iframes, add port to the new-tab list in `resolveAppIdFromUrl()`
### Fix 4: Network Issues
#### Container not on archy-net (can't resolve other containers)
```bash
# Connect to archy-net without recreating
sudo podman network connect archy-net CONTAINER_NAME
# Verify
sudo podman inspect CONTAINER_NAME --format "{{.NetworkSettings.Networks}}"
```
#### archy-net doesn't exist
```bash
sudo podman network create archy-net
# Then reconnect all containers that need it
```
#### DNS not working inside container
```bash
# Test DNS from inside container
sudo podman exec CONTAINER_NAME nslookup bitcoin-knots 2>/dev/null || \
sudo podman exec CONTAINER_NAME ping -c1 bitcoin-knots
# If DNS fails, recreate container with explicit DNS
# Add --dns 1.1.1.1 to the podman run command
```
### Fix 5: Health Check Issues
#### Add missing health check to running container
Can't add to running container — must recreate with health check flags:
```bash
# Example for a web app
sudo podman run ... \
--health-cmd "curl -f http://localhost:PORT/health || exit 1" \
--health-interval 30s \
--health-timeout 5s \
--health-retries 3 \
--health-start-period 60s \
IMAGE
```
#### Fix unhealthy container
```bash
# See what the health check is actually running
sudo podman inspect CONTAINER_NAME --format "{{.Config.Healthcheck.Test}}"
# Run the health check manually to see the error
sudo podman exec CONTAINER_NAME HEALTH_CHECK_COMMAND
# Common fixes:
# - curl not installed in container → use wget or nc instead
# - Wrong port in health check → fix the check command
# - App takes too long to start → increase --health-start-period
```
### Fix 6: Permission/Capability Issues
```bash
# Check what capabilities container has
sudo podman inspect CONTAINER_NAME --format "{{.HostConfig.CapAdd}}"
# If missing required caps, must recreate with correct --cap-add flags
# Refer to the capability reference in /podman-doctor references
# Fix data directory permissions
sudo chown -R 1000:1000 /var/lib/archipelago/APP_NAME/
```
### Fix 7: Full Config Consistency Fix
When port map is inconsistent across layers, fix ALL layers:
1. **Decide the correct port** (usually what's in package.rs)
2. **Fix Podman**: recreate container with correct `-p` flags
3. **Fix Nginx**: update location block's `proxy_pass` port
4. **Fix Frontend**: update `PORT_TO_APP_ID` in appLauncher.ts
5. **Deploy**: `./scripts/deploy-to-target.sh --live`
6. **Verify**: `curl -I http://192.168.1.228/app/APP_ID/`
## After Fixing
Always verify the fix:
```bash
# Container running?
sudo podman ps --filter name=CONTAINER_NAME
# Port reachable?
curl -s -o /dev/null -w "%{http_code}" http://127.0.0.1:PORT/
# Via nginx proxy?
curl -s -o /dev/null -w "%{http_code}" http://127.0.0.1/app/APP_ID/
# Health check passing?
sudo podman inspect CONTAINER_NAME --format "{{.State.Health.Status}}"
```
Run `/podman-doctor` again to confirm all issues are resolved.

View File

@@ -0,0 +1,309 @@
---
name: podman-uptime
description: >
Ensure 100% container uptime on Archipelago. Sets up systemd watchdog timers, verifies
restart policies, creates health check monitors, and configures auto-recovery for all
containers. Use when asked to "ensure uptime", "containers keep dying", "auto-restart",
"watchdog", "container monitoring", "uptime guarantee", "keep containers running",
"survive reboot", or to harden container reliability.
allowed-tools: Bash Read Edit Write Glob Grep
---
# Podman Uptime — Container Reliability Guardian
Ensures every Archipelago container survives reboots, recovers from crashes, and stays healthy. Sets up the three layers of uptime defense: restart policies, systemd watchdog, and health-based auto-recovery.
**SSH command**: `ssh -i ~/.ssh/archipelago-deploy archipelago@192.168.1.228`
## Layer 1: Restart Policies (Survive Reboots)
Every container MUST have `--restart unless-stopped`. This is non-negotiable.
### Audit and fix all containers
```bash
# Audit
for c in $(sudo podman ps -a --format "{{.Names}}"); do
policy=$(sudo podman inspect "$c" --format "{{.HostConfig.RestartPolicy.Name}}")
echo "$c: $policy"
done
# Fix any with "no" or empty policy
for c in $(sudo podman ps -a --format "{{.Names}}"); do
policy=$(sudo podman inspect "$c" --format "{{.HostConfig.RestartPolicy.Name}}")
if [ "$policy" = "no" ] || [ -z "$policy" ]; then
echo "Fixing: $c"
sudo podman update --restart unless-stopped "$c"
fi
done
```
### Ensure podman auto-starts containers on boot
```bash
# Enable podman-restart service (restarts containers with restart policy on boot)
sudo systemctl enable podman-restart.service 2>/dev/null || true
# If podman-restart doesn't exist, create it
cat <<'EOF' | sudo tee /etc/systemd/system/podman-restart.service
[Unit]
Description=Podman Start All Containers With Restart Policy
After=network-online.target
Wants=network-online.target
[Service]
Type=oneshot
ExecStart=/usr/bin/podman start --all --filter restart-policy=unless-stopped
RemainAfterExit=yes
[Install]
WantedBy=multi-user.target
EOF
sudo systemctl daemon-reload
sudo systemctl enable podman-restart.service
```
## Layer 2: Systemd Watchdog (Detect and Recover)
Create a systemd timer that checks container health every 2 minutes and restarts unhealthy or stopped containers.
### Create the watchdog script
```bash
cat <<'SCRIPT' | sudo tee /usr/local/bin/archipelago-container-watchdog.sh
#!/bin/bash
# Archipelago Container Watchdog
# Checks all containers and restarts any that are stopped or unhealthy
LOG_TAG="container-watchdog"
# Restart any stopped containers that should be running (have restart policy)
for c in $(sudo podman ps -a --filter status=exited --filter restart-policy=unless-stopped --format "{{.Names}}"); do
logger -t "$LOG_TAG" "Restarting stopped container: $c"
sudo podman start "$c" 2>&1 | logger -t "$LOG_TAG"
done
# Restart unhealthy containers
for c in $(sudo podman ps --filter health=unhealthy --format "{{.Names}}"); do
logger -t "$LOG_TAG" "Restarting unhealthy container: $c"
sudo podman restart "$c" 2>&1 | logger -t "$LOG_TAG"
done
# Check for containers in "created" state (never started)
for c in $(sudo podman ps -a --filter status=created --format "{{.Names}}"); do
logger -t "$LOG_TAG" "Starting created container: $c"
sudo podman start "$c" 2>&1 | logger -t "$LOG_TAG"
done
SCRIPT
sudo chmod +x /usr/local/bin/archipelago-container-watchdog.sh
```
### Create the systemd timer
```bash
# Service unit
cat <<'EOF' | sudo tee /etc/systemd/system/archipelago-watchdog.service
[Unit]
Description=Archipelago Container Watchdog
After=podman-restart.service
[Service]
Type=oneshot
ExecStart=/usr/local/bin/archipelago-container-watchdog.sh
EOF
# Timer unit — runs every 2 minutes
cat <<'EOF' | sudo tee /etc/systemd/system/archipelago-watchdog.timer
[Unit]
Description=Run Archipelago Container Watchdog every 2 minutes
[Timer]
OnBootSec=120
OnUnitActiveSec=120
AccuracySec=30
[Install]
WantedBy=timers.target
EOF
sudo systemctl daemon-reload
sudo systemctl enable --now archipelago-watchdog.timer
```
### Verify watchdog is running
```bash
sudo systemctl status archipelago-watchdog.timer
sudo systemctl list-timers | grep archipelago
# Check watchdog logs
sudo journalctl -t container-watchdog --since "1 hour ago" --no-pager
```
## Layer 3: Dependency-Aware Startup Order
Some containers depend on others. The watchdog handles restarts, but initial boot order matters.
### Create ordered startup script
```bash
cat <<'SCRIPT' | sudo tee /usr/local/bin/archipelago-ordered-start.sh
#!/bin/bash
# Ordered container startup for Archipelago
# Respects dependency chain: bitcoin → electrs/lnd → mempool/btcpay
LOG_TAG="ordered-start"
wait_for_container() {
local name=$1
local max_wait=${2:-60}
local waited=0
while [ $waited -lt $max_wait ]; do
status=$(sudo podman inspect "$name" --format "{{.State.Running}}" 2>/dev/null)
if [ "$status" = "true" ]; then
logger -t "$LOG_TAG" "$name is running"
return 0
fi
sleep 5
waited=$((waited + 5))
done
logger -t "$LOG_TAG" "WARNING: $name not running after ${max_wait}s"
return 1
}
# Tier 0: Infrastructure
logger -t "$LOG_TAG" "Starting Tier 0: Infrastructure"
sudo podman start tailscale 2>/dev/null
# Tier 1: Bitcoin (foundation)
logger -t "$LOG_TAG" "Starting Tier 1: Bitcoin"
sudo podman start bitcoin-knots 2>/dev/null
wait_for_container bitcoin-knots 120
# Tier 2: Bitcoin-dependent services
logger -t "$LOG_TAG" "Starting Tier 2: Bitcoin-dependent"
sudo podman start electrs 2>/dev/null
sudo podman start lnd 2>/dev/null
wait_for_container electrs 90
wait_for_container lnd 90
# Tier 3: Services depending on Tier 2
logger -t "$LOG_TAG" "Starting Tier 3: Second-order dependencies"
sudo podman start mempool-db 2>/dev/null
sleep 5
sudo podman start mempool 2>/dev/null
sudo podman start nbxplorer 2>/dev/null
sleep 10
sudo podman start btcpay-server 2>/dev/null
sudo podman start btcpay-postgres 2>/dev/null
# Tier 4: Independent apps (start all remaining)
logger -t "$LOG_TAG" "Starting Tier 4: Independent apps"
sudo podman start --all 2>/dev/null
# Tier 5: UI containers (need parent apps running first)
logger -t "$LOG_TAG" "Starting Tier 5: UI containers"
sudo podman start bitcoin-ui 2>/dev/null
sudo podman start lnd-ui 2>/dev/null
logger -t "$LOG_TAG" "Startup sequence complete"
SCRIPT
sudo chmod +x /usr/local/bin/archipelago-ordered-start.sh
```
### Wire into boot sequence
```bash
cat <<'EOF' | sudo tee /etc/systemd/system/archipelago-containers.service
[Unit]
Description=Archipelago Ordered Container Startup
After=network-online.target podman.service
Wants=network-online.target
Before=archipelago.service
[Service]
Type=oneshot
ExecStart=/usr/local/bin/archipelago-ordered-start.sh
RemainAfterExit=yes
TimeoutStartSec=300
[Install]
WantedBy=multi-user.target
EOF
sudo systemctl daemon-reload
sudo systemctl enable archipelago-containers.service
```
## Verification Checklist
After setting up all 3 layers, verify:
```bash
echo "=== Layer 1: Restart Policies ==="
for c in $(sudo podman ps -a --format "{{.Names}}"); do
policy=$(sudo podman inspect "$c" --format "{{.HostConfig.RestartPolicy.Name}}")
echo " $c: $policy"
done
echo ""
echo "=== Layer 2: Watchdog Timer ==="
sudo systemctl is-active archipelago-watchdog.timer
sudo systemctl list-timers | grep archipelago
echo ""
echo "=== Layer 3: Boot Services ==="
sudo systemctl is-enabled podman-restart.service 2>/dev/null || echo "podman-restart: not found"
sudo systemctl is-enabled archipelago-containers.service 2>/dev/null || echo "ordered-start: not found"
sudo systemctl is-enabled archipelago-watchdog.timer 2>/dev/null || echo "watchdog: not found"
echo ""
echo "=== Container Health Summary ==="
total=$(sudo podman ps -a --format "{{.Names}}" | wc -l)
running=$(sudo podman ps --format "{{.Names}}" | wc -l)
stopped=$((total - running))
unhealthy=$(sudo podman ps --filter health=unhealthy --format "{{.Names}}" | wc -l)
echo " Total: $total | Running: $running | Stopped: $stopped | Unhealthy: $unhealthy"
```
## Reboot Test
The ultimate uptime test — reboot the server and verify everything comes back:
```bash
# Before reboot: record running containers
sudo podman ps --format "{{.Names}}" | sort > /tmp/before-reboot.txt
# Reboot
sudo reboot
# After reboot (wait ~3 minutes, then SSH back in):
sudo podman ps --format "{{.Names}}" | sort > /tmp/after-reboot.txt
# Compare
diff /tmp/before-reboot.txt /tmp/after-reboot.txt
# Should show no differences
```
## Monitoring
Check uptime status anytime:
```bash
# Quick status
sudo podman ps -a --format "table {{.Names}}\t{{.Status}}" | sort
# Watchdog activity
sudo journalctl -t container-watchdog --since "24 hours ago" --no-pager
# Container events (starts, stops, deaths)
sudo podman events --since 24h --filter event=start --filter event=stop --filter event=died 2>/dev/null | tail -30
```
## Integration
- Run `/podman-doctor` first to identify issues
- Run `/podman-fix` for specific container repairs
- Run `/podman-uptime` to set up permanent reliability infrastructure
- Add to ISO build: copy watchdog scripts to `image-recipe/configs/` and enable in first-boot