- Correct off-by-one in UID mapping: container UID N → host UID (100000 + N - 1), not (100000 + N)
- Deploy script auto-fixes UID ownership on every deploy
- Bitcoin UI nginx uses __BITCOIN_RPC_AUTH__ placeholder injected from secrets at deploy time
- Container rules updated for rootless podman architecture

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
| name | description | allowed-tools |
|---|---|---|
| podman-doctor | Comprehensive Podman container diagnostic for Archipelago. Audits all running containers, port mappings, network connectivity, health status, restart policies, and config consistency across all 4 layers (backend Rust, Podman runtime, Nginx proxy, frontend routing). Handles rootless Podman (user: archipelago, UID 1000, subuid 100000:65536). Use when asked to "diagnose containers", "check podman", "why is app not working", "container health check", "port not reachable", "audit containers", "podman status", or when any container/app is misbehaving. | Bash Read Glob Grep |
# Podman Doctor: Container Infrastructure Diagnostics
Systematic diagnostic for Archipelago's rootless Podman container stack. Catches port conflicts, network misconfigurations, health failures, missing restart policies, UID mapping issues, and config drift across all layers.
SSH command: `ssh -i ~/.ssh/archipelago-deploy archipelago@192.168.1.228`

ROOTLESS PODMAN: Archipelago runs Podman as the `archipelago` user (UID 1000), NOT root. Never use `sudo podman`; use plain `podman` after SSH'ing in as the `archipelago` user. Container UIDs are mapped via subuid: container UID N → host UID (100000 + N).
If `$ARGUMENTS` is provided, focus the diagnosis on that specific app/container. Otherwise, run the full audit.
## Workflow

### Step 1: Gather Runtime State

Run these on the server (as the `archipelago` user, NO sudo):

```bash
# All containers with status, ports, networks
podman ps -a --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}\t{{.Networks}}"

# Check for port conflicts on known ports
ss -tlnp | grep -E ":(80|443|3000|4080|5678|8080|8081|8082|8083|8085|8096|8123|8173|8174|8175|8240|8332|8333|8334|8888|9735|10009|11434|23000|50001)\b"
```
### Step 2: Rootless Podman Health Check

Rootless Podman has specific requirements that must be verified:

```bash
# Verify running as archipelago user (NOT root)
whoami   # Must be "archipelago"
id       # Must show uid=1000(archipelago)

# Check XDG_RUNTIME_DIR is set (required for rootless podman socket)
echo "XDG_RUNTIME_DIR=$XDG_RUNTIME_DIR"   # Must be /run/user/1000

# Verify subuid/subgid mapping exists
grep archipelago /etc/subuid   # Must show: archipelago:100000:65536
grep archipelago /etc/subgid   # Must show: archipelago:100000:65536

# Verify user lingering is enabled (keeps user services after logout)
ls /var/lib/systemd/linger/ | grep archipelago   # Must exist

# Check podman storage is accessible
podman info --format "{{.Store.GraphRoot}}"   # ~/.local/share/containers/storage
ls -la ~/.local/share/containers/storage/ 2>/dev/null || echo "ERROR: Storage not accessible"

# Check podman socket
ls -la /run/user/1000/podman/ 2>/dev/null || echo "WARNING: No podman socket directory"
```
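
The `grep` on `/etc/subuid` needs a human to read the result. A minimal sketch of turning it into a pass/fail check follows; `check_subid_range` is a hypothetical helper, and the only assumption is the standard `name:start:count` line format of `/etc/subuid` and `/etc/subgid`.

```bash
# Hypothetical helper, not part of the skill.
# Usage: check_subid_range <user> <file> [min_count]
# Prints OK, TOO_SMALL, or MISSING for the user's subordinate ID range.
check_subid_range() {
  user=$1
  file=$2
  min=${3:-65536}
  awk -F: -v u="$user" -v min="$min" '
    $1 == u { found = 1; if ($3 + 0 >= min + 0) ok = 1 }
    END {
      status = "MISSING"
      if (found) status = "TOO_SMALL"
      if (ok) status = "OK"
      print status
    }
  ' "$file"
}
```

For example, `check_subid_range archipelago /etc/subuid` and `check_subid_range archipelago /etc/subgid` could feed the "Subuid mapping" line of the final report.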
### Step 3: Check Restart Policies

Every container MUST have `--restart unless-stopped`. A missing restart policy is the #1 cause of downtime after reboots.

```bash
for c in $(podman ps -a --format "{{.Names}}"); do
  echo -n "$c: "
  podman inspect "$c" --format "{{.HostConfig.RestartPolicy.Name}}"
done
```

Red flag: a policy of `no` or an empty value means the container won't come back after a reboot.
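
To surface only the red flags, the policy list can be filtered. The sketch below assumes the input is one `<container> <policy>` pair per line (the policy field may be absent); `flag_restart_policies` is a hypothetical name.

```bash
# Hypothetical helper, not part of the skill.
# stdin: "<container> <restart-policy>" per line; the policy may be missing.
# stdout: one line per container whose policy is "no" or empty.
flag_restart_policies() {
  awk '
    NF >= 1 && ($2 == "" || $2 == "no") {
      policy = ($2 == "") ? "empty" : $2
      print "RED FLAG: " $1 " (restart policy: " policy ")"
    }
  '
}
```

One way to produce that input is to have the loop print `$c` and the inspected policy on a single line, then pipe the loop into `flag_restart_policies`.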
### Step 4: Volume Ownership Audit (Rootless UID Mapping)

Rootless Podman maps container UIDs via subuid. Volume directories must be owned by the MAPPED UID, not the container UID. Formula: `host_uid = 100000 + container_uid`

```bash
echo "=== Volume Ownership Check ==="

# Default containers (run as root inside = UID 0 → host UID 100000)
for dir in lnd fedimint homeassistant jellyfin vaultwarden photoprism ollama filebrowser electrumx btcpay immich; do
  if [ -d "/var/lib/archipelago/$dir" ]; then
    owner=$(stat -c '%u:%g' "/var/lib/archipelago/$dir" 2>/dev/null)
    if [ "$owner" != "100000:100000" ]; then
      echo "WRONG: /var/lib/archipelago/$dir owned by $owner (should be 100000:100000)"
    else
      echo "  OK: $dir → $owner"
    fi
  fi
done

# Bitcoin Knots (container UID 101 → host UID 100101)
if [ -d "/var/lib/archipelago/bitcoin" ]; then
  owner=$(stat -c '%u:%g' "/var/lib/archipelago/bitcoin")
  [ "$owner" != "100101:100101" ] && echo "WRONG: bitcoin owned by $owner (should be 100101:100101)" || echo "  OK: bitcoin → $owner"
fi

# PostgreSQL (container UID 70 → host UID 100070)
for dir in /var/lib/archipelago/*-db /var/lib/archipelago/postgres-*; do
  if [ -d "$dir" ]; then
    owner=$(stat -c '%u:%g' "$dir")
    [ "$owner" != "100070:100070" ] && echo "WRONG: $dir owned by $owner (should be 100070:100070)" || echo "  OK: $(basename "$dir") → $owner"
  fi
done

# Grafana (container UID 472 → host UID 100472)
if [ -d "/var/lib/archipelago/grafana" ]; then
  owner=$(stat -c '%u:%g' "/var/lib/archipelago/grafana")
  [ "$owner" != "100472:100472" ] && echo "WRONG: grafana owned by $owner (should be 100472:100472)" || echo "  OK: grafana → $owner"
fi

# MariaDB/MySQL (container UID 999 → host UID 100999)
if [ -d "/var/lib/archipelago/mysql-mempool" ]; then
  owner=$(stat -c '%u:%g' "/var/lib/archipelago/mysql-mempool")
  [ "$owner" != "100999:100999" ] && echo "WRONG: mysql-mempool owned by $owner (should be 100999:100999)" || echo "  OK: mysql-mempool → $owner"
fi
```
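
The expected owners in this step are derived from a formula, so one way to double-check them is to translate container UIDs through the live user-namespace map instead. The sketch below is an illustration rather than part of the skill: `map_container_uid` is a hypothetical helper that reads rows in the `/proc/<pid>/uid_map` format (`container_start host_start length`) on stdin and prints the host UID for one container UID.

```bash
# Hypothetical helper, not part of the skill.
# Usage: <uid_map rows on stdin> | map_container_uid <container_uid>
# Each row is "<container_start> <host_start> <length>".
map_container_uid() {
  awk -v uid="$1" '
    # The first row whose container range covers the UID decides the mapping
    uid >= $1 && uid < $1 + $3 { print $2 + (uid - $1); found = 1; exit }
    END { if (!found) print "unmapped" }
  '
}
```

Assuming the map is read with `podman unshare cat /proc/self/uid_map`, a call such as `podman unshare cat /proc/self/uid_map | map_container_uid 101` prints the host UID that a volume for container UID 101 should carry. On a default rootless setup the map usually has a single-ID row that maps container UID 0 to the invoking user, followed by the subuid range starting at container UID 1, so results can differ by one from a plain `100000 + N`; where they disagree, the live map is the authority.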
### Step 5: Verify Port Mapping Consistency

Cross-reference these 4 layers. A mismatch between ANY two causes "app not loading" bugs:

- Layer 1, Backend Config (Rust): read `core/archipelago/src/api/rpc/package.rs` and look at the `get_app_config()` port mappings.
- Layer 2, Podman Runtime: `podman ps --format "{{.Names}}: {{.Ports}}"`
- Layer 3, Nginx Proxy: read these for `/app/{id}/` location blocks:
  - `image-recipe/configs/nginx-archipelago.conf` (HTTP)
  - `image-recipe/configs/snippets/archipelago-https-app-proxies.conf` (HTTPS)
- Layer 4, Frontend Routing: read `neode-ui/src/stores/appLauncher.ts` and check the `PORT_TO_APP_ID` map.

| Symptom | Root Cause |
|---|---|
| App iframe shows 502/504 | Nginx proxies to wrong port, or container not running |
| App loads wrong content | Port collision — two containers on same host port |
| Works on port but not /app/ path | Missing nginx location block |
| Frontend can't find app | PORT_TO_APP_ID missing in appLauncher.ts |
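
Comparing Layer 3 against Layer 2 by eye is error-prone, so a small extraction step can help. The sketch below pulls `app port` pairs out of an nginx config read on stdin. It assumes each app has a `location /app/<id>/ { ... }` block containing a `proxy_pass http://<host>:<port>...;` line; the real config files may be shaped differently, and `nginx_app_ports` is a hypothetical name.

```bash
# Hypothetical helper, not part of the skill.
# stdin: nginx configuration text
# stdout: "<app_id> <upstream_port>" for each /app/<id>/ location block
nginx_app_ports() {
  awk '
    $1 == "location" {
      app = ""                                # leaving any previous block
      for (i = 2; i <= NF; i++)
        if ($i ~ /^\/app\/[^\/]+\//) { split($i, seg, "/"); app = seg[3] }
    }
    $1 == "proxy_pass" && app != "" {
      url = $2
      sub(/^[a-z]+:\/\/[^:\/]+:/, "", url)    # drop scheme and host
      sub(/[^0-9].*$/, "", url)               # keep the leading digits (port)
      if (url != "") print app, url
      app = ""
    }
  '
}
```

Running it over each Layer 3 file, for example `nginx_app_ports < image-recipe/configs/nginx-archipelago.conf`, gives a list that can be compared line by line with the host ports shown by `podman ps`.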
### Step 6: Network Connectivity Audit

```bash
# Networks and their containers
podman network ls
podman network inspect archy-net 2>/dev/null || echo "WARNING: archy-net missing!"

# Check container subnet (rootless uses 10.89.x.x, NOT 10.88.x.x)
podman network inspect archy-net --format "{{range .Subnets}}{{.Subnet}}{{end}}" 2>/dev/null
```

Must be on `archy-net`: bitcoin-knots, lnd, electrs/electrumx, mempool, btcpay-server, nbxplorer, fedimint, fedimint-gateway, nostr-rs-relay, indeedhub, ollama, open-webui

Must NOT be on `archy-net`: grafana, nextcloud, filebrowser, vaultwarden, bitcoin-ui, lnd-ui, tailscale (host network)
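
The membership lists above can be checked mechanically. In the sketch below, `missing_from_network` is a hypothetical helper that takes a network name plus the containers expected on it, reads `<container> <comma-separated networks>` lines on stdin (the shape `podman ps --format '{{.Names}} {{.Networks}}'` is assumed to produce), and reports expected containers that exist but are not attached.

```bash
# Hypothetical helper, not part of the skill.
# Usage: <lines on stdin> | missing_from_network <network> <container>...
# stdin: "<container> <net1,net2,...>" per line
missing_from_network() {
  net=$1
  shift
  expected="$*"
  awk -v net="$net" -v expected="$expected" '
    {
      seen[$1] = 1
      cnt = split($2, nets, ",")
      for (k = 1; k <= cnt; k++)
        if (nets[k] == net) attached[$1] = 1
    }
    END {
      n = split(expected, want, " ")
      for (i = 1; i <= n; i++)
        # Containers that are not present at all are skipped, not flagged
        if ((want[i] in seen) && !(want[i] in attached))
          print "NOT ON " net ": " want[i]
    }
  '
}
```

For example, `podman ps --format '{{.Names}} {{.Networks}}' | missing_from_network archy-net bitcoin-knots lnd mempool ollama` prints one line per listed container that is running outside `archy-net`.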
### Step 7: UFW Forward Policy Check

Rootless Podman requires `DEFAULT_FORWARD_POLICY="ACCEPT"` in UFW; otherwise container ports are unreachable from the LAN.

```bash
grep DEFAULT_FORWARD_POLICY /etc/default/ufw
# Must be "ACCEPT", NOT "DROP"
# If DROP: containers work locally but NOT from other machines on the network
```
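
A plain `grep` also matches commented-out lines, which can make a `DROP` setup look fine at a glance. The sketch below ignores comments and prints a pass/fail line; `check_ufw_forward_policy` is a hypothetical helper, and taking the last uncommented assignment as the effective one is an assumption of this sketch.

```bash
# Hypothetical helper, not part of the skill.
# Usage: check_ufw_forward_policy [path]   (defaults to /etc/default/ufw)
check_ufw_forward_policy() {
  policy=$(sed -n 's/^[[:space:]]*DEFAULT_FORWARD_POLICY=//p' "${1:-/etc/default/ufw}" \
    | tail -n 1 | tr -d "\"'")
  if [ "$policy" = "ACCEPT" ]; then
    echo "OK: forward policy is ACCEPT"
  else
    echo "FAIL: forward policy is ${policy:-unset} (expected ACCEPT)"
  fi
}
```

The output maps directly onto the "UFW forward policy" line of the final report.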
### Step 8: Systemd Service Sandbox Check

The `archipelago.service` unit must have specific settings relaxed for rootless Podman:

```bash
# Check critical settings
systemctl cat archipelago.service | grep -E "ProtectHome|PrivateTmp|RestrictNamespaces|ReadWritePaths|XDG_RUNTIME_DIR"
```

Required settings for rootless Podman:

- `ProtectHome=no`: podman stores images in `~/.local/share/containers/`
- `PrivateTmp=no` or disabled: podman runtime uses `/tmp/podman-run-1000/`
- `RestrictNamespaces=` must NOT be set: rootless podman needs user namespaces
- `ReadWritePaths=` must include `/var/lib/archipelago /run/user /tmp`
- `Environment=XDG_RUNTIME_DIR=/run/user/1000`
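
The required settings above can be encoded as a small checker. This is a sketch under simplifying assumptions: `check_unit_sandbox` is a hypothetical helper, it reads unit text on stdin, treats an absent `ProtectHome` or `PrivateTmp` as acceptable, and flags `RestrictNamespaces` whenever the directive appears at all.

```bash
# Hypothetical helper, not part of the skill.
# stdin: systemd unit text (e.g. from "systemctl cat archipelago.service")
# stdout: one FAIL line per violated requirement; silent when all pass.
check_unit_sandbox() {
  awk '
    /^[ \t]*[#;]/ || !/=/ { next }            # skip comments and [Section] lines
    {
      key = $0; sub(/=.*/, "", key); gsub(/[ \t]/, "", key)
      val = $0; sub(/^[^=]*=/, "", val)
      present[key] = 1
      all[key] = all[key] " " val             # directives may repeat; keep every value
    }
    END {
      if (("ProtectHome" in present) && all["ProtectHome"] !~ / (no|false)$/)
        print "FAIL: ProtectHome must be no"
      if (("PrivateTmp" in present) && all["PrivateTmp"] !~ / (no|false)$/)
        print "FAIL: PrivateTmp must be no"
      if ("RestrictNamespaces" in present)
        print "FAIL: RestrictNamespaces must not be set"
      n = split("/var/lib/archipelago /run/user /tmp", need, " ")
      for (i = 1; i <= n; i++)
        if (index(all["ReadWritePaths"] " ", " " need[i] " ") == 0)
          print "FAIL: ReadWritePaths missing " need[i]
      if (index(all["Environment"], "XDG_RUNTIME_DIR=/run/user/1000") == 0)
        print "FAIL: XDG_RUNTIME_DIR not exported"
    }
  '
}
```

A possible invocation is `systemctl cat archipelago.service | check_unit_sandbox`; each `FAIL` line names the setting to relax.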
### Step 9: Health Check Status

```bash
# Containers with health checks: are they passing?
for c in $(podman ps --format "{{.Names}}"); do
  health=$(podman inspect "$c" --format "{{.State.Health.Status}}" 2>/dev/null)
  if [ -n "$health" ] && [ "$health" != "<no value>" ]; then
    echo "$c: $health"
  fi
done

# Containers WITHOUT health checks (gap in monitoring)
for c in $(podman ps --format "{{.Names}}"); do
  hc=$(podman inspect "$c" --format "{{.Config.Healthcheck}}" 2>/dev/null)
  if [ "$hc" = "<nil>" ] || [ -z "$hc" ]; then
    echo "NO HEALTHCHECK: $c"
  fi
done
```
### Step 10: Resource & Failure Analysis

```bash
# Resource usage
podman stats --no-stream --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}\t{{.MemPerc}}"

# Recent deaths (last 24h)
podman events --filter event=died --since 24h 2>/dev/null | tail -20

# OOM kills
podman ps -a --format "{{.Names}}" | while read -r c; do
  oom=$(podman inspect "$c" --format "{{.State.OOMKilled}}" 2>/dev/null)
  [ "$oom" = "true" ] && echo "OOM KILLED: $c"
done

# Non-zero exits
podman ps -a --filter status=exited --format "{{.Names}}\t{{.Status}}"
```
### Step 11: Systemd Integration

```bash
systemctl is-active archipelago nginx
systemctl --user list-units --type=service 2>/dev/null | grep -i podman
systemctl list-timers --all | grep -i -E "podman|container|archipelago"
```
### Step 12: Generate Report

Produce a structured report:

```markdown
## Container Diagnostic Report

### Rootless Podman Status
- User: archipelago (UID 1000)
- Subuid mapping: [OK/MISSING]
- XDG_RUNTIME_DIR: [OK/MISSING]
- User linger: [enabled/disabled]
- UFW forward policy: [ACCEPT/DROP]

### Summary
- Total containers: X running, Y stopped, Z unhealthy
- Port conflicts: [list or "none"]
- Missing restart policies: [list or "none"]
- Network issues: [list or "none"]
- UID mapping issues: [list or "none"]
- Health check gaps: [list]

### Critical Issues (fix immediately)
1. ...

### Warnings (fix soon)
1. ...

### Recommended Actions
1. ...
```

After diagnosis, suggest running `/podman-fix` for any issues found.
## Port Reference

See `references/port-map.md` for the canonical port assignment table across all 4 layers.

## UID Mapping Reference

See `references/uid-mapping.md` for the complete rootless UID mapping table.