archy/.claude/skills/podman-doctor/SKILL.md
Dorian 3682855668 fix: rootless UID mapping corrections + credential injection
- Correct off-by-one in UID mapping: container UID N → host UID
  (100000 + N - 1), not (100000 + N)
- Deploy script auto-fixes UID ownership on every deploy
- Bitcoin UI nginx uses __BITCOIN_RPC_AUTH__ placeholder injected
  from secrets at deploy time
- container rules updated for rootless podman architecture

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-18 15:57:16 +00:00


name: podman-doctor
description: Comprehensive Podman container diagnostic for Archipelago. Audits all running containers, port mappings, network connectivity, health status, restart policies, and config consistency across all 4 layers (backend Rust, Podman runtime, Nginx proxy, frontend routing). Handles rootless Podman (user: archipelago, UID 1000, subuid 100000:65536). Use when asked to "diagnose containers", "check podman", "why is app not working", "container health check", "port not reachable", "audit containers", "podman status", or when any container/app is misbehaving.
allowed-tools: Bash Read Glob Grep

Podman Doctor — Container Infrastructure Diagnostics

Systematic diagnostic for Archipelago's rootless Podman container stack. Catches port conflicts, network misconfigurations, health failures, missing restart policies, UID mapping issues, and config drift across all layers.

SSH command: ssh -i ~/.ssh/archipelago-deploy archipelago@192.168.1.228

ROOTLESS PODMAN: Archipelago runs Podman as the archipelago user (UID 1000), NOT root. Never use sudo podman; use plain podman after SSH'ing in as the archipelago user. Container UIDs are mapped via subuid: container UID 0 (root) → host UID 1000 (archipelago itself), and container UID N ≥ 1 → host UID (100000 + N - 1).
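The mapping can be confirmed on the live host instead of taken on trust. podman unshare runs a command inside the rootless user namespace, so podman unshare cat /proc/self/uid_map prints the ranges Podman actually uses. With the default archipelago:100000:65536 range, the map has one line sending UID 0 to 1000 and another sending UIDs 1..65536 to 100000..165535. The sketch below turns such a map into a lookup. map_container_uid is a hypothetical helper name, not something Archipelago ships, and it assumes the standard three-column uid_map format.

```shell
# Hypothetical helper (not part of Archipelago): translate a container UID into
# the host UID it appears as on disk. Reads a uid_map on stdin; each line is
# "inside-start outside-start count", as printed by /proc/self/uid_map.
map_container_uid() {
  awk -v uid="$1" '
    # Use the first range that contains uid and offset into its host range.
    uid >= $1 && uid < $1 + $3 { print $2 + (uid - $1); found = 1; exit }
    # Exit non-zero when no range covers uid (the UID is unmapped).
    END { if (!found) exit 1 }
  '
}
```

On the server, podman unshare cat /proc/self/uid_map | map_container_uid 101 should print 100100 when the default range is in use.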

If $ARGUMENTS is provided, focus diagnosis on that specific app/container. Otherwise run full audit.

Workflow

Step 1: Gather Runtime State

Run these on the server (as archipelago user — NO sudo):

# All containers with status, ports, networks
podman ps -a --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}\t{{.Networks}}"

# Check for port conflicts on known ports
ss -tlnp | grep -E ":(80|443|3000|4080|5678|8080|8081|8082|8083|8085|8096|8123|8173|8174|8175|8240|8332|8333|8334|8888|9735|10009|11434|23000|50001)\b"
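Listening sockets only show conflicts among containers that are running right now. A stopped container that publishes the same host port will fail when it next starts. As a sketch, the hypothetical helper below reads podman ps -a output and reports every host port claimed by more than one container. It assumes the usual host:port->container/proto rendering of the Ports column and does not expand port ranges such as 8080-8082->80-82/tcp.

```shell
# Hypothetical helper (not part of Archipelago): report host ports published by
# more than one container. Input: "name<TAB>ports" lines from podman ps -a.
find_port_collisions() {
  awk -F'\t' '
    {
      rest = $2
      delete seen
      # Pull every "<host-port>->" token out of the Ports column.
      while (match(rest, /:[0-9]+->/)) {
        port = substr(rest, RSTART + 1, RLENGTH - 3)
        rest = substr(rest, RSTART + RLENGTH)
        # Count each container once per port (IPv4 and IPv6 bindings repeat it).
        if (!(port in seen)) {
          seen[port] = 1
          owners[port] = owners[port] (count[port]++ ? "," : "") $1
        }
      }
    }
    END {
      for (port in count)
        if (count[port] > 1) print "COLLISION: host port " port " published by " owners[port]
    }
  ' | sort
}
```

Usage on the server: podman ps -a --format "{{.Names}}\t{{.Ports}}" | find_port_collisions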

Step 2: Rootless Podman Health Check

Rootless Podman has specific requirements that must be verified:

# Verify running as archipelago user (NOT root)
whoami  # Must be "archipelago"
id      # Must show uid=1000(archipelago)

# Check XDG_RUNTIME_DIR is set (required for rootless podman socket)
echo "XDG_RUNTIME_DIR=$XDG_RUNTIME_DIR"  # Must be /run/user/1000

# Verify subuid/subgid mapping exists
grep archipelago /etc/subuid  # Must show: archipelago:100000:65536
grep archipelago /etc/subgid  # Must show: archipelago:100000:65536

# Verify user lingering is enabled (keeps user services after logout)
ls /var/lib/systemd/linger/ | grep archipelago  # Must exist

# Check podman storage is accessible
podman info --format "{{.Store.GraphRoot}}"  # ~/.local/share/containers/storage
ls -la ~/.local/share/containers/storage/ 2>/dev/null || echo "ERROR: Storage not accessible"

# Check podman socket
ls -la /run/user/1000/podman/ 2>/dev/null || echo "WARNING: No podman socket directory"
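The subuid/subgid greps above also match look-alike names such as archipelago2 and leave it to the reader to spot a second range. A stricter, scriptable variant is sketched below using a hypothetical helper, check_subid_entry. It verifies that the file holds exactly one line for the user and that this line is the expected start:count range.

```shell
# Hypothetical helper (not part of Archipelago): verify that FILE holds exactly
# one entry for USER and that it matches the expected START:COUNT range.
check_subid_entry() {
  local file="$1" user="$2" start="$3" count="$4" n
  [ -r "$file" ] || { echo "ERROR: cannot read $file"; return 1; }
  # Anchor on "user:" so lines for look-alike names are not counted.
  n=$(grep -c "^${user}:" "$file" || true)
  if [ "$n" -ne 1 ]; then
    echo "WRONG: $file has $n entries for $user (expected 1)"
    return 1
  fi
  if ! grep -qx "${user}:${start}:${count}" "$file"; then
    echo "WRONG: $file entry is $(grep "^${user}:" "$file") (expected ${user}:${start}:${count})"
    return 1
  fi
  echo "  OK: $file → ${user}:${start}:${count}"
}
```

Usage: check_subid_entry /etc/subuid archipelago 100000 65536 && check_subid_entry /etc/subgid archipelago 100000 65536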

Step 3: Check Restart Policies

Every container MUST have --restart unless-stopped. This is the #1 cause of downtime after reboots.

for c in $(podman ps -a --format "{{.Names}}"); do
  echo -n "$c: "
  podman inspect "$c" --format "{{.HostConfig.RestartPolicy.Name}}"
done

Red flag: a policy of no, or an empty value, means the container won't survive a reboot.
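To turn the loop's output into a pass/fail list, name/policy pairs can be piped through a small filter. This is a sketch with a hypothetical helper name. It treats no or an empty policy as the red flag described above, and it marks any other value that is not unless-stopped (for example always) for a manual look rather than failing it outright.

```shell
# Hypothetical helper (not part of Archipelago): classify restart policies.
# Input: "name<TAB>policy" lines.
flag_restart_policies() {
  awk -F'\t' '
    # no / empty: the container will not come back after a reboot.
    $2 == "" || $2 == "no" {
      print "RED FLAG: " $1 " (policy: " ($2 == "" ? "empty" : $2) ")"
      next
    }
    # Anything else that is not unless-stopped deserves a manual look.
    $2 != "unless-stopped" { print "CHECK: " $1 " (policy: " $2 ", expected unless-stopped)" }
  '
}
```

Usage on the server: for c in $(podman ps -a --format "{{.Names}}"); do printf '%s\t%s\n' "$c" "$(podman inspect "$c" --format "{{.HostConfig.RestartPolicy.Name}}")"; done | flag_restart_policies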

Step 4: Volume Ownership Audit (Rootless UID Mapping)

Rootless Podman maps container UIDs via subuid. Volume directories must be owned by the MAPPED host UID, not the container UID. Formula: container UID 0 (root) → host UID 1000 (the archipelago user); container UID N ≥ 1 → host_uid = 100000 + N - 1

echo "=== Volume Ownership Check ==="

# check_owner DIR EXPECTED_UID:GID (directories that don't exist are skipped)
check_owner() {
  [ -d "$1" ] || return 0
  owner=$(stat -c '%u:%g' "$1" 2>/dev/null)
  if [ "$owner" != "$2" ]; then
    echo "WRONG: $1 owned by $owner (should be $2)"
  else
    echo "  OK: $(basename "$1") → $owner"
  fi
}

# Default containers (run as root inside: container UID 0 → host UID 1000)
for dir in lnd fedimint homeassistant jellyfin vaultwarden photoprism ollama filebrowser electrumx btcpay immich; do
  check_owner "/var/lib/archipelago/$dir" "1000:1000"
done

# Bitcoin Knots (container UID 101 → host UID 100100)
check_owner /var/lib/archipelago/bitcoin "100100:100100"

# PostgreSQL (container UID 70 → host UID 100069)
for dir in /var/lib/archipelago/*-db /var/lib/archipelago/postgres-*; do
  check_owner "$dir" "100069:100069"
done

# Grafana (container UID 472 → host UID 100471)
check_owner /var/lib/archipelago/grafana "100471:100471"

# MariaDB/MySQL (container UID 999 → host UID 100998)
check_owner /var/lib/archipelago/mysql-mempool "100998:100998"

Step 5: Verify Port Mapping Consistency

Cross-reference these 4 layers — mismatches between ANY two cause "app not loading" bugs:

Layer 1 — Backend Config (Rust): Read core/archipelago/src/api/rpc/package.rs, look at get_app_config() port mappings.

Layer 2 — Podman Runtime: podman ps --format "{{.Names}}: {{.Ports}}"

Layer 3 — Nginx Proxy: Read these for /app/{id}/ location blocks:

  • image-recipe/configs/nginx-archipelago.conf (HTTP)
  • image-recipe/configs/snippets/archipelago-https-app-proxies.conf (HTTPS)

Layer 4 — Frontend Routing: Read neode-ui/src/stores/appLauncher.ts → PORT_TO_APP_ID map.

| Symptom | Root Cause |
|---|---|
| App iframe shows 502/504 | Nginx proxies to wrong port, or container not running |
| App loads wrong content | Port collision: two containers on same host port |
| Works on port but not /app/ path | Missing nginx location block |
| Frontend can't find app | PORT_TO_APP_ID entry missing in appLauncher.ts |
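Layer 3 is the easiest to audit mechanically. The sketch below uses a hypothetical helper, nginx_app_ports, to extract app id and upstream port pairs from the nginx files so they can be compared side by side with podman ps and PORT_TO_APP_ID. It assumes each /app/{id}/ location block contains a proxy_pass line with a literal host:port. Blocks that proxy to a variable or a named upstream are skipped.

```shell
# Hypothetical helper (not part of Archipelago): print "app-id port" pairs from
# nginx configs, assuming "location /app/<id>/" blocks with a literal
# "proxy_pass http://host:port..." line inside them.
nginx_app_ports() {
  awk '
    # Remember the app id from lines like "location /app/bitcoin/ {"
    # (an optional modifier such as ^~ or = is allowed).
    match($0, /location[ \t]+([=~^*]+[ \t]+)?\/app\/[A-Za-z0-9_.-]+\//) {
      id = substr($0, RSTART, RLENGTH)
      sub(/.*\/app\//, "", id)
      sub(/\/$/, "", id)
    }
    # The first proxy_pass with an explicit port after that location wins.
    id != "" && match($0, /proxy_pass[ \t]+https?:\/\/[A-Za-z0-9_.-]+:[0-9]+/) {
      port = substr($0, RSTART, RLENGTH)
      sub(/.*:/, "", port)
      print id, port
      id = ""
    }
  ' "$@"
}
```

Usage: nginx_app_ports image-recipe/configs/nginx-archipelago.conf image-recipe/configs/snippets/archipelago-https-app-proxies.conf | sort -u. A port that differs between the HTTP and HTTPS files, or from the podman ps mapping, points at the 502/504 or wrong-content rows of the table.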

Step 6: Network Connectivity Audit

# Networks and their containers
podman network ls
podman network inspect archy-net 2>/dev/null || echo "WARNING: archy-net missing!"

# Check the archy-net subnet (user-created networks normally get 10.89.x.x; 10.88.x.x belongs to podman's default network)
podman network inspect archy-net --format "{{range .Subnets}}{{.Subnet}}{{end}}" 2>/dev/null

Must be on archy-net: bitcoin-knots, lnd, electrs/electrumx, mempool, btcpay-server, nbxplorer, fedimint, fedimint-gateway, nostr-rs-relay, indeedhub, ollama, open-webui

Must NOT be on archy-net: grafana, nextcloud, filebrowser, vaultwarden, bitcoin-ui, lnd-ui, tailscale (host network)
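Both membership lists can be checked in one pass. The sketch below uses a hypothetical helper, check_archy_net, that hard-codes the two lists from this step and reads podman ps -a output. It assumes the Networks column lists network names separated by commas or spaces. Containers absent from the input are not reported, so electrs and electrumx can both stay in the list.

```shell
# Hypothetical helper (not part of Archipelago): compare each container's
# networks against the archy-net rules above. Input: "name<TAB>networks".
check_archy_net() {
  local must_on="bitcoin-knots lnd electrs electrumx mempool btcpay-server nbxplorer fedimint fedimint-gateway nostr-rs-relay indeedhub ollama open-webui"
  local must_off="grafana nextcloud filebrowser vaultwarden bitcoin-ui lnd-ui tailscale"
  awk -F'\t' -v on="$must_on" -v off="$must_off" '
    BEGIN {
      n = split(on, a, " ");  for (i = 1; i <= n; i++) want_on[a[i]] = 1
      n = split(off, a, " "); for (i = 1; i <= n; i++) want_off[a[i]] = 1
    }
    {
      # Turn any separator into commas so ",archy-net," matches exactly.
      nets = "," $2 ","
      gsub(/[^A-Za-z0-9_.-]+/, ",", nets)
      attached = index(nets, ",archy-net,") > 0
      if (($1 in want_on) && !attached) print "MISSING archy-net: " $1
      if (($1 in want_off) && attached) print "UNEXPECTED on archy-net: " $1
    }
  '
}
```

Usage on the server: podman ps -a --format "{{.Names}}\t{{.Networks}}" | check_archy_net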

Step 7: UFW Forward Policy Check

Rootless Podman requires DEFAULT_FORWARD_POLICY="ACCEPT" in UFW, otherwise container ports are unreachable from LAN.

grep DEFAULT_FORWARD_POLICY /etc/default/ufw
# Must be "ACCEPT", NOT "DROP"
# If DROP: containers work locally but NOT from other machines on the network
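A plain grep also prints commented-out lines, which can make a DROP policy look like ACCEPT at a glance. Assuming the file is read like a shell variable file, where the last uncommented assignment takes effect, the hypothetical helper below reports only that value.

```shell
# Hypothetical helper (not part of Archipelago): print the effective
# DEFAULT_FORWARD_POLICY value. Commented-out lines are ignored and the
# last assignment wins.
ufw_forward_policy() {
  local file="${1:-/etc/default/ufw}"
  sed -n -E 's/^[[:space:]]*DEFAULT_FORWARD_POLICY[[:space:]]*=[[:space:]]*"?([A-Za-z]+)"?.*$/\1/p' "$file" | tail -n 1
}
```

Usage: [ "$(ufw_forward_policy)" = "ACCEPT" ] || echo "WARNING: DEFAULT_FORWARD_POLICY is not ACCEPT"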

Step 8: Systemd Service Sandbox Check

The archipelago.service must have specific settings relaxed for rootless Podman:

# Check critical settings
systemctl cat archipelago.service | grep -E "ProtectHome|PrivateTmp|RestrictNamespaces|ReadWritePaths|XDG_RUNTIME_DIR"

Required settings for rootless Podman:

  • ProtectHome=no — podman stores images in ~/.local/share/containers/
  • PrivateTmp=no or disabled — podman runtime uses /tmp/podman-run-1000/
  • RestrictNamespaces= must NOT be set — rootless podman needs user namespaces
  • ReadWritePaths= must include /var/lib/archipelago /run/user /tmp
  • Environment=XDG_RUNTIME_DIR=/run/user/1000
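The grep above shows which lines exist but not whether the combination is right, and a drop-in can override the main unit. The hypothetical helper below reads systemctl cat output, which prints the main unit first and then any drop-ins. For single-value settings it lets the last assignment win. It treats an unset ProtectHome or PrivateTmp as acceptable, since both default to off in systemd. It is only a sketch and does not handle every unit-file syntax form, such as line continuations.

```shell
# Hypothetical helper (not part of Archipelago): check the rootless-Podman
# sandbox settings listed above. Input: `systemctl cat archipelago.service`.
check_unit_sandbox() {
  awk '
    function isno(v) { return v == "" || v == "no" || v == "false" || v == "0" || v == "off" }
    # Skip comments (systemctl cat prints "# /path" headers) and section lines.
    /^[ \t]*[#;]/ || index($0, "=") == 0 { next }
    {
      eq = index($0, "=")
      key = substr($0, 1, eq - 1); val = substr($0, eq + 1)
      gsub(/^[ \t]+|[ \t]+$/, "", key); gsub(/^[ \t]+|[ \t]+$/, "", val)
    }
    key == "ProtectHome"        { protect_home = val }
    key == "PrivateTmp"         { private_tmp = val }
    key == "RestrictNamespaces" { restrict_ns = val }
    # ReadWritePaths accumulates; an empty assignment resets the list.
    key == "ReadWritePaths"     { rw = (val == "" ? "" : rw " " val) }
    key == "Environment" && val ~ /XDG_RUNTIME_DIR=\/run\/user\/1000/ { xdg = 1 }
    END {
      if (!isno(protect_home)) print "FIX: ProtectHome=" protect_home " (need no)"
      if (!isno(private_tmp))  print "FIX: PrivateTmp=" private_tmp " (need no or unset)"
      if (!isno(restrict_ns))  print "FIX: RestrictNamespaces=" restrict_ns " (must be unset or no)"
      n = split("/var/lib/archipelago /run/user /tmp", need, " ")
      for (i = 1; i <= n; i++)
        if ((" " rw " ") !~ (" -?" need[i] " ")) print "FIX: ReadWritePaths missing " need[i]
      if (!xdg) print "FIX: Environment=XDG_RUNTIME_DIR=/run/user/1000 not set"
    }
  '
}
```

Usage on the server: systemctl cat archipelago.service | check_unit_sandbox (no output means all five requirements are met).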

Step 9: Health Check Status

# Containers with health checks — are they passing?
for c in $(podman ps --format "{{.Names}}"); do
  health=$(podman inspect "$c" --format "{{.State.Health.Status}}" 2>/dev/null)
  if [ -n "$health" ] && [ "$health" != "<no value>" ]; then
    echo "$c: $health"
  fi
done

# Containers WITHOUT health checks (gap in monitoring)
for c in $(podman ps --format "{{.Names}}"); do
  hc=$(podman inspect "$c" --format "{{.Config.Healthcheck}}" 2>/dev/null)
  if [ "$hc" = "<nil>" ] || [ -z "$hc" ]; then
    echo "NO HEALTHCHECK: $c"
  fi
done
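The two loops above call podman inspect twice per container. A single pass can instead feed one name/status pair per container into a summary. The hypothetical helper below assumes that an empty status or <no value> means the container has no healthcheck. This simplifies the Config.Healthcheck test used above, so treat its NO HEALTHCHECK line as a hint to confirm.

```shell
# Hypothetical helper (not part of Archipelago): summarise health status.
# Input: "name<TAB>health" where health is healthy, unhealthy, starting,
# or empty / "<no value>" when no healthcheck is defined.
summarize_health() {
  awk -F'\t' '
    $2 == "" || $2 == "<no value>" { none = none " " $1; next }
    { count[$2]++ }
    # Anything not yet healthy (unhealthy, starting) is listed individually.
    $2 != "healthy" { print "ATTENTION: " $1 " is " $2 }
    END {
      printf "healthy=%d unhealthy=%d starting=%d\n", count["healthy"], count["unhealthy"], count["starting"]
      if (none != "") print "NO HEALTHCHECK:" none
    }
  '
}
```

Usage on the server: for c in $(podman ps --format "{{.Names}}"); do printf '%s\t%s\n' "$c" "$(podman inspect "$c" --format "{{.State.Health.Status}}" 2>/dev/null)"; done | summarize_health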

Step 10: Resource & Failure Analysis

# Resource usage
podman stats --no-stream --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}\t{{.MemPerc}}"

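# Recent deaths (last 24h); --stream=false makes podman events exit after printing past events instead of waiting for new ones
podman events --stream=false --filter event=died --since 24h 2>/dev/null | tail -20

# OOM kills
podman ps -a --format "{{.Names}}" | while read -r c; do
  oom=$(podman inspect "$c" --format "{{.State.OOMKilled}}" 2>/dev/null)
  [ "$oom" = "true" ] && echo "OOM KILLED: $c"
done

# Non-zero exits (exited containers, minus clean "Exited (0)" stops)
podman ps -a --filter status=exited --format "{{.Names}}\t{{.Status}}" | grep -v "Exited (0)"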

Step 11: Systemd Integration

systemctl is-active archipelago nginx
systemctl --user list-units --type=service 2>/dev/null | grep -i podman
systemctl list-timers --all | grep -i -E "podman|container|archipelago"

Step 12: Generate Report

Produce a structured report:

## Container Diagnostic Report

### Rootless Podman Status
- User: archipelago (UID 1000)
- Subuid mapping: [OK/MISSING]
- XDG_RUNTIME_DIR: [OK/MISSING]
- User linger: [enabled/disabled]
- UFW forward policy: [ACCEPT/DROP]

### Summary
- Total containers: X running, Y stopped, Z unhealthy
- Port conflicts: [list or "none"]
- Missing restart policies: [list or "none"]
- Network issues: [list or "none"]
- UID mapping issues: [list or "none"]
- Health check gaps: [list]

### Critical Issues (fix immediately)
1. ...

### Warnings (fix soon)
1. ...

### Recommended Actions
1. ...

After diagnosis, suggest running /podman-fix for any issues found.

Port Reference

See references/port-map.md for the canonical port assignment table across all 4 layers.

UID Mapping Reference

See references/uid-mapping.md for the complete rootless UID mapping table.