Container recovery:
- Health monitor: MAX_RESTART_ATTEMPTS 3→10, interval 60s→120s
- Dependency-aware restarts: won't restart services before their deps
- Reset dependent counters when a dependency recovers
- Handle "created" state containers (were invisible to health monitor)
- Added IndeedHub, mempool-api, mysql to tier system
- Crash recovery: podman start timeout 30s→120s with retry
- Podman client: socket timeout 5s→30s, added restart policy

UI state representation:
- Exit code 0 shows "stopped" (gray), not "crashed" (red)
- Exit code 137 shows "killed (OOM)"
- Non-zero exit shows "crashed" (red)
- Added exit_code field to PackageDataEntry

Install/uninstall fixes:
- Install returns error when container doesn't start (was silent success)
- Post-install hooks awaited instead of fire-and-forget tokio::spawn
- Uninstall: graceful rm before force, volume prune, network cleanup
- Uninstall returns error on partial failure (was 200 OK)

Config consistency:
- DB passwords read from /var/lib/archipelago/secrets/ (was hardcoded)
- Bitcoin: added ZMQ ports 28332/28333 for LND block notifications
- IndeedHub port 7777→8190 (was conflicting with strfry)
- Marketplace versions: LND 0.17.4→0.18.4, Mempool 2.5.0→3.0.0

Performance:
- Metrics collector interval 60s→300s (was duplicating health monitor)
- Podman client: proper error propagation instead of unwrap_or_default

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
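The exit-code rules listed under "UI state representation" can be sketched as a small mapping. This is only an illustration, not the project's code: the `ExitDisplay` enum and its method names are hypothetical, and only the three labels come from the commit message. Exit code 137 is 128 + 9, meaning the process received SIGKILL, which is what the kernel OOM killer sends; any other sender of SIGKILL produces the same code.

```rust
/// Hypothetical display state for a container that is no longer running.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ExitDisplay {
    Stopped,
    KilledOom,
    Crashed,
}

impl ExitDisplay {
    /// Maps an exit code to a display state, following the commit message:
    /// 0 is a clean stop, 137 is treated as an OOM kill, anything else is a crash.
    pub fn from_exit_code(code: i32) -> Self {
        match code {
            0 => ExitDisplay::Stopped,
            137 => ExitDisplay::KilledOom,
            _ => ExitDisplay::Crashed,
        }
    }

    /// Label text as quoted in the commit message.
    pub fn label(self) -> &'static str {
        match self {
            ExitDisplay::Stopped => "stopped",
            ExitDisplay::KilledOom => "killed (OOM)",
            ExitDisplay::Crashed => "crashed",
        }
    }
}
```

The 137 arm has to come before the catch-all arm; otherwise an OOM kill would be reported as an ordinary crash.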
2.4 KiB
You are working through an overnight automation plan for the Archipelago (archy) project. Read these files first:
- `loop/plan.md` -- Your task checklist (mark items `- [x]` as you complete them)
- `CLAUDE.md` -- Project conventions, architecture, and coding standards
Working Process
For each task in loop/plan.md:
- Find the first unchecked `- [ ]` item
- Read the task description carefully
- Read the relevant source files before making changes
- Implement following CLAUDE.md conventions
- Run any test/build commands specified in the task
- Fix all errors before continuing
- Commit with conventional format: `type: description`
- Mark it done `- [x]` in `loop/plan.md`
- Move to the next unchecked task immediately
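The "find the first unchecked item, then mark it done" steps above can be sketched as two pure functions over the checklist text. This is a minimal sketch under stated assumptions: `first_unchecked` and `mark_done` are hypothetical helper names, and the checklist is assumed to use one `- [ ]` or `- [x]` item per line.

```rust
/// Returns the 1-based line number and the description of the first
/// unchecked `- [ ]` item, or `None` when every item is checked.
pub fn first_unchecked(plan: &str) -> Option<(usize, &str)> {
    plan.lines().enumerate().find_map(|(idx, line)| {
        line.trim_start()
            .strip_prefix("- [ ]")
            .map(|rest| (idx + 1, rest.trim()))
    })
}

/// Returns a copy of the plan with the item on `line_no` (1-based) marked
/// `- [x]`. Lines are re-joined with '\n', so a trailing newline is dropped.
pub fn mark_done(plan: &str, line_no: usize) -> String {
    plan.lines()
        .enumerate()
        .map(|(idx, line)| {
            if idx + 1 == line_no {
                // Only the first occurrence on the line is rewritten.
                line.replacen("- [ ]", "- [x]", 1)
            } else {
                line.to_string()
            }
        })
        .collect::<Vec<_>>()
        .join("\n")
}
```

Keeping both functions free of file I/O makes the selection logic easy to test; reading and writing `loop/plan.md` would wrap around them.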
Critical Rules
- Deploy-test-fix LOOPS: Many tasks require you to deploy, test, find failures, fix them, redeploy, and retest. Do NOT mark a task complete until ALL tests in that task pass. If a fix introduces a new failure, fix that too. Keep looping.
- Read logs obsessively: After every deploy, read
journalctl,podman logs, and curl output. The logs tell you what's broken. - Fix the root cause: Don't patch symptoms. If a container won't restart, find out WHY (wrong restart policy? health check failing? missing dependency?) and fix the actual cause.
- Never skip a testing gate -- if tests fail, fix before moving on
- If a task is proving difficult, make at least 10 genuine attempts before moving on
- Always read source files before editing them
- Do not stop until all tasks are checked or you are rate limited
- Commit after each completed fix (multiple commits per task is fine)
- DO NOT PUSH -- a CI build is in progress, we will push manually later
- Deploy to .228 -- `ssh -i ~/.ssh/archipelago-deploy archipelago@192.168.1.228`
- Run Rust builds/checks on .228, NOT macOS
- Production-quality code only -- no shortcuts, no TODO comments, no unwrap()
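As an illustration of the "no unwrap()" rule, the sketch below parses a port value and propagates failures with `?` instead of panicking. The `PortError` type and `parse_port` function are hypothetical and exist only for this example; they are not taken from the Archipelago codebase.

```rust
use std::fmt;
use std::num::ParseIntError;

/// Hypothetical error type describing why a port string was rejected.
#[derive(Debug)]
pub enum PortError {
    Empty,
    Invalid(ParseIntError),
}

impl fmt::Display for PortError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            PortError::Empty => write!(f, "port value is empty"),
            PortError::Invalid(e) => write!(f, "port value is not a valid u16: {e}"),
        }
    }
}

impl std::error::Error for PortError {}

impl From<ParseIntError> for PortError {
    fn from(e: ParseIntError) -> Self {
        PortError::Invalid(e)
    }
}

/// Parses a port, returning an error to the caller rather than calling
/// `unwrap()` or silently falling back to a default value.
pub fn parse_port(raw: &str) -> Result<u16, PortError> {
    let trimmed = raw.trim();
    if trimmed.is_empty() {
        return Err(PortError::Empty);
    }
    // `?` converts ParseIntError into PortError through the From impl above.
    Ok(trimmed.parse::<u16>()?)
}
```

The caller decides how to report the failure, so a bad value surfaces as an error instead of a panic or a silent default.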
SSH Quick Reference
SSH="ssh -i ~/.ssh/archipelago-deploy archipelago@192.168.1.228"
# Deploy from macOS:
./scripts/deploy-to-target.sh --target 192.168.1.228
# Build Rust on .228:
$SSH "cd ~/archy/core && cargo clippy --all-targets --all-features && cargo test --all-features"
# Check containers:
$SSH "podman ps -a --format '{{.Names}} {{.State}} {{.Status}}' | sort"
# Read container logs:
$SSH "podman logs bitcoin-knots --tail 30"
# Check backend:
$SSH "journalctl -u archipelago --no-pager -n 50"
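The "Check containers" command prints one line per container in the form `name state status`. A minimal sketch of turning that output into structured rows follows. `ContainerRow`, `parse_row`, and `not_running` are hypothetical names, and the sketch assumes that container names and states contain no spaces while the status text may.

```rust
/// One line of `podman ps -a --format '{{.Names}} {{.State}} {{.Status}}'`.
#[derive(Debug, PartialEq, Eq)]
pub struct ContainerRow<'a> {
    pub name: &'a str,
    pub state: &'a str,
    pub status: &'a str,
}

/// Splits a line into name, state, and the remaining status text.
/// Returns `None` for blank lines or lines without a state column.
pub fn parse_row(line: &str) -> Option<ContainerRow<'_>> {
    let mut parts = line.trim().splitn(3, ' ');
    let name = parts.next().filter(|s| !s.is_empty())?;
    let state = parts.next()?;
    // The status column may be absent; treat that as an empty string.
    let status = parts.next().unwrap_or("");
    Some(ContainerRow { name, state, status })
}

/// Collects every container whose state is not "running", which includes
/// "exited" and "created" containers.
pub fn not_running<'a>(output: &'a str) -> Vec<ContainerRow<'a>> {
    output
        .lines()
        .filter_map(parse_row)
        .filter(|row| row.state != "running")
        .collect()
}
```

Filtering on "not running" rather than on "exited" keeps containers in the "created" state visible.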