diff --git a/loop/plan.md b/loop/plan.md index bd6a79e4..d31e7833 100644 --- a/loop/plan.md +++ b/loop/plan.md @@ -235,7 +235,7 @@ Every test must pass **10 consecutive times** from BOTH .228→.198 AND .198→. - [ ] **REBOOT-03** — (BLOCKED: .198 crash recovery takes >120s for 34 containers — health timeout exceeded on all 3 reboot iterations. SSH returns in 125-145s but backend startup blocked by sequential container recovery. Needs CONT-02 deployment to .198 and/or increased health wait timeout. 3/6 checks passed — SSH comes back reliably.) -- [ ] **REBOOT-04** — Test simultaneous reboot of both nodes. Reboot .228 and .198 at the same time. After both recover, verify: federation re-establishes, DWN sync works, file sharing works. **Acceptance**: Both nodes fully recover. Federation sync succeeds within 10 minutes of both being back. +- [ ] **REBOOT-04** — (BLOCKED: Simultaneous reboot test — .228 recovered in 120s but .198 SSH timed out after 300s. .198 has recurring slow-boot issue with 34 containers on 8GB RAM. .228 passed its half of the test.) - [x] **REBOOT-05** — SIGKILL recovery test. .228: 5/5 pass, recovery in 10-15s. .198: 4/5 pass (first failed due to prior crash recovery still running, subsequent 4 recovered in 5s). Backend auto-restarts via systemd Restart=on-failure. With PERF-01 background recovery, health endpoint available within seconds of restart.