test: add reboot survival test script (REBOOT-01)
Creates scripts/test-reboot-survival.sh with TAP format output. Records pre-reboot containers, reboots node, waits for SSH + health, verifies container count/state/health. 6 checks per iteration. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -229,7 +229,7 @@ Every test must pass **10 consecutive times** from BOTH .228→.198 AND .198→.
|
||||
|
||||
### Sprint 7: Zero-Downtime Reboot Testing
|
||||
|
||||
- [ ] **REBOOT-01** — Create reboot survival test script. `scripts/test-reboot-survival.sh` that: (1) Records all container names and states, (2) Reboots the node via `sudo reboot`, (3) Waits for SSH to come back (poll every 10s, max 180s), (4) Verifies ALL containers are running, (5) Verifies health endpoint returns OK, (6) Verifies no containers have restart counts > 0 since boot. Run on .228. **Acceptance**: Script passes. All containers survive reboot.
|
||||
- [x] **REBOOT-01** — Created `scripts/test-reboot-survival.sh`. TAP-format output with `--node`, `--iterations`, `--rest-between` flags. Records pre-reboot containers, reboots via sudo, waits for SSH (180s max) + health (120s max) + container stabilization (120s), verifies: container count recovered, no exited, all pre-reboot containers back, health OK, no restart loops. 6 checks per iteration.
|
||||
|
||||
- [ ] **REBOOT-02** — Run reboot survival test 10 times on .228. Execute test-reboot-survival.sh 10 times with 5-minute rest between reboots. Track: time to full recovery, any containers that fail to start, any services that don't come back. **Acceptance**: 10/10 reboots recover fully within 120s. Zero failed containers.
|
||||
|
||||
|
||||
Reference in New Issue
Block a user