patches on sxsw ai working api key working container hardened plus many more
139
loop/plan.md
@@ -457,6 +457,143 @@
---
## Post-v1.0 Feature Release: Multi-Node, Identity & Tor (April -- September 2026)
**Goal**: Ship 8 features to production — real Nostr identity, NIP-07 signing, multi-node federation across 7 servers, file sharing, DWN + node map, webhook fix, Tor rotation. Each feature must work on first install with 100% uptime.
**Servers**: 192.168.1.228 (primary), 192.168.1.198 (secondary), archipelago-2.tail2b6225.ts.net, archipelago-3.tail2b6225.ts.net, + 3 more TBD
**Deploy**: `./scripts/deploy-to-target.sh --live` | SSH: `ssh -i ~/.ssh/archipelago-deploy archipelago@192.168.1.228`
---
### Sprint 40: Critical Fix & Identity Completion (April 2026 Week 1-2)
- [ ] **WHFIX-01** — Decouple health monitor from webhook config. In `core/archipelago/src/health_monitor.rs` lines 150-156, the health check loop skips ALL monitoring (restarts + WebSocket notifications) when webhooks are disabled or ContainerCrash isn't subscribed. This means fresh installs (webhooks disabled by default) get NO auto-restart and NO UI notifications. Fix: remove the webhook config gate from the main loop. Health checks, auto-restarts, and WebSocket `Notification` pushes must run unconditionally. Move the webhook gate into a separate block that only controls external HTTP webhook delivery — call `webhooks::send_webhook()` only when enabled AND the event is subscribed. Keep the existing `send_webhook()` function which already checks `config.enabled` and `config.events.contains()` internally. **Acceptance**: With webhooks disabled (default), crash a container (`sudo podman stop archy-filebrowser`), confirm health monitor detects it within 60s, auto-restarts it, and pushes a Notification visible in the Dashboard toast. With webhooks enabled + URL configured, confirm HTTP POST is also sent. Deploy and verify on 192.168.1.228.
- [ ] **WHFIX-02** — Add monitoring.rs webhook integration. In `core/archipelago/src/monitoring/mod.rs`, the alert system pushes `Notification` to DataModel but never calls `webhooks::send_webhook()`. Add webhook delivery for fired alerts: when a `DiskWarning` alert fires, send `WebhookEvent::DiskWarning`; when `ContainerCrash` fires, send `WebhookEvent::ContainerCrash`. Map alert types to webhook events. The webhook call should be fire-and-forget (already is in `send_webhook`). **Acceptance**: Configure a webhook URL, trigger a disk warning (lower threshold temporarily to 1%), confirm HTTP POST received. Deploy and verify.
- [ ] **IDENT-01** — Auto-generate Nostr keypair during identity creation. In `core/archipelago/src/identity_manager.rs` `create()` method, after generating the Ed25519 keypair, immediately call `create_nostr_key()` on the same identity so every identity gets both Ed25519 (DID) and secp256k1 (Nostr) keys from creation. Update the `IdentityInfo` struct returned by `identity.create` and `identity.list` RPC to always include `nostr_pubkey` (hex) and `nostr_npub` (bech32) fields when present. **Acceptance**: Call `identity.create`, then `identity.get` — response includes both `did` and `nostr_npub`. Deploy and verify.
- [ ] **IDENT-02** — Update onboarding to show DID + npub. In `neode-ui/src/views/OnboardingDid.vue`, after fetching the node DID, also fetch `node.nostr-pubkey` (already exists as RPC endpoint). Display both: "Your DID: did:key:z..." and "Your Nostr ID: npub1..." with copy buttons for each. Add a brief explanation: DID for Web5/federation, npub for Nostr apps. Store `nostr_npub` in localStorage alongside `neode_did`. **Acceptance**: Fresh onboarding flow shows both DID and npub on the identity screen. Deploy and verify at http://192.168.1.228.
- [ ] **IDENT-03** — Wire real signature verification in onboarding. In `neode-ui/src/views/OnboardingVerify.vue`, replace `generateMockSignature()` with a real call to `rpcClient.signChallenge(challenge)`. Generate a random challenge string, send it to the backend, display the real Ed25519 signature. Add a "Verify" button that calls `identity.verify` with the DID, challenge, and signature to prove the node controls its keys. Show green checkmark on success. **Acceptance**: Onboarding verify step shows real cryptographic signature and verification succeeds. Deploy and verify.
- [ ] **IDENT-04** — Wire real encrypted backup in onboarding. In `neode-ui/src/views/OnboardingBackup.vue`, replace the mock JSON display with a real call to `rpcClient.createBackup(passphrase)`. Add a passphrase input field (with confirmation). Call `backup.create` RPC, then offer the encrypted backup blob as a downloadable file. Show the backup metadata (DID, timestamp, encrypted: true). **Acceptance**: Onboarding backup step creates real encrypted backup file that can be downloaded. Deploy and verify.
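The control flow that WHFIX-01 asks for can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in `health_monitor.rs`: the `Action` enum, the `WebhookConfig` shape, the `actions_for_crash` function, and the `"ContainerCrash"` event string are hypothetical names chosen for the example. The point it shows is that restart and UI notification are unconditional, while only external webhook delivery is gated on the config.

```rust
/// Hypothetical actions the health monitor takes for a crashed container.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Action {
    Restart(String),
    Notify(String),
    SendWebhook(String),
}

/// Assumed shape of the webhook configuration; `Default` gives "disabled".
#[derive(Debug, Clone, Default)]
pub struct WebhookConfig {
    pub enabled: bool,
    pub events: Vec<String>,
}

/// Restart and WebSocket notification always happen.
/// The webhook is added only when enabled AND the event is subscribed.
pub fn actions_for_crash(container: &str, cfg: &WebhookConfig) -> Vec<Action> {
    let mut actions = vec![
        Action::Restart(container.to_string()),
        Action::Notify(container.to_string()),
    ];
    if cfg.enabled && cfg.events.iter().any(|e| e == "ContainerCrash") {
        actions.push(Action::SendWebhook(container.to_string()));
    }
    actions
}
```

With the default (disabled) config the function still returns the restart and notify actions, which is the behaviour a fresh install needs.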
### Sprint 41: NIP-07 Iframe Signing (April 2026 Week 3-4)
- [ ] **NIP07-01** — Configure nginx to inject nostr-provider.js into iframe apps. In `image-recipe/configs/nginx-archipelago.conf`, for every `/app/*` proxy location block, add `sub_filter '</head>' '<script src="/nostr-provider.js"></script></head>';` and `sub_filter_once on;`. Ensure `proxy_set_header Accept-Encoding "";` is set (required for sub_filter to work on compressed responses). Copy `neode-ui/public/nostr-provider.js` to `/opt/archipelago/web-ui/nostr-provider.js` in the deploy script. Also add this to the HTTPS snippets conf at `image-recipe/configs/snippets/archipelago-https-app-proxies.conf`. **Acceptance**: Open any iframe app (e.g., Mempool at `/app/mempool/`), open browser DevTools console, type `window.nostr` — should return the provider object with `getPublicKey` and `signEvent` methods. Deploy and verify.
- [ ] **NIP07-02** — Add signing consent modal. In `neode-ui/src/components/`, create `NostrSignConsent.vue` — a modal that shows when an iframe app requests a Nostr signature. Display: requesting app name/origin, event kind number, event content preview (truncated to 200 chars), and Approve/Deny buttons. In `neode-ui/src/stores/appLauncher.ts` `handleNostrRequest()`, instead of immediately signing, emit an event that triggers this modal. Only call the backend RPC after user approves. Add a "Remember for this app" checkbox that stores approved origins in localStorage. **Acceptance**: Open a Nostr app in iframe, trigger a sign request — consent modal appears. Approve → signature returned. Deny → error returned to iframe. Deploy and verify.
- [ ] **NIP07-03** — Test NIP-07 with a real Nostr web app. Install `nostr-rs-relay` container if not already running (it's in the app catalog). Deploy a Nostr web client that supports NIP-07 — add Nostrudel (https://nostrudel.ninja) as a web-only app entry in `Marketplace.vue` `getCuratedAppList()` (category: "Social", opens in iframe). Open Nostrudel, verify it detects `window.nostr`, can fetch the pubkey, and can sign events (post a note). **Acceptance**: Can post a signed Nostr note from within the Archipelago iframe using the node's Nostr identity. Verify the note appears on a public Nostr client.
- [ ] **NIP07-04** — Support NIP-04 and NIP-44 encryption in iframe provider. The `nostr-provider.js` already has stubs for `nip04.encrypt`, `nip04.decrypt`, `nip44.encrypt`, `nip44.decrypt`. Add backend RPC endpoints: `identity.nostr-encrypt-nip04`, `identity.nostr-decrypt-nip04`, `identity.nostr-encrypt-nip44`, `identity.nostr-decrypt-nip44`. Each takes the identity ID, peer pubkey, and plaintext/ciphertext. Use `nostr_sdk` for the actual crypto. Register in RPC router. Wire the appLauncher `handleNostrRequest` to route `nip04.*` and `nip44.*` calls to these endpoints. **Acceptance**: From an iframe app, call `window.nostr.nip44.encrypt(peerPubkey, "hello")` — returns ciphertext. Call `nip44.decrypt` with same ciphertext — returns "hello". Deploy and verify.
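Two pieces of the NIP07-02 consent flow are easy to get subtly wrong: truncating the content preview and honouring "Remember for this app". The sketch below shows both. It is written in Rust only to keep the examples in this plan in one language; the real logic would live in the frontend (`appLauncher.ts`), and `preview`, `decide`, and `SignDecision` are hypothetical names.

```rust
use std::collections::HashSet;

/// Truncate event content to at most `max_chars` characters for display.
/// Operates on `char`s rather than bytes so multi-byte UTF-8 is never split.
pub fn preview(content: &str, max_chars: usize) -> String {
    content.chars().take(max_chars).collect()
}

/// Outcome of checking a sign request against the remembered origins.
#[derive(Debug, PartialEq, Eq)]
pub enum SignDecision {
    SignImmediately,
    AskUser,
}

/// An origin the user previously approved with "Remember for this app"
/// skips the modal; every other origin must be confirmed.
pub fn decide(origin: &str, remembered: &HashSet<String>) -> SignDecision {
    if remembered.contains(origin) {
        SignDecision::SignImmediately
    } else {
        SignDecision::AskUser
    }
}
```

Matching on the exact origin string, rather than a prefix, keeps an approval for one app from leaking to another app served from a similar URL.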
### Sprint 42: Tor Address Rotation & Per-App Toggle (May 2026 Week 1-2)
- [ ] **TOR-01** — Implement Tor address rotation RPC. In `core/archipelago/src/api/rpc/tor.rs`, add `tor.rotate-service` handler. Flow: (1) Read current service from `services.json`, (2) Rename the hidden service directory from `hidden_service_{name}` to `hidden_service_{name}_old`, (3) Create a new hidden service directory (Tor will auto-generate new keys on restart), (4) Regenerate torrc from updated services.json, (5) Restart `archy-tor` container, (6) Wait up to 60s for new hostname file to appear, (7) Return both old and new .onion addresses. Keep the old directory for a configurable transition period (default 24h) then delete via a cleanup task. Add `tor.cleanup-rotated` RPC that deletes expired old service directories. **Acceptance**: Call `tor.rotate-service("archipelago")`, verify new .onion address is different from old one. Both addresses resolve during transition period. After cleanup, old address stops working. Deploy and verify.
- [ ] **TOR-02** — Propagate Tor address change to federation peers. After a successful rotation in `tor.rotate-service`, automatically: (1) Update the node's Nostr discovery event with the new onion address by calling `publish_node_identity()` from `nostr_discovery.rs`, (2) For each federated peer in `federation.rs`, send a `federation.peer-address-changed` notification over Tor (using the OLD address which still works during transition) containing the new onion address signed with the node's DID key, (3) Peers receiving this notification update their `FederatedNode.onion` field and re-save `federation/nodes.json`. Add `federation.peer-address-changed` as a new inter-node RPC handler. **Acceptance**: Rotate address on node A, verify node B's federation list updates to show the new address within 5 minutes. Verify Nostr relay shows new address.
- [ ] **TOR-03** — Add per-app Tor toggle. In `core/archipelago/src/api/rpc/tor.rs`, add `tor.toggle-app` handler that takes `app_id` and `enabled` (bool). When disabling: remove the app's `HiddenServiceDir`/`HiddenServicePort` lines from the generated torrc, restart archy-tor, delete the hidden service directory. When enabling: add the service entry to `services.json`, regenerate torrc, restart archy-tor, wait for hostname. Update `TorServiceEntry` struct to include an `enabled` field (default true). The `tor.list-services` response should include the `enabled` state per service. **Acceptance**: Disable Tor for filebrowser, verify its .onion address no longer resolves. Re-enable, verify a new .onion address is generated and works. Deploy and verify.
- [ ] **TOR-04** — Add Tor management UI. In `neode-ui/src/views/AppDetails.vue`, add a "Tor Access" section (only shown when the app has a Tor service). Show: current .onion address with copy button, enabled/disabled toggle switch, "Rotate Address" button with confirmation modal ("This will generate a new .onion address. The old address will work for 24 hours during transition. Federated peers will be notified automatically."). In `neode-ui/src/views/Settings.vue` or `Web5.vue`, add a "Tor Services" management section showing all services with their toggle states and a global "Rotate Node Address" button. Wire to `tor.toggle-app`, `tor.rotate-service`, `tor.list-services` RPC calls. **Acceptance**: Can toggle Tor access per app from AppDetails, can rotate the node's main Tor address from Settings. All state changes reflected in UI immediately. Deploy and verify.
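The directory naming and the transition-period check from TOR-01 can be sketched like this. It assumes the hidden service directories live under a single base directory and that rotation times are stored as Unix seconds; `rotated_dirs`, `is_expired`, and `DEFAULT_TRANSITION` are hypothetical names, and the base path used in any call is an assumption rather than the real layout.

```rust
use std::path::{Path, PathBuf};
use std::time::Duration;

/// Default transition period from TOR-01: 24 hours.
pub const DEFAULT_TRANSITION: Duration = Duration::from_secs(24 * 60 * 60);

/// Returns (current_dir, old_dir) for a service, following the
/// `hidden_service_{name}` -> `hidden_service_{name}_old` rename in TOR-01.
pub fn rotated_dirs(base: &Path, name: &str) -> (PathBuf, PathBuf) {
    let current = base.join(format!("hidden_service_{}", name));
    let old = base.join(format!("hidden_service_{}_old", name));
    (current, old)
}

/// True once the old directory has outlived the transition period and
/// may be deleted by the cleanup task. Timestamps are Unix seconds.
pub fn is_expired(rotated_at_secs: u64, now_secs: u64, transition: Duration) -> bool {
    now_secs.saturating_sub(rotated_at_secs) >= transition.as_secs()
}
```

`saturating_sub` keeps the check safe if the clock moves backwards between rotation and cleanup: the directory is then treated as not yet expired.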
### Sprint 43: Multi-Node Federation Deployment (May 2026 Week 3-4)
- [ ] **FED-DEPLOY-01** — Deploy latest code to all available servers. Run `./scripts/deploy-to-target.sh --live` for primary (192.168.1.228). Run `ARCHIPELAGO_TARGET="archipelago@192.168.1.198" ./scripts/deploy-to-target.sh --live` for secondary. Run `ARCHIPELAGO_TARGET="archipelago@archipelago-2.tail2b6225.ts.net" ./scripts/deploy-to-target.sh --live` for Tailscale server 2. For archipelago-3 (no build tools): SCP binary from archipelago-2, upload frontend tarball, extract to `/opt/archipelago/web-ui/`, restart service. Verify health on all 4 servers via `curl http://<server>/health`. **Acceptance**: All 4 servers return healthy, frontend loads, can log in on each.
- [ ] **FED-DEPLOY-02** — Federate all servers. On primary (192.168.1.228), call `federation.invite` to generate 3 invite codes (one per peer server). On each secondary server, call `federation.join` with the invite code. Verify bilateral trust is established: `federation.list-nodes` on primary shows all 3 peers as "trusted", each peer shows primary as "trusted". Trigger `federation.sync-state` and verify state snapshots contain real data (CPU, memory, disk, app list) from each peer. **Acceptance**: `federation.list-nodes` on any server lists all 4 nodes with recent `last_seen` timestamps and valid state snapshots.
- [ ] **FED-DEPLOY-03** — Validate Nostr discovery across all nodes. On each server, call `node.nostr-publish` to publish identity to relays. Wait 30 seconds for relay propagation. On each server, call `node.nostr-discover` — verify it finds all other 3 nodes (DID, onion address, version). If discovery fails: check relay connectivity (are relays reachable from server?), check Tor proxy routing, check NIP-33 event format. Fix any issues. **Acceptance**: Every server can discover every other server via Nostr relays. Run discovery 3 times from each to confirm reliability.
- [ ] **FED-DEPLOY-04** — Test federation resilience. (1) Stop the backend on one server (`sudo systemctl stop archipelago`), verify other servers detect it as offline within 5 minutes (federation sync fails, `last_seen` goes stale). (2) Restart the server, verify it reconnects and state syncs resume within 5 minutes. (3) Kill the `archy-tor` container on one server, verify federation detects `tor_active: false` in state snapshot. (4) Restart Tor, verify it recovers. (5) Simulate network partition by blocking port 9050 on one server with iptables, verify graceful degradation, then unblock. **Acceptance**: All 5 scenarios recover automatically without manual intervention. Document recovery times.
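For FED-DEPLOY-03, the check "every server finds all other 3 nodes" reduces to a set difference per server. The sketch below assumes nodes are identified by DID strings; `missing_peers` is a hypothetical helper, not an existing RPC.

```rust
use std::collections::BTreeSet;

/// Given this node's DID, the full fleet, and what `node.nostr-discover`
/// returned, list the peers that were expected but not found.
/// Output keeps the order of `fleet` so reports are stable across runs.
pub fn missing_peers(own_did: &str, fleet: &[&str], discovered: &[&str]) -> Vec<String> {
    let found: BTreeSet<&str> = discovered.iter().copied().collect();
    fleet
        .iter()
        .copied()
        .filter(|did| *did != own_did && !found.contains(did))
        .map(str::to_string)
        .collect()
}
```

An empty result on every server, three runs in a row, corresponds to the acceptance criterion.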
### Sprint 44: File Sharing Across Nodes (June 2026 Week 1-2)
- [ ] **SHARE-01** — Test content sharing between two federated nodes. On node A (192.168.1.228): upload a test file to FileBrowser, then call `content.add` with the filename to share it. Call `content.set-pricing` with `access: "free"`. Call `content.set-availability` with `availability: "all_peers"`. On node B (192.168.1.198): call `content.browse-peer` with node A's onion address. Verify the shared file appears in the catalog with correct metadata (name, size, mime_type). Download the file via the content server's HTTP endpoint over Tor. Compare checksums. **Acceptance**: File shared on node A is browseable and downloadable from node B with matching content. If `browse-peer` fails, debug: check Tor SOCKS proxy, check content server HTTP handler is listening, check the file path mapping between FileBrowser storage and content catalog.
- [ ] **SHARE-02** — Test access control modes. On node A, share 3 files: one `free`, one `peers_only`, one `paid` (price: 100 sats). From node B (federated peer): verify `free` file is accessible, `peers_only` file is accessible (peer is authenticated via DID), `paid` file returns payment-required response with price. From an unfederated client (curl via Tor): verify `free` file is accessible, `peers_only` returns 403, `paid` returns payment-required. Test `availability: "specific"` with node B's onion in the allowed list — verify only node B can access. **Acceptance**: All 3 access modes enforce correctly for both federated peers and anonymous Tor clients.
- [ ] **SHARE-03** — Test file sharing at scale. Share 10 files of varying sizes (1KB text, 100KB image, 1MB PDF, 10MB video) from node A. Browse the catalog from nodes B, C, and D simultaneously. Download the 10MB file from all 3 nodes at once. Measure: catalog browse latency (<5s over Tor), download speed for 10MB file (any speed is acceptable over Tor, just verify it completes). Verify no corrupted transfers (checksum all downloads). **Acceptance**: All files transfer correctly to all 3 peers. No timeouts, no corruption. Document transfer speeds.
- [ ] **SHARE-04** — Add peer content browsing to Cloud UI. In `neode-ui/src/views/Cloud.vue`, add a "Peer Files" tab alongside Photos/Music/Documents/All Files. This tab shows a list of federated peers (from `federation.list-nodes`). Clicking a peer calls `content.browse-peer` with their onion address and displays their shared catalog in the same FileGrid component. Add a download button on each file that fetches the content over Tor and saves locally. Show loading state while Tor connection establishes (can take 5-10s). **Acceptance**: Can browse and download peer-shared files from the Cloud page. Deploy and verify.
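The access matrix that SHARE-02 tests can be written down as a single match, which also makes a handy oracle for the test script. The type and function names here are hypothetical; only the outcomes (accessible, 403, payment-required) come from the task description.

```rust
/// Access mode set on a shared file.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Access {
    Free,
    PeersOnly,
    Paid { price_sats: u64 },
}

/// Who is asking: a DID-authenticated federated peer or an anonymous Tor client.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Requester {
    FederatedPeer,
    Anonymous,
}

/// Expected server response for each combination.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Outcome {
    Allow,
    Forbidden,
    PaymentRequired { price_sats: u64 },
}

pub fn authorize(access: Access, who: Requester) -> Outcome {
    match (access, who) {
        (Access::Free, _) => Outcome::Allow,
        (Access::PeersOnly, Requester::FederatedPeer) => Outcome::Allow,
        (Access::PeersOnly, Requester::Anonymous) => Outcome::Forbidden,
        // Per SHARE-02, paid files ask for payment even from federated peers.
        (Access::Paid { price_sats }, _) => Outcome::PaymentRequired { price_sats },
    }
}
```

The `availability: "specific"` allow-list is a separate filter and is deliberately left out of this sketch.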
### Sprint 45: DWN Multi-Node Sync (June 2026 Week 3-4)
- [ ] **DWN-SYNC-01** — Test DWN sync between federated nodes. On node A: register a protocol via `dwn.register-protocol` (e.g., `https://archipelago.dev/protocols/notes`), write 5 messages via `dwn.write-message`. On node B: add node A as a sync target (the DWN sync module uses the federation peer list), trigger `dwn.sync`. Verify all 5 messages appear on node B via `dwn.query-messages`. Write 3 messages on node B, trigger sync from node A — verify bidirectional replication. **Acceptance**: Messages replicate both ways between 2 nodes. Protocol definitions sync as well.
- [ ] **DWN-SYNC-02** — Test DWN sync across all 4 nodes. Register the same protocol on all 4 nodes. Write unique messages on each node (node A writes 5, B writes 3, C writes 2, D writes 4 = 14 total). Trigger sync from each node. After sync completes, query all messages on each node — every node should have all 14 messages. If sync is missing messages: check the bidirectional replication logic in `dwn_sync.rs`, ensure Tor SOCKS proxy is used correctly, check for deduplication issues. **Acceptance**: All 4 nodes have all 14 messages after sync. Message content and metadata intact.
- [ ] **DWN-SYNC-03** — Add DWN sync status to Federation dashboard. In `neode-ui/src/views/Federation.vue`, in the node detail modal, add a "DWN Sync" section showing: last sync time, messages synced count, sync status (idle/syncing/error), and a "Sync Now" button. Wire to `dwn.sync` RPC. In the node list, add a small DWN icon/badge showing sync state (green dot = synced recently, amber = stale, red = error). Fetch DWN status alongside federation state. **Acceptance**: Federation dashboard shows DWN sync state per node. Manual sync trigger works from the modal. Deploy and verify.
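The expected result of DWN-SYNC-02 (5 + 3 + 2 + 4 = 14 messages on every node) can be modelled with a deduplicating merge keyed by message ID. This is a toy model under assumptions, not the logic in `dwn_sync.rs`: `Store`, `seed`, and `sync_pair` are hypothetical, and message IDs are assumed to be globally unique.

```rust
use std::collections::BTreeMap;

/// Toy message store: message ID -> payload.
pub type Store = BTreeMap<String, String>;

/// Create `count` messages with IDs unique to `node`.
pub fn seed(node: &str, count: usize) -> Store {
    (0..count)
        .map(|i| (format!("{}-{}", node, i), format!("note {} from {}", i, node)))
        .collect()
}

/// Bidirectional sync: each side receives the messages it lacks.
/// Existing entries are never overwritten, so repeated syncs are idempotent.
pub fn sync_pair(a: &mut Store, b: &mut Store) {
    for (id, body) in a.iter() {
        b.entry(id.clone()).or_insert_with(|| body.clone());
    }
    for (id, body) in b.iter() {
        a.entry(id.clone()).or_insert_with(|| body.clone());
    }
}
```

In a star arrangement around node A, syncing A with B, C, and D and then once more with B and C leaves all four stores with the same 14 entries. If a real run ends with fewer, the missing IDs point at which pairwise sync dropped data.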
### Sprint 46: Node Visualization Map (July 2026 Week 1-2)
- [ ] **MAP-01** — Install D3.js and create network topology component. Run `cd neode-ui && npm install d3@^7 && npm install -D @types/d3@^7`. Create `neode-ui/src/components/federation/NetworkMap.vue` — a force-directed graph component using `d3-force`. Nodes are circles: size proportional to app count, color by trust level (green=trusted `#4ade80`, amber=observer `#fb923c`, red=untrusted `#ef4444`), opacity by online/offline (1.0=online, 0.4=offline). Edges are lines between federated nodes: solid green when both online, dashed gray when one offline. Add labels showing node name (truncated DID or custom alias). Use `bg-black/60 backdrop-blur-glass rounded-xl border border-white/10` container to match glassmorphism design. SVG fills the container, responsive to window resize. Add CSS classes to `neode-ui/src/style.css`. **Acceptance**: Component renders a graph with 4 test nodes (mock data). Nodes repel/attract via force simulation. Looks consistent with Archipelago glass aesthetic.
- [ ] **MAP-02** — Wire network map to real federation data. In `NetworkMap.vue`, accept a `nodes` prop of type `FederatedNode[]` (from `federation.list-nodes` response). Add the current node as the center node (use `node.did` RPC to get own identity). Map each node to a D3 node: id=DID, label=name or truncated DID, trust_level, online=(last_seen within 10 minutes), cpu_usage, memory_percent, app_count from `last_state`. Edges connect each peer to the local node (star topology for now). Add node tooltips on hover showing: full DID, onion address (truncated), CPU/memory/disk percentages, app count, last seen time. Click a node to open the existing node detail modal. Poll `federation.list-nodes` every 30 seconds and update the graph with smooth transitions (D3 enter/update/exit). **Acceptance**: Network map shows all real federated nodes with live data. Online/offline status updates when a server goes down. Tooltips show real metrics. Deploy and verify.
- [ ] **MAP-03** — Add network map as tab in Federation page. In `neode-ui/src/views/Federation.vue`, add a tab switcher at the top: "List View" (current) and "Network Map". List view shows the existing node cards. Network Map tab shows `NetworkMap.vue` taking full width of the content area. Remember selected tab in localStorage. Default to Map view when 3+ nodes are federated, List view otherwise. Add the tab styling as global classes in `style.css` following the existing tab patterns (if any) or using `.glass-tab` / `.glass-tab-active` classes with `bg-white/5` inactive and `bg-white/10 border-b-2 border-orange-400` active. **Acceptance**: Can switch between list and map views. Map shows live federation data. Tab selection persists across page navigations. Deploy and verify.
- [ ] **MAP-04** — Add DWN management section to Web5 page. In `neode-ui/src/views/Web5.vue`, enhance the existing DWN section with: (1) A "Manage Protocols" subsection showing registered protocols in a list with delete buttons, plus an "Add Protocol" form (URL input), (2) A "Message Store" subsection showing total message count, storage size in bytes (human-readable), and a "Browse Messages" button that opens a modal with a paginated message list (fetch via `dwn.query-messages` with limit/offset), (3) A "Sync Targets" subsection showing which peers are configured for DWN sync and their last sync status. Wire to existing `dwn.*` RPC endpoints. **Acceptance**: Can add/remove protocols, browse stored messages, and see sync status from the Web5 page. Deploy and verify.
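The visual encoding rules from MAP-01 and MAP-02 (colour by trust level, opacity by online state, online meaning `last_seen` within 10 minutes) are sketched below. The component itself would be TypeScript; Rust is used here only to keep the plan's examples in one language, and all names are hypothetical.

```rust
/// Trust levels shown on the network map.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Trust {
    Trusted,
    Observer,
    Untrusted,
}

/// A peer counts as online if it was seen within the last 10 minutes (MAP-02).
pub const ONLINE_WINDOW_SECS: u64 = 10 * 60;

/// Fill colour per trust level, using the hex values from MAP-01.
pub fn color(trust: Trust) -> &'static str {
    match trust {
        Trust::Trusted => "#4ade80",
        Trust::Observer => "#fb923c",
        Trust::Untrusted => "#ef4444",
    }
}

/// Timestamps are Unix seconds; a `last_seen` in the future counts as online.
pub fn is_online(last_seen_secs: u64, now_secs: u64) -> bool {
    now_secs.saturating_sub(last_seen_secs) <= ONLINE_WINDOW_SECS
}

/// Online nodes are fully opaque, offline nodes are dimmed.
pub fn opacity(online: bool) -> f32 {
    if online { 1.0 } else { 0.4 }
}
```

Keeping these as pure functions means the 30-second polling loop only has to recompute attributes and hand them to the D3 update step.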
### Sprint 47: Integration Testing — First Install Flow (July 2026 Week 3 — August 2026 Week 1)
- [ ] **INSTALL-01** — Create comprehensive first-install test script. Create `scripts/test-first-install.sh` that automates the post-install verification flow. It should: (1) Call `node.did` and verify DID format (`did:key:z...`), (2) Call `node.nostr-pubkey` and verify npub format, (3) Call `identity.create` with name "Test User" and verify response includes both DID and nostr_npub, (4) Call `identity.list` and verify the created identity has both key types, (5) Call `tor.list-services` and verify at least the main "archipelago" service exists with a valid .onion address, (6) Call `webhook.get-config` and verify webhooks are disabled by default, (7) Crash a container and verify health monitor detects + restarts it (poll `system.stats` for container count), (8) Call `dwn.status` and verify DWN is operational. Run via SSH against a target server. **Acceptance**: Script passes on 192.168.1.228 (after deploying latest code). All 8 checks green.
- [ ] **INSTALL-02** — Test NIP-07 signing end-to-end on live server. On 192.168.1.228: (1) Open a proxied iframe app (e.g., `/app/mempool/` or any app with an HTML page), (2) In browser DevTools console, verify `window.nostr` exists, (3) Call `window.nostr.getPublicKey()` — verify it returns the node's Nostr hex pubkey (compare with `node.nostr-pubkey` RPC response), (4) Call `window.nostr.signEvent({kind: 1, content: "test", created_at: Math.floor(Date.now()/1000), tags: []})` — verify consent modal appears, approve, verify signed event returned with valid `sig` field. Document the test steps and results. **Acceptance**: NIP-07 works in at least one iframe app. Consent modal functions. Signed events have valid Schnorr signatures.
- [ ] **INSTALL-03** — Test Tor rotation end-to-end on live server. On 192.168.1.228: (1) Record current node .onion address from `tor.list-services`, (2) Call `tor.rotate-service("archipelago")`, (3) Verify new .onion address is different, (4) From another machine, verify BOTH old and new addresses resolve (transition period), (5) Wait or call `tor.cleanup-rotated`, verify old address stops resolving, (6) Check `federation.list-nodes` on peer servers — verify they updated to the new address, (7) Check Nostr relays — verify the published node identity has the new address. **Acceptance**: Full rotation lifecycle works. Peers update automatically. No federation disruption.
- [ ] **INSTALL-04** — Run full federation + sharing + DWN integration test. Deploy latest code to all 4 servers. Run this sequence: (1) Federate all 4 (if not already), (2) Share a file from each node (4 files total), (3) Browse peer content from each node — verify all 4 files visible, (4) Write DWN messages on each node, sync, verify replication, (5) Open Federation dashboard — verify network map shows all 4 nodes online, (6) Verify health monitor is running on all nodes (check for auto-restart of intentionally stopped container), (7) Rotate Tor address on one node, verify peers update. Script the entire flow in `scripts/test-integration-full.sh`. **Acceptance**: All 7 steps pass. Script exits 0. Document any issues found and fixes applied.
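Checks (1) and (2) of INSTALL-01 verify identifier formats. A shallow version of those checks is sketched below. It only looks at prefix, length, and character set: it does not decode the multibase key or verify the bech32 checksum, so it is an assumption-laden smoke test rather than real validation. The function names are hypothetical.

```rust
/// The 32-character bech32 alphabet.
const BECH32_CHARSET: &str = "qpzry9x8gf2tvdw0s3jn54khce6mua7l";

/// Shallow check for the `did:key:z...` form: correct prefix followed by a
/// non-empty alphanumeric body. Does NOT decode or validate the key.
pub fn looks_like_did_key(s: &str) -> bool {
    match s.strip_prefix("did:key:z") {
        Some(rest) => !rest.is_empty() && rest.chars().all(|c| c.is_ascii_alphanumeric()),
        None => false,
    }
}

/// Shallow check for an npub: `npub1` followed by 58 bech32 characters
/// (52 for the 32-byte key plus a 6-character checksum).
/// Does NOT verify the checksum.
pub fn looks_like_npub(s: &str) -> bool {
    match s.strip_prefix("npub1") {
        Some(data) => data.len() == 58 && data.chars().all(|c| BECH32_CHARSET.contains(c)),
        None => false,
    }
}
```

A hex pubkey or a truncated value fails these checks, which is enough to catch an RPC returning the wrong field on a fresh install.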
### Sprint 48: Reliability & Uptime Hardening (August 2026 Week 2-3)
- [ ] **UPTIME-01** — Run 7-day continuous multi-node uptime test. Start the existing `uptime-monitor.sh` on all 4 servers (or create cron jobs). Additionally, create `scripts/federation-health-check.sh` that runs every 5 minutes: calls `federation.list-nodes` on primary, records online/offline state of each peer, records federation sync success/failure, records DWN sync state. Output to `/var/lib/archipelago/federation-health/` as CSV. Run for 7 days. **Acceptance**: After 7 days, all 4 nodes have 99%+ HTTP uptime. Federation sync success rate >95%. Zero unrecovered container crashes. Generate summary report.
- [ ] **UPTIME-02** — Inject failures and verify recovery. During the 7-day test, inject one failure per day across the fleet: Day 1: `sudo podman stop archy-bitcoin-knots` on node A (verify auto-restart within 60s). Day 2: `sudo systemctl restart archipelago` on node B (verify federation reconnects within 5 min). Day 3: `sudo podman stop archy-tor` on node C (verify Tor recovers, federation reconnects). Day 4: Reboot node D (`sudo reboot`), verify full recovery (crash recovery detects PID, restarts containers, federation reconnects). Day 5: Block Tor traffic with iptables on node A for 10 minutes, unblock, verify recovery. Day 6: Fill disk to 90% on node B, verify disk monitor alerts and auto-cleanup triggers. Day 7: Rotate Tor address on node C during active file sharing. Document recovery time for each scenario. **Acceptance**: All 7 injected failures recover automatically. Document recovery times. Fix any that don't recover.
- [ ] **UPTIME-03** — Fix any issues discovered during uptime testing. This is a catch-all task for bugs found during UPTIME-01 and UPTIME-02. For each issue: diagnose root cause, implement fix, deploy to all servers, verify fix. Common expected issues: Tor connection timeouts (increase retry), DWN sync race conditions (add locks), federation state sync conflicts (last-writer-wins), memory growth over time (check for leaks in long-running tasks). **Acceptance**: All issues found during uptime testing are resolved. Rerun the failing scenario to confirm.
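The UPTIME-01 acceptance thresholds (99%+ HTTP uptime, federation sync success above 95%) can be evaluated from the recorded samples as sketched here. It assumes each 5-minute check has been reduced to a boolean; `success_rate` and `meets_targets` are hypothetical helpers for the summary report.

```rust
/// Percentage of successful samples, or `None` when nothing was recorded.
pub fn success_rate(samples: &[bool]) -> Option<f64> {
    if samples.is_empty() {
        return None;
    }
    let ok = samples.iter().filter(|s| **s).count();
    Some(ok as f64 * 100.0 / samples.len() as f64)
}

/// UPTIME-01 targets: HTTP uptime of at least 99% and federation sync
/// success strictly above 95%. Missing data counts as a failure.
pub fn meets_targets(http: &[bool], sync: &[bool]) -> bool {
    matches!(
        (success_rate(http), success_rate(sync)),
        (Some(h), Some(s)) if h >= 99.0 && s > 95.0
    )
}
```

At one sample every 5 minutes, 7 days yields 2016 samples per node, so the HTTP target tolerates at most 20 failed checks.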
### Sprint 49: Scale to 7 Nodes (August 2026 Week 4 — September 2026 Week 1)
- [ ] **SCALE-01** — Onboard servers 5, 6, and 7. When the 3 new servers are available: (1) Install Archipelago (flash ISO or manual setup), (2) Deploy latest code (adapt deploy method based on whether server has build tools), (3) Verify health on each, (4) Generate federation invite codes on primary, (5) Accept invites on new servers, (6) Verify all 7 nodes visible in `federation.list-nodes` from every node. **Acceptance**: 7 servers federated, all showing as trusted peers.
- [ ] **SCALE-02** — Validate Nostr discovery with 7 nodes. Publish all 7 nodes to Nostr relays. From each node, run `node.nostr-discover`. Verify each node can find all other 6. Test with multiple relay sets (remove one relay, add another) to verify redundancy. Measure: discovery latency (time from `nostr-discover` call to full list), relay query success rate. **Acceptance**: All 7 nodes discoverable from every node. Discovery completes within 30 seconds. Works with any 2 of 3 configured relays.
- [ ] **SCALE-03** — Test file sharing and DWN sync at 7-node scale. Share unique files from each of the 7 nodes (7 files total). From each node, browse all 6 peers' content — verify all 42 browse-peer calls succeed (7 nodes × 6 peers). Write DWN messages on all 7 nodes, sync — verify all messages replicate to all nodes. Measure total sync time for DWN messages across 7 nodes. **Acceptance**: All 42 content browsing attempts succeed. All DWN messages replicate to all 7 nodes. Document sync time.
- [ ] **SCALE-04** — Verify network map with 7 nodes. Open Federation dashboard on primary server. Switch to Network Map view. Verify all 7 nodes render as circles with correct trust levels, online status, and tooltips. Verify the force-directed layout handles 7 nodes cleanly (no overlapping, readable labels). Take a screenshot for documentation. Test on mobile viewport — verify the simplified view is usable. **Acceptance**: Network map displays all 7 nodes clearly with live data. No visual issues.
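The figure of 42 browse-peer calls in SCALE-03 is the number of ordered (from, to) pairs among 7 nodes. A test script can enumerate them as below; `browse_pairs` is a hypothetical helper and assumes node names are unique.

```rust
/// Every ordered (from, to) pair of distinct nodes: n * (n - 1) entries.
pub fn browse_pairs(nodes: &[&str]) -> Vec<(String, String)> {
    let mut pairs = Vec::new();
    for &from in nodes {
        for &to in nodes {
            if from != to {
                pairs.push((from.to_string(), to.to_string()));
            }
        }
    }
    pairs
}
```

Iterating over this list, rather than hand-writing the calls, keeps the script correct when the fleet grows or shrinks.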
### Sprint 50: Final Polish & Release (September 2026 Week 2-4)
- [ ] **POLISH-01** — Run final integration test on all 7 nodes. Execute `scripts/test-integration-full.sh` adapted for 7 nodes. All checks must pass: federation, discovery, file sharing, DWN sync, health monitor, Tor rotation, NIP-07 signing. **Acceptance**: Integration test script passes on all 7 nodes.
- [ ] **POLISH-02** — Build release ISO with all new features. On 192.168.1.228, build new ISO via `sudo ./image-recipe/build-auto-installer-iso.sh`. The ISO must include: updated backend binary with all Sprint 40-49 changes, updated frontend with NIP-07 provider, network map, and all UI changes, updated nginx configs with NIP-07 injection, updated torrc template. Copy ISO to FileBrowser builds folder. **Acceptance**: ISO builds successfully. Copy to `/var/lib/archipelago/filebrowser/Builds/`.
- [ ] **POLISH-03** — Test fresh install from new ISO. Flash the ISO to a USB drive, install on a test machine (or VM). Walk through the complete first-time experience: boot → onboard (DID + npub shown, real backup, real verification) → install an app → verify NIP-07 works in iframe → verify health monitor auto-restarts crashed container → federate with an existing node → verify file sharing and DWN sync work. **Acceptance**: Complete user journey works on fresh install with zero manual intervention.
- [ ] **POLISH-04** — Tag v1.1.0 release. Update version in `core/archipelago/Cargo.toml` and `neode-ui/package.json` to `1.1.0`. Update CHANGELOG.md with all new features: Nostr identity in onboarding, NIP-07 iframe signing, 7-node federation tested, file sharing across nodes, DWN multi-node sync, node visualization map, health monitor fix, Tor address rotation, per-app Tor toggle. Tag `v1.1.0` in git. **Acceptance**: Tagged release with comprehensive changelog.
---
## Updated Milestone Summary
| Date | Milestone | Key Deliverables |
|------|-----------|-----------------|
| Nov 2028 | **v1.0.0** | Production release (158 tasks) |
| Apr 2026 | Sprint 40-41 | Webhook fix, identity completion, NIP-07 signing |
| May 2026 | Sprint 42-43 | Tor rotation, per-app toggle, 4-node federation deployed |
| Jun 2026 | Sprint 44-45 | File sharing across nodes, DWN multi-node sync |
| Jul 2026 | Sprint 46-47 | Node visualization map, first-install integration tests |
| Aug 2026 | Sprint 48-49 | 7-day uptime test, scale to 7 nodes |
| Sep 2026 | **v1.1.0** | Full feature release — all 8 features shipped and tested |
**Total new tasks**: 40 across 11 sprints over 6 months.
---
## Execution Instructions
For each task in order:
@@ -472,4 +609,4 @@ For each task in order:
9. Commit: `type: description`
10. Move to the next unchecked task immediately
**Total tasks**: 158 completed across 39 sprints over 3 years. Plan complete as of v1.0.0 release.
**Total tasks**: 158 completed (v1.0.0) + 40 new (v1.1.0) = 198 tasks across 50 sprints.