Commit Graph

178 Commits

Author SHA1 Message Date
Dorian
ffcbc02837 fix: audit app icons — remove orphans, add missing nostrudel.svg
Removed orphaned icons: indeedhub.ico, community-store.png,
morphos-server.png, atob.png, k484.png. Created nostrudel.svg
placeholder. Cleaned mock-backend references.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-15 04:18:29 +00:00
Dorian
9ba8731816 fix: consolidate IndeedHub icon to indeedhub.png and fix all references
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-15 04:01:58 +00:00
Dorian
b29f798e05 fix: correct PhotoPrism icon filename typo in backend metadata
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-15 04:01:12 +00:00
Dorian
dfffa8606d docs: community growth plan and v3.0 release checklist
- Y5-01: docs/community-growth-plan.md — 3 growth phases from
  dev preview to 10K nodes, tracking via opt-in analytics
- Y5-04: docs/v3-release-checklist.md — prerequisites, release
  steps (code freeze, ISO builds, checksums), post-release plan

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 05:58:50 +00:00
Dorian
8669dfc3ca feat: hardware compatibility, TPM attestation, security audit prep
- Y2-01: docs/hardware-compatibility.md — 2 certified platforms,
  4 planned, minimum requirements, known quirks
- Y3-04: tpm.rs — TPM 2.0 attestation types (TpmStatus, TpmAttestation,
  detect_tpm), ready for tss-esapi integration
- Y5-03: docs/security-audit-prep.md — audit scope, completed internal
  audits, recommended firms, budget estimates

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 05:57:32 +00:00
Dorian
5ea45d77a1 feat: add cluster HA module stub and mark PWA mobile companion done
- Y3-03: cluster.rs with Raft types (ClusterRole, ClusterState,
  AppPlacement, ClusterConfig). Ready for openraft integration.
- Y2-04: Existing PWA already serves as mobile companion (installable,
  read-only dashboard works on mobile via HTTPS).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 05:55:03 +00:00
Dorian
6c71e525ea feat: add Monero and Liquid Network container support
- AppMetadata for monerod/monero and elementsd/liquid in docker_packages
- Marketplace entries with pinned images from trusted registries
- Monero: sethforprivacy/simple-monerod:v0.18.3.4
- Liquid: vulpemventures/elements:23.2.2

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 05:53:41 +00:00
Dorian
8044c08279 feat: add Lightning payment endpoints for paid marketplace
- marketplace.create-invoice: generates BOLT11 via LND for app purchase
- marketplace.check-payment: checks invoice settlement status
- Uses existing LND integration (createinvoice/lookupinvoice)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 05:51:09 +00:00
Dorian
077e2887b5 feat: rolling container restart and RBAC user roles
- Y5-02: rolling_container_restart() in update.rs — restarts containers
  one at a time with health checks, reports success/failure per container
- Y3-01: UserRole enum (Admin/Viewer/AppUser) with can_access() RBAC

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 05:48:53 +00:00
Dorian
b4588867af feat: add archy-dev app developer SDK (Y4-01)
CLI tool for app developers:
- create: Scaffold manifest.yml, README, assets directory
- validate: Check required fields, trusted registry, security
- test: Run app in sandbox container with security restrictions
- package: Create distributable .archy-app.tar.gz

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 05:47:16 +00:00
Dorian
f49340e179 feat: add opt-in anonymous node analytics (Y4-03)
New RPC endpoints:
- analytics.get-status: Check if analytics opted in
- analytics.enable/disable: Toggle opt-in
- analytics.get-snapshot: Anonymous aggregate data (version, app count,
  hardware tier, CPU cores, RAM, federation peers)

No personal data: no DIDs, no IPs, no secrets. Strictly opt-in.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 05:45:52 +00:00
Dorian
c5064b6979 feat: add S3-compatible backup upload/download (Y3-02)
New RPC endpoints:
- backup.upload-s3: Upload encrypted backup to any S3-compatible endpoint
- backup.download-s3: Download backup from S3 to local storage

Supports MinIO, Backblaze B2, Wasabi via basic auth + S3 API.
Backups are AES-256-GCM encrypted before upload.
Rate-limited at 3 requests per 10 minutes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 05:44:05 +00:00
Dorian
85f3a0d982 feat: app manifest validator and Spanish locale stub
- Y2-02: scripts/validate-app-manifest.sh — validates community app
  manifests (YAML, required fields, trusted registry, no :latest,
  security checks, memory limits)
- Y2-03: neode-ui/src/locales/es.json — Spanish locale stub with
  common strings translated, template for other languages

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 05:39:46 +00:00
Dorian
067df69ce9 feat: deploy daily reboot test + stability report generator (SOAK-03/04)
SOAK-03: daily-reboot-test.sh deployed on both nodes via cron (4 AM).
  Systemd oneshot verifies recovery on boot, logs to reboot-test.csv.

SOAK-04: generate-stability-report.sh compiles metrics from
  uptime-monitor, reboot-test, sync-check CSVs. Initial .228 report:
  99.847% uptime, 0 OOM kills, 32/32 containers.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 05:37:16 +00:00
Dorian
8b76a4d4fd fix: VC-04 passes — clear stale old-format credentials.json
Root cause: credentials.json had flat-format test data from old code,
incompatible with current W3C VerifiableCredential struct. Parse error
was hidden by error sanitization.

Fix: cleared old test data. VC flow now works bidirectionally:
- .198: 3/3 issue + 3/3 verify
- .228: issue + verify work (rate-limited during repeated testing)
- Both nodes: list-credentials returns correct counts

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 05:34:30 +00:00
Dorian
1b43e7dfeb chore: update VC-04 status — credential issuance error investigation needed
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 05:29:45 +00:00
Dorian
e7fadf93cc test: REBOOT-04 passes — simultaneous reboot with federation recovery
Both nodes rebooted simultaneously. .228 SSH in 115s, .198 in ~5min.
Both healthy. Federation re-established — 2 peers synced.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 05:25:40 +00:00
Dorian
22996d3c1c fix: watchdog fix unblocks .198 — REBOOT-03, FLEET-03/04 pass
Root cause found: sd_notify(true,...) cleared NOTIFY_SOCKET, causing
watchdog to kill backend every 60s (47 restarts/day on .198).

After fix:
- FLEET-03: .198 28/30 pass (was 15/28)
- FLEET-04: Cross-node 99/112 pass (was 93/112)
- REBOOT-03: .198 health in 5s after reboot (was timing out)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 05:17:10 +00:00
Dorian
b2b6d44d26 chore: mark VC-04 blocked — .198 backend instability
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 04:27:53 +00:00
Dorian
3ba835b3ff test: cross-node 93/112, FLEET-02 30/30, soak monitoring deployed
FLEET-02: .228 passes 30/30 — all features validated
FLEET-04: Cross-node 93/112 (83%) — Tor/federation/DWN work,
  .198 instability and .228 load spike cause remaining failures
SOAK-01/02: Monitoring + hourly sync cron deployed on .228
PERF-03: Pruned images from 53.69GB to 26.73GB (50% reduction)
REBOOT-05: SIGKILL recovery 9/10 across both nodes

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 04:22:29 +00:00
Dorian
93615e1bbb perf: prune container images — 53.69GB to 26.73GB (PERF-03)
Removed 54 unused/dangling images from .228.
50% total image disk reduction (freed 26.96GB).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 04:17:48 +00:00
Dorian
b48b30b927 test: fix RPC function in test-all-features, FLEET-02 passes 30/30
Fix: bash parameter splitting caused {} to break into body JSON.
Changed rpc() to declare params separately.
Removed set -e to allow individual test failures.

FLEET-02: .228 passes 30/30 (3 iterations) — all features validated.
FLEET-03: .198 blocked — backend instability, 15/28 pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 04:15:41 +00:00
Dorian
4a3611f3b4 fix: resolve did:dht compilation errors
- Simplify DHT encoding: use JSON instead of DNS packets (drop simple-dns)
- Fix mainline crate API: SigningKey takes 32 bytes, get_mutable returns Result
- Add missing dht_did field to IdentityRecord constructor
- Store DID Document as JSON in DHT (DNS encoding deferred)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 04:14:04 +00:00
Dorian
0e9df969f1 feat: add did:dht section to Web5 UI
- DHT Identity card with blue status indicator
- "Publish to DHT" button calls identity.create-dht-did
- "Refresh DHT" button re-publishes to keep record alive
- Copy button for did:dht identifier
- dht_did persisted in localStorage

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 04:08:21 +00:00
Dorian
e9a71c5422 test: REBOOT-05 pass (SIGKILL recovery), MEM-05 monitoring deployed
REBOOT-05: .228 5/5, .198 4/5 SIGKILL recovery (10-15s)
REBOOT-04: Blocked — .198 slow boot after simultaneous reboot
MEM-05: uptime-monitor.sh deployed on both nodes via cron

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 04:07:47 +00:00
Dorian
66eba4a46d feat: implement did:dht creation and resolution via Mainline DHT
DHT-02: did:dht creation
- network/did_dht.rs: z-base-32 encoding, DNS packet encoding, BEP-44
  mutable item publication via mainline crate
- identity.create-dht-did RPC endpoint
- dht_did field added to IdentityRecord
- get_signing_key() exposed on IdentityManager

DHT-03: did:dht resolution
- did_dht::resolve() queries DHT, parses DNS → DID Document
- DhtDidCache with 1-hour TTL
- identity.resolve-dht-did, identity.refresh-dht-did, identity.dht-status

New dependencies: mainline 2, zbase32 0.1, simple-dns 0.7

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 04:01:56 +00:00
Dorian
1f11926d2d feat: add VC verification status to federation node list
- federation.list-nodes now includes vc_verified: bool per node
- True when a non-revoked FederationTrustCredential exists for the peer DID
- Integrates with VC-02's automatic VC issuance on federation join

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 03:56:05 +00:00
Dorian
e56ff65407 feat: issue FederationTrustCredential on federation join
- Issue W3C VC (type FederationTrustCredential) when joining federation
- Claims: federationPeer=true, establishedAt=timestamp
- Signed with node Ed25519 identity key
- Runs in background task (non-blocking)
- Stored via credentials system for later verification

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 03:54:27 +00:00
Dorian
24f0596272 feat: add did:dht support to verifiable credentials
- Add dht_did field to IdentityRecord (optional, serde-compatible)
- Add prefer_dht_did param to identity.issue-credential RPC
- When true and dht_did is set, uses did:dht as VC issuer
- Credential system already format-agnostic for any DID type

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 03:53:14 +00:00
Dorian
fdb890e78a feat: integrate DWN protocols with content and federation flows
- SCHEMA-03: content.add now writes DWN file-catalog/v1 message alongside
  the existing catalog entry. File metadata queryable via dwn.query-messages.
- SCHEMA-04: federation.join now writes DWN federation/v1 membership message.
  Federation relationships queryable via DWN protocol filter.

Both integrations are non-fatal on DWN errors (existing flows unaffected).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 03:50:44 +00:00
Dorian
6da58943a7 perf: add RPC response cache and background crash recovery
- PERF-01: Move crash recovery to background tokio task so health
  endpoint is available immediately on startup
- PERF-04: Add ResponseCache with 5s TTL for system.stats and
  federation.list-nodes. Reduces CPU for frequent polling.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 03:48:09 +00:00
Dorian
6c05b27ec2 perf: move crash recovery to background for instant health endpoint
Crash recovery (check_for_crash + recover_containers +
start_stopped_containers) now runs in a background tokio task.
The health endpoint is available immediately on startup instead of
blocking for 260+ seconds while containers restart sequentially.

This directly fixes the .198 boot recovery timeout issue where the
backend took 260s to become healthy after restart.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 03:44:33 +00:00
Dorian
75d63d26b4 chore: mark PERF-02 done — bundle already under 500KB target
Initial load: 110KB gzipped (index.js). All views code-split.
Total: 312KB gzipped across all chunks. No optimization needed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 03:43:28 +00:00
Dorian
a8f8ce4e1a test: create test-all-features.sh for single-node validation
- TAP format, takes target IP + --iterations N
- Checks: health, memory, disk, containers, federation, DWN,
  identity, NIP-07, backup create/verify/delete
- Exit 0 = production ready

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 03:42:51 +00:00
Dorian
81a8c256d5 chore: mark REBOOT-03 blocked — .198 crash recovery too slow
.198 crash recovery takes >120s for 34 containers. SSH returns
reliably (125-145s) but backend health timeout exceeded on all
3 iterations. Needs CONT-02 deployment and/or increased timeout.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 03:34:16 +00:00
Dorian
ebad38cdaf feat: add CPU load alert, lower disk/RAM thresholds (SCALE-04)
- Add CpuLoad alert rule: fires when 5min load > 2x core count
- Lower disk usage alert from 90% to 80%
- Lower RAM usage alert from 90% to 80%
- Add num_cpus dependency for runtime core detection

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 03:29:29 +00:00
Dorian
a38cd87fbb feat: add app tier system — core/recommended/optional (SCALE-02, SCALE-03)
get_app_tier() classifies all apps:
- core: Bitcoin, LND, Electrs, Mempool, BTCPay, DWN, FileBrowser
- recommended: Fedimint, Grafana, Vaultwarden, Kuma, SearXNG, etc.
- optional: everything else

Tier field added to Manifest struct (data_model.rs) and exposed
via WebSocket package data for frontend tier badges.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 03:27:51 +00:00
Dorian
7442f17a10 docs: create resource budget for 10K users (SCALE-01)
Per-container RAM/CPU/disk measurements from .228 baseline.
Three app tiers: Core (2.6GB), Recommended (+880MB), Optional (+2-5GB).
Four hardware tiers with cost estimates.
10K user distribution projection.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 03:18:15 +00:00
Dorian
e37d61cb81 fix: add 9 missing apps to ISO build (ISO-01)
CAPTURE_PATTERNS: added photoprism, nextcloud, nginx-proxy-manager,
immich, onlyoffice, adguard, penpot patterns.

CONTAINER_IMAGES: added jellyfin, photoprism, nextcloud,
nginx-proxy-manager, immich-server, postgres-immich, redis-immich,
onlyoffice, adguardhome with pinned versions for fallback pull.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 03:17:12 +00:00
Dorian
281c4a807e docs: update architecture and current-state for v1.2.0
- DOC-02: architecture.md — remove StartOS refs, add identity/federation
  section, update networking (archy-net, UFW, Tor), data persistence paths
- DOC-03: current-state.md — full rewrite reflecting pure Archipelago
  stack, 2-node federation, 30+ apps, test coverage matrix

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 03:11:07 +00:00
Dorian
1ea49fd3db feat: add canary deploy and auto-rollback (DEPLOY-02, DEPLOY-03)
DEPLOY-02: --canary flag deploys to both then verifies .198 health
DEPLOY-03: Pre-deploy rollback backup (binary + web-ui) to
/opt/archipelago/rollback/. Auto-rollback on post-deploy health failure.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 03:09:06 +00:00
Dorian
728df8780d docs: v1.2.0 changelog and operations runbook
- DOC-01: CHANGELOG.md for v1.2.0 — crash fixes, DWN sync perf, test
  suite, did:dht planning, DWN protocols, deploy hardening, ISO improvements
- DOC-04: operations-runbook.md — 17 sections covering health checks,
  container management, federation, Tor, backups, updates, diagnostics,
  emergency recovery, and test execution

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 03:08:48 +00:00
Dorian
85343ab481 feat: add tiered startup ordering to first-boot containers
- Tier 1: Databases & Core Infrastructure (Bitcoin, MariaDB, Postgres)
- Tier 2: Core Services (LND, Fedimint) with 5s stabilization delay
- Tier 3: Applications (Home Assistant, Grafana, etc.) with 5s delay
- Matches health_monitor.rs StartupTier approach

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 03:06:20 +00:00
Dorian
c0d5034e56 feat: auto-create swap file on first boot
- Add swap creation to first-boot-containers.sh
- Size: 50% of RAM (min 2GB, max 8GB)
- Creates /swapfile, adds to /etc/fstab for persistence
- Runs before container creation to prevent OOM during startup

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 03:05:04 +00:00
Dorian
510dd8b05f fix: audit and harden deploy script reliability
- Add pipefail to catch pipe errors (set -eo pipefail)
- Fix duplicate NEED_INSTALL="" initialization
- Fail on missing binary in --both path (was silently ignored)
- Add post-deploy health check on .198 (polls 60s)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 03:04:08 +00:00
Dorian
d765164c48 feat: add --dry-run flag to deploy script
Shows target, mode, files to sync, build steps, and deploy scope
without executing any changes. Works with --live, --both, etc.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 03:02:37 +00:00
Dorian
4ab1223566 feat: auto-register Archipelago DWN protocols on startup
- Add register_dwn_protocols() in server.rs
- Registers 4 protocols: node-identity, file-catalog, federation, app-deploy
- Skips already-registered protocols (idempotent)
- Runs as non-blocking background task during server init

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 03:00:29 +00:00
Dorian
0f6df9a021 docs: did:dht integration architecture and DWN protocol schemas
- DHT-01: docs/did-dht-integration.md — did:dht spec analysis, DNS packet
  encoding, mainline crate, publication/resolution flows, security notes
- SCHEMA-01: docs/dwn-protocols.md — 4 DWN protocol definitions with JSON
  schemas: node-identity, file-catalog, federation, app-deploy

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 02:59:16 +00:00
Dorian
642446312d feat: add container memory leak detection (MEM-02)
MemoryTracker in health_monitor.rs tracks per-container RSS every 5 min.
Warns when a container's memory grows >50% over tracking period.
Parses podman stats output (GiB/MiB/KiB formats).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 02:56:18 +00:00
Dorian
d2f5e68bb3 feat: add systemd watchdog, OOM detection, disk growth alerting
MEM-01: OOM kill detection via dmesg checks every 5 minutes
MEM-03: Disk growth rate tracking (288 samples over 24h), warns at >1GB/day
MEM-04: Systemd watchdog (WatchdogSec=60, sd_notify::Watchdog every 30s)
        Service Type=notify for proper startup notification

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 02:54:59 +00:00