Commit Graph

102 Commits

Author SHA1 Message Date
Dorian
16b389dda1 fix: watchdog killing backend every 60s on .198 (47 restarts/day)
Root cause: sd_notify::notify(true, ...) cleared NOTIFY_SOCKET env var,
so watchdog pings never reached systemd. Backend killed every 60s.

Fixes:
- Change sd_notify::notify first param to false (keep socket)
- Increase WatchdogSec from 60 to 300 (5min) for crash recovery
- Add TimeoutStartSec=300 for slow container startups
- Adjust watchdog ping interval to 120s

This was causing 47 restarts/day on .198 and blocking REBOOT-03,
FLEET-03, FLEET-04, VC-04.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 04:30:57 +00:00
Dorian
aabe28fc98 fix: add bytes crate for mainline DHT Bytes type
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 04:18:32 +00:00
Dorian
dc48d6fc8c fix: use correct mainline v2 API for DHT operations
- get_mutable takes &[u8; 32] directly (not VerifyingKey)
- MutableItem::new takes bytes::Bytes (not Vec<u8>)
- Remove VerifyingKey import (not exported from mainline v2)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 04:17:05 +00:00
Dorian
4a3611f3b4 fix: resolve did:dht compilation errors
- Simplify DHT encoding: use JSON instead of DNS packets (drop simple-dns)
- Fix mainline crate API: SigningKey takes 32 bytes, get_mutable returns Result
- Add missing dht_did field to IdentityRecord constructor
- Store DID Document as JSON in DHT (DNS encoding deferred)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 04:14:04 +00:00
Dorian
66eba4a46d feat: implement did:dht creation and resolution via Mainline DHT
DHT-02: did:dht creation
- network/did_dht.rs: z-base-32 encoding, DNS packet encoding, BEP-44
  mutable item publication via mainline crate
- identity.create-dht-did RPC endpoint
- dht_did field added to IdentityRecord
- get_signing_key() exposed on IdentityManager

DHT-03: did:dht resolution
- did_dht::resolve() queries DHT, parses DNS → DID Document
- DhtDidCache with 1-hour TTL
- identity.resolve-dht-did, identity.refresh-dht-did, identity.dht-status

New dependencies: mainline 2, zbase32 0.1, simple-dns 0.7

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 04:01:56 +00:00
Dorian
1f11926d2d feat: add VC verification status to federation node list
- federation.list-nodes now includes vc_verified: bool per node
- True when a non-revoked FederationTrustCredential exists for the peer DID
- Integrates with VC-02's automatic VC issuance on federation join

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 03:56:05 +00:00
Dorian
e56ff65407 feat: issue FederationTrustCredential on federation join
- Issue W3C VC (type FederationTrustCredential) when joining federation
- Claims: federationPeer=true, establishedAt=timestamp
- Signed with node Ed25519 identity key
- Runs in background task (non-blocking)
- Stored via credentials system for later verification

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 03:54:27 +00:00
Dorian
24f0596272 feat: add did:dht support to verifiable credentials
- Add dht_did field to IdentityRecord (optional, serde-compatible)
- Add prefer_dht_did param to identity.issue-credential RPC
- When true and dht_did is set, uses did:dht as VC issuer
- Credential system already format-agnostic for any DID type

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 03:53:14 +00:00
Dorian
fdb890e78a feat: integrate DWN protocols with content and federation flows
- SCHEMA-03: content.add now writes DWN file-catalog/v1 message alongside
  the existing catalog entry. File metadata queryable via dwn.query-messages.
- SCHEMA-04: federation.join now writes DWN federation/v1 membership message.
  Federation relationships queryable via DWN protocol filter.

Both integrations are non-fatal on DWN errors (existing flows unaffected).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 03:50:44 +00:00
Dorian
6da58943a7 perf: add RPC response cache and background crash recovery
- PERF-01: Move crash recovery to background tokio task so health
  endpoint is available immediately on startup
- PERF-04: Add ResponseCache with 5s TTL for system.stats and
  federation.list-nodes. Reduces CPU for frequent polling.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 03:48:09 +00:00
Dorian
6c05b27ec2 perf: move crash recovery to background for instant health endpoint
Crash recovery (check_for_crash + recover_containers +
start_stopped_containers) now runs in a background tokio task.
The health endpoint is available immediately on startup instead of
blocking for 260+ seconds while containers restart sequentially.

This directly fixes the .198 boot recovery timeout issue where the
backend took 260s to become healthy after restart.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 03:44:33 +00:00
Dorian
f608523e3d fix: restore get_app_tier function signature
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 03:39:17 +00:00
Dorian
49b7c400c1 fix: remove duplicate tier fields in AppMetadata
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 03:37:51 +00:00
Dorian
176336b555 fix: add missing tier field to all AppMetadata, fix build errors
- Add tier: "" to all AppMetadata match arms (was missing from 30+ arms)
- Use std::thread::available_parallelism() instead of num_cpus crate
- Remove unused num_cpus dependency
- Fix unused variable warning in health_monitor.rs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 03:36:44 +00:00
Dorian
ebad38cdaf feat: add CPU load alert, lower disk/RAM thresholds (SCALE-04)
- Add CpuLoad alert rule: fires when 5min load > 2x core count
- Lower disk usage alert from 90% to 80%
- Lower RAM usage alert from 90% to 80%
- Add num_cpus dependency for runtime core detection

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 03:29:29 +00:00
Dorian
a38cd87fbb feat: add app tier system — core/recommended/optional (SCALE-02, SCALE-03)
get_app_tier() classifies all apps:
- core: Bitcoin, LND, Electrs, Mempool, BTCPay, DWN, FileBrowser
- recommended: Fedimint, Grafana, Vaultwarden, Kuma, SearXNG, etc.
- optional: everything else

Tier field added to Manifest struct (data_model.rs) and exposed
via WebSocket package data for frontend tier badges.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 03:27:51 +00:00
Dorian
4ab1223566 feat: auto-register Archipelago DWN protocols on startup
- Add register_dwn_protocols() in server.rs
- Registers 4 protocols: node-identity, file-catalog, federation, app-deploy
- Skips already-registered protocols (idempotent)
- Runs as non-blocking background task during server init

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 03:00:29 +00:00
Dorian
642446312d feat: add container memory leak detection (MEM-02)
MemoryTracker in health_monitor.rs tracks per-container RSS every 5 min.
Warns when a container's memory grows >50% over tracking period.
Parses podman stats output (GiB/MiB/KiB formats).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 02:56:18 +00:00
Dorian
d2f5e68bb3 feat: add systemd watchdog, OOM detection, disk growth alerting
MEM-01: OOM kill detection via dmesg checks every 5 minutes
MEM-03: Disk growth rate tracking (288 samples over 24h), warns at >1GB/day
MEM-04: Systemd watchdog (WatchdogSec=60, sd_notify::Watchdog every 30s)
        Service Type=notify for proper startup notification

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 02:54:59 +00:00
Dorian
6335ea17ee feat: Phase 4 backend hardening — container reliability + security audit
Container Management (CONT-01 through CONT-06):
- Fix needs_archy_net: add lnd, nbxplorer to archy-net list
- Add StartupTier dependency ordering to health monitor (DB→Core→Dependent→App→UI)
- Add exponential backoff (10s/30s/90s) with 1hr stability reset
- Add get_health_check_args() with health checks for 20+ apps
- Add get_memory_limit() with per-app limits (128m-4g vs blanket 2g)
- Create docs/network-topology.md
- Fix fedimint containers on both nodes (moved to archy-net)

Security Audit (SEC-01 through SEC-06):
- Add sanitize_error_message() — strips internal paths from RPC errors
- Add validate_identity_id() — blocks path traversal on identity operations
- Add validate_did() — blocks path traversal on federation operations
- Add message size limits: node-send-message (1MB), dwn.write-message (10MB)
- Add rate limits for federation endpoints (join: 5/60s, invite: 10/300s)
- Configure journald (500MB max, 7 day retention) on both nodes
- Add /etc/logrotate.d/archipelago for backend + crowdsec logs
- Verify all 4 nginx security headers on both nodes

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 02:45:28 +00:00
Dorian
f9a47a2602 test: US-10 backup/restore tests pass 80/80 — add rate limit headroom
- Add US-10 backup/restore test section to test-cross-node.sh
- Test cycle: create → list → verify → delete, 10 iterations × 2 nodes
- Increase backup.create rate limit from 3/600 to 10/600 (still conservative)
- Increase backup.restore rate limit from 2/600 to 5/600
- Clean up 21K+ stale DWN test messages on both servers

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 02:11:24 +00:00
Dorian
65b5d5db8e test: US-08 DWN sync tests pass 50/50 — fix sync performance
- Make dwn.sync endpoint async: spawns background task, returns immediately
- Add 90s overall timeout to sync_with_peers via tokio::time::timeout
- Deduplicate peer onion addresses before syncing
- Batch message pushes (50 per request) instead of one-at-a-time over Tor
- Add 15s connect_timeout to Tor SOCKS5 client
- Cap local message query to 200 messages per sync
- Fix DWN HTTP handler to process ALL messages in batch (was only first)
- Add recordId deduplication in handler to prevent duplicate imports
- Update test script to poll dwn.status for sync completion

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 01:35:56 +00:00
Dorian
ac7bf8c62b feat: release v1.1.0 — Nostr signing, file sharing, DWN sync, Tor rotation
Bump version to 1.1.0 in Cargo.toml and package.json.
Add comprehensive CHANGELOG.md entry covering all v1.1.0 features:
NIP-07 iframe signing, file sharing across nodes, DWN multi-node sync,
node visualization map, Tor address rotation, boot container recovery,
and full monitoring/testing suite.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 04:02:21 +00:00
Dorian
3e121b525f feat: auto-start stopped containers on boot, add failure recovery tests
Added start_stopped_containers() to crash_recovery.rs that starts all
exited/created containers on backend startup, fixing the issue where
containers didn't come back after clean reboot (PID marker removed by
systemd stop). Created test-failure-recovery.sh covering 5 failure
scenarios: container crash, backend restart, Tor restart, full reboot,
and Tor traffic block (UPTIME-02).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 03:55:14 +00:00
Dorian
a98529868e feat: fix Tor rotation to handle system Tor and hostname caching
read_onion_address() now checks tor-hostnames readable cache first,
clears cache before wait_for_hostname, updates it after rotation.
Rotation restarts system Tor (not just archy-tor container). Created
test-tor-rotation.sh with 10 automated checks (INSTALL-03).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 03:32:21 +00:00
Dorian
1ac6034457 feat: fix NIP-07 signing to use node Nostr key, add test script
Added node.nostr-sign RPC that uses the node-level Nostr key (matching
getPublicKey), fixing pubkey mismatch where identity.nostr-sign used a
different key. Updated appLauncher to call node.nostr-sign. Added
nostr_sign_hash() to nostr_discovery.rs. Created test-nip07.sh with
11 automated checks (INSTALL-02).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 03:18:45 +00:00
Dorian
3eca0cb6c7 feat: fix DWN sync to use federation peers and standard port
- DWN sync now uses federation node list instead of old peer list
- Fix sync URL to use port 80 (nginx) instead of 5678 (direct backend)
- DWN /dwn endpoint now accessible without auth for peer sync
- Support both message formats: {message:{}} and {messages:[{}]}
- Replace request["message"] with unified message variable

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 02:47:09 +00:00
Dorian
2e20984686 feat: add Peer Files UI for browsing and downloading federated content
- New PeerFiles.vue view shows federated peers and their shared catalogs
- Peer Files card in Cloud.vue shows when federation peers exist
- New content.download-peer RPC fetches content from peer via Tor
- Route: /dashboard/cloud/peers

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 02:37:59 +00:00
Dorian
92ac73fc20 feat: implement peers_only and specific availability access control for content
- PeersOnly access now checks X-Federation-DID header against known federation nodes
- Specific availability restricts content to named peer DIDs only
- Anonymous/unknown DID requests get 403 Forbidden
- Free content remains accessible to everyone
- Paid content still returns 402 with price info

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 02:27:38 +00:00
Dorian
aa733a7daa feat: fix content sharing — nginx proxy, file path resolution, catalog filtering
- Add /content and /dwn proxy locations to nginx config (both HTTP and HTTPS)
  so peer requests reach the backend instead of the SPA catch-all
- Update content_file_path() to check FileBrowser data dir as fallback when
  files aren't in the dedicated content/files/ directory
- Populate size_bytes from actual file metadata in content.add
- Filter out availability:nobody items from the public catalog endpoint

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 02:20:55 +00:00
Dorian
c45f0c8fb8 feat: federate 3 servers with Tor, fix inter-node auth (FED-DEPLOY-02)
- Add tor-hostnames fallback for reading onion addresses when system Tor
  owns hidden_service directories (permissions 700)
- Exempt federation.peer-joined, federation.get-state, and
  federation.peer-address-changed from auth/CSRF (inter-node RPC)
- Set up system Tor with AppArmor overrides on archipelago-2 and 3
- All 3 servers federated and syncing successfully

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 01:52:50 +00:00
Dorian
ccaeb10a92 feat: propagate Tor address rotation to Nostr relays and federation peers
After rotation, spawns background task that publishes updated .onion to
Nostr relays and sends federation.peer-address-changed RPC to all peers
over Tor. Peers update their nodes.json with the new address.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 00:08:16 +00:00
Dorian
fe2934a917 feat: add Tor address rotation, cleanup, and per-app toggle RPC endpoints
tor.rotate-service: renames hidden service dir, restarts Tor, waits
for new hostname. Old dir kept for 24h transition.
tor.cleanup-rotated: removes expired old service directories.
tor.toggle-app: enable/disable Tor access per app with service dir
management and container restart.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 23:57:38 +00:00
Dorian
3383b43a75 feat: add NIP-04 and NIP-44 encrypt/decrypt RPC endpoints for iframe apps
Backend: identity.nostr-encrypt-nip04, identity.nostr-decrypt-nip04,
identity.nostr-encrypt-nip44, identity.nostr-decrypt-nip44 endpoints
with auto-resolve to default identity. Frontend: appLauncher routes
nip04.* and nip44.* postMessage calls to backend RPC.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 23:50:56 +00:00
Dorian
e1d723b24e feat: auto-generate Nostr keypair during identity creation
Every new identity now gets both Ed25519 (DID) and secp256k1 (Nostr)
keys from creation. The create() method calls create_nostr_key()
automatically, so identity.create RPC always returns nostr_pubkey
and nostr_npub fields.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 22:41:21 +00:00
Dorian
701b202b41 fix: add webhook delivery for monitoring alerts
DiskUsage and ContainerCrash alerts now fire webhooks via
send_webhook() after pushing WebSocket notifications. Added
data_dir parameter to spawn_metrics_collector for webhook config
access.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 22:32:19 +00:00
Dorian
a227ca8c32 fix: decouple health monitor from webhook config
Health checks, auto-restarts, and WebSocket notifications now run
unconditionally. Previously the entire health loop was gated on
webhook config, so fresh installs (webhooks disabled) got zero
container monitoring. Webhook HTTP delivery is now fire-and-forget
after the notification is pushed to the UI.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 22:26:58 +00:00
Dorian
5e6aaa74aa patches on sxsw ai working api key working container hardened plus many more 2026-03-12 22:19:04 +00:00
Dorian
73e0a1b74d hot fixes to utc-6 2026-03-12 12:56:59 +00:00
Dorian
f07ce10b1a refactor: update dependencies and remove unused code
- Added new dependencies: `adler2`, `crc32fast`, `flate2`, `miniz_oxide`, and `libredox`.
- Updated existing dependencies: `tokio-rustls` to version 0.26.4 and `filetime` to version 0.2.27.
- Removed the `backup.rs` file as it is no longer needed.
- Introduced tests for configuration and credential management.
- Enhanced the `identity` module to generate W3C compliant DID documents.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 00:19:30 +00:00
Dorian
fd2a837bea fix: enforce no-new-privileges on all container creation
The manifest field was validated but never applied to the podman create
command. Now passes --security-opt no-new-privileges=true for all containers.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-11 19:06:59 +00:00
Dorian
1505b1b1cc fix: monthly security scan — fix shell injection and add RPC body limit (MAINT-02)
- Replace sh -c echo with tokio::fs::write for bitcoin.conf generation
- Add client_max_body_size 1m to /rpc/ in both HTTP and HTTPS nginx blocks
- Document full audit findings in docs/security-audit-2026-03-11.md

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-11 18:09:16 +00:00
Dorian
4995dc2656 feat: add per-endpoint rate limiting for sensitive operations (PENTEST-04)
New EndpointRateLimiter in session.rs tracks requests per (method, IP)
with configurable limits and time windows:

Financial operations (5 req/5min):
- wallet.send, lnd.sendcoins, lnd.payinvoice, lnd.create-psbt,
  lnd.finalize-psbt, wallet.ecash-send

Channel operations (3 req/5min):
- lnd.openchannel, lnd.closechannel

Backup operations (2-3 req/10min):
- backup.create, backup.restore

Container/package installs (5 req/5min):
- container-install, package.install

System operations (2 req/5min):
- system.reboot, system.shutdown, update.apply

Identity/auth (3-10 req/5min):
- identity.create, identity.issue-credential, auth.changePassword

Returns HTTP 429 with Retry-After header when limits exceeded.
Verified on live server: auth.changePassword blocks at 4th request,
lnd.sendcoins blocks at 6th request.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-11 14:46:25 +00:00
Dorian
89acc3ed5c fix: harden input validation across all RPC endpoints (PENTEST-02)
Manual security audit of 130+ RPC endpoints. Critical fixes:
- LND: validate pubkey (66-char hex), Bitcoin addresses, channel points,
  amount bounds, payment request format, memo length, peer address
- Package: validate_app_id on start/stop/restart/bundled-app handlers,
  validate volume host paths (must be under /var/lib/archipelago/),
  validate Docker image in bundled-app-start
- Container: validate_app_id on all 6 handlers, canonicalize manifest paths
- Network: path traversal prevention in connection request deletion
- Backup: backup ID validation in delete handler
- Webhooks: URL scheme validation, SSRF prevention for private IPs
- Security: validate app_id in secret rotation
- Interfaces: WiFi password length/null validation, strict IP/gateway/DNS
  parsing for static ethernet config

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-11 14:32:49 +00:00
Dorian
224681f1e0 feat: add webhook notification system with Settings UI (REMOTE-03)
Webhook module with HTTP delivery, HMAC-SHA256 signing, and event
filtering. RPC handlers for get-config, configure, and test endpoints.
Settings page gains webhook configuration section with URL, secret,
event toggles, and test button.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-11 12:55:13 +00:00
Dorian
8ffa89ba16 feat: add Tailscale remote access setup RPC (REMOTE-01)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-11 12:42:05 +00:00
Dorian
980fc3af6d feat: add metrics export as CSV/JSON (MON-04)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-11 12:33:19 +00:00
Dorian
1b8a8cfd32 feat: add alerting system with configurable rules and UI (MON-02, MON-03)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-11 12:28:44 +00:00
Dorian
592548066e feat: add real-time metrics collection with ring buffer storage (MON-01)
Implements monitoring/collector.rs that collects per-container CPU/RAM/network/disk,
system-wide metrics, RPC latency, and WebSocket connection count every 60 seconds.
Data stored in dual ring buffers: 1-min resolution (24h) and 15-min resolution (7d).
Three new RPC endpoints: monitoring.current, monitoring.history, monitoring.containers.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-11 11:11:02 +00:00
Dorian
45032d937b feat: add community infrastructure and update server setup
- releases/manifest.json: Seed release manifest for update server
- update.rs: Make UPDATE_MANIFEST_URL configurable via ARCHIPELAGO_UPDATE_URL env var
- CONTRIBUTING.md: Comprehensive contribution guidelines covering code style,
  PR process, testing, security disclosure, and app submission
- .github/ISSUE_TEMPLATE/: Bug report, feature request, and app submission
  issue templates with structured forms
- .github/pull_request_template.md: PR template with checklist

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-11 10:57:33 +00:00