fix: overhaul container lifecycle — recovery, health, uninstall, UI state
Container recovery:
- Health monitor: MAX_RESTART_ATTEMPTS 3→10, interval 60s→120s
- Dependency-aware restarts: won't restart services before their deps
- Reset dependent counters when a dependency recovers
- Handle "created" state containers (were invisible to health monitor)
- Added IndeedHub, mempool-api, mysql to tier system
- Crash recovery: podman start timeout 30s→120s with retry
- Podman client: socket timeout 5s→30s, added restart policy

UI state representation:
- Exit code 0 shows "stopped" (gray), not "crashed" (red)
- Exit code 137 shows "killed (OOM)"
- Non-zero exit shows "crashed" (red)
- Added exit_code field to PackageDataEntry

Install/uninstall fixes:
- Install returns error when container doesn't start (was silent success)
- Post-install hooks awaited instead of fire-and-forget tokio::spawn
- Uninstall: graceful rm before force, volume prune, network cleanup
- Uninstall returns error on partial failure (was 200 OK)

Config consistency:
- DB passwords read from /var/lib/archipelago/secrets/ (was hardcoded)
- Bitcoin: added ZMQ ports 28332/28333 for LND block notifications
- IndeedHub port 7777→8190 (was conflicting with strfry)
- Marketplace versions: LND 0.17.4→0.18.4, Mempool 2.5.0→3.0.0

Performance:
- Metrics collector interval 60s→300s (was duplicating health monitor)
- Podman client: proper error propagation instead of unwrap_or_default

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
357
.claude/plans/mutable-roaming-pancake.md
Normal file
@@ -0,0 +1,357 @@
# Gold Standard Claude Code Configuration: Archipelago

## Context

The last optimization (2026-03-28) cut CLAUDE.md from 130→101 lines and skills from 33→11. That was the right first pass. This plan is the second pass: fixing structural issues the first cleanup didn't address, namely hook duplication, memory chaos, a leaked API key, missing path scoping, context budget waste, and underutilized agent/permission systems. The goal is a configuration so tight that re-running this audit would produce zero suggestions.

**Research base**: Every file in `.claude/` (project + global), all 26 project memories, all 8 auto-memories, all 11 skills, all 5 rules, all 11 hooks, both settings files, the iframe-specialist agent, the full project structure (core/, neode-ui/, scripts/, image-recipe/, apps/, .gitea/), latest Claude Code docs (CLAUDE.md best practices, hooks v2.1.85+, skills frontmatter, agents, memory, permissions, MCP, context management, agent teams), and the 2026-03-28 cleanup feedback.

**Governing principle** (carried from cleanup): *Every line must prevent a specific mistake Claude would otherwise make. If Claude does it right without the instruction, it's noise.*

---

## Phase 0 (CRITICAL): Remove Leaked Secret

**File**: `.claude/memory/deploy-automation.md` (line 11)
Contains a plaintext Anthropic API key: `sk-ant-api03-...`

**Action**: Remove the key immediately. Replace with: `"ANTHROPIC_API_KEY from secrets store (never stored in memory files)"`. If the file was ever committed or synced, treat the key as compromised and rotate it as well.

This is the only blocking item. Everything else is optimization.
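
A check like this can also be run over memory files before they are saved. The following is a minimal sketch, assuming only the `sk-ant-` prefix shown above; the function names and the minimum token length are hypothetical choices for illustration, not an official key format.

```rust
/// Prefix taken from the plan; everything else in this sketch is an assumption.
const KEY_PREFIX: &str = "sk-ant-";
/// Assumed minimum number of key characters after the prefix, so that a
/// redacted placeholder such as `sk-ant-api03-...` is not reported.
const MIN_TAIL_LEN: usize = 8;

/// True if the line contains the prefix followed by enough key-like characters.
fn looks_like_key(line: &str) -> bool {
    line.match_indices(KEY_PREFIX).any(|(start, _)| {
        line[start + KEY_PREFIX.len()..]
            .chars()
            .take_while(|c| c.is_ascii_alphanumeric() || *c == '-' || *c == '_')
            .count()
            >= MIN_TAIL_LEN
    })
}

/// Returns the 1-based numbers of the lines that appear to contain a key.
pub fn leaked_key_lines(text: &str) -> Vec<usize> {
    text.lines()
        .enumerate()
        .filter(|(_, line)| looks_like_key(line))
        .map(|(idx, _)| idx + 1)
        .collect()
}
```

Calling `leaked_key_lines` on a file's contents yields the line numbers to inspect. This is stricter than a plain `grep` for the prefix, so the documented placeholder text does not trigger it.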

---

## Phase 1: Trim CLAUDE.md to ~75 Lines

**File**: `/Users/dorian/Projects/archy/CLAUDE.md`
**Current**: 101 lines | **Target**: ~75 lines | **Saves**: ~500 tokens/session

### What to cut (reference data that doesn't prevent mistakes)

| Section | Lines | Action | Reason |
|---------|-------|--------|--------|
| Infrastructure table | 21-30 | Move to auto-memory | Reference data, not a rule. Already in memory files |
| ISO debug commands | 79-84 | Move to `iso-debug` skill reference | Diagnostic commands, not rules |
| Kiosk toggle info | 85-86 | Move to auto-memory or delete | Reference, not a rule |
| "Backend binds 127.0.0.1" | 63 | Move to new backend rule | Claude can read the code |
| "Timeouts on all external operations" | 65 | Move to new backend rule | Already in `rules/api.md` |

### What to add

```markdown
## Compact Instructions
When compacting, preserve: list of modified files, test results, deploy target state, current branch.
```

This costs 2 lines but saves entire sessions from losing critical context.

### Resulting structure (~75 lines)

```
Lines 1-2: Project description + stack
Lines 3-6: Beta freeze notice
Lines 7-12: Quick reference (dev, build, deploy commands)
Lines 13-18: Architecture diagram (compact)
Lines 19-20: Data paths
Lines 21-26: Critical Rules (5 rules)
Lines 27-33: App Integration Checklist
Lines 34-36: Git conventions
Lines 37-39: Compact instructions
```

Infrastructure table moves to auto-memory where it's still loaded at session start.

---

## Phase 2: Hook Deduplication (Eliminate Double Execution)

### Problem

Every `Bash` call runs **both** global `pretooluse-bash.sh` AND project `block-risky-bash.sh`. Every `Edit|Write` call runs **both** global `pretooluse-files.sh` AND project `protect-files.sh`. They overlap on ~80% of patterns (rm -rf, git reset --hard, .git/ edits, .env files, etc.).

**Cost**: 2 extra Python processes per tool call, checking the same patterns twice.

### Solution: Project hooks become project-specific only

**File**: `.claude/hooks/block-risky-bash.sh`
**Action**: Strip all patterns already covered by global hook. Keep ONLY:
- Cargo build on macOS (Archy-specific: "build on dev server via SSH")
- Path traversal with rm (more aggressive check than global)

~15 lines instead of ~80.

**File**: `.claude/hooks/protect-files.sh`
**Action**: Strip all patterns already covered by global hook. Keep ONLY:
- `scripts/deploy-config.sh` (Archy-specific credential file)
- Path-outside-project check (project-specific boundary)

~20 lines instead of ~75.

**Global hooks stay unchanged**: they're the universal baseline.

### Result
- Before: 4 Python processes per Bash call (2 global + 2 project parsing same JSON)
- After: 2 Python processes per Bash call (1 global comprehensive + 1 tiny project-specific)

---

## Phase 3: Memory System (Consolidate and Clean)

### Problem

Two separate memory systems with overlapping content:
1. **Auto-memory** (`~/.claude/projects/-Users-dorian-Projects-archy/memory/`): 8 files, auto-loaded
2. **Project memory** (`.claude/memory/`): 26 files, NOT auto-loaded

Claude sees auto-memory every session. Project memory only loads if Claude manually reads it.

### Solution: Curate auto-memory, keep project memory as archive

**Auto-memory MEMORY.md**: restructure to ~25 lines with the most critical feedback:

```markdown
# Archipelago Project Memory

## Critical Feedback (prevent recurring mistakes)
- [Direct Port Rule](feedback_apps_always_direct_port.md): Apps MUST use direct port, NEVER proxy paths
- [External URLs](feedback_external_urls_iframe.md): Open https:// directly, never /ext/
- [Deploy All Nodes](feedback_indeedhub_deploy_all_servers.md): Deploy to ALL nodes
- [No Tor Publishing](feedback_no_tor_relay_publishing.md): Never publish .onion to relays
- [UFW Forward](feedback_podman_ufw_forward.md): DEFAULT_FORWARD_POLICY=ACCEPT
- [Deploy Patterns](feedback_deploy_patterns.md): Rootless port 80, cred sync, image export
- [Asset Workflow](feedback_asset_workflow.md): Never generate images, user is designer
- [ASCII Logo](feedback_logo_ascii.md): Block-letter logo locked, never change
- [Claude Cleanup](feedback_claude_cleanup.md): Instruction optimization principles

## Infrastructure
- [CI/CD & Registry](reference_cicd_registry.md): git.tx1138.com, act_runner, insecure registry
- [Multi-Node Deploy](reference_multi_node_deploy.md): 5 nodes, SSH keys, deploy methods
- [Infrastructure Quick Ref](reference_infrastructure.md): IPs, passwords, SSH keys (moved from CLAUDE.md)

## Project State
- [ISO Testing](project_iso_testing_plan.md): Hardware matrix, boot compatibility
- [ISO Custom Base](project_iso_size_reduction.md): Debootstrap ISO, remaining issues

## Archive
Detailed project memory in .claude/memory/MEMORY.md (26 files, not auto-loaded).
```

**New auto-memory files to create** (migrated from project memory):
- `feedback_apps_always_direct_port.md`: Broken THREE TIMES, highest-value feedback
- `feedback_deploy_patterns.md`: Hard-won container patterns
- `feedback_asset_workflow.md`: Prevents wasted effort generating images
- `feedback_logo_ascii.md`: Prevents changing locked-in branding
- `reference_infrastructure.md`: Infrastructure table from CLAUDE.md (IPs, SSH, passwords)

**Project memory (.claude/memory/)**:
- Add comment at top of MEMORY.md: `<!-- Archive: not auto-loaded. Active memory at ~/.claude/projects/.../memory/ -->`
- Fix `deploy-automation.md` (Phase 0: remove API key)
- Update `unbundled-iso.md` (still says "NOT YET BUILT")

---

## Phase 4: Permissions (Auto-Approve Safe Commands)

**File**: `.claude/settings.local.json`

**Current**: Only `ssh:*` and `gh api:*` allowed.

**Updated**: add read-only and build/test commands:

```json
{
  "permissions": {
    "allow": [
      "Bash(ssh:*)",
      "Bash(gh api:*)",
      "Bash(cd neode-ui*)",
      "Bash(npm run *)",
      "Bash(npm test*)",
      "Bash(npm start*)",
      "Bash(npx vue-tsc*)",
      "Bash(npx vitest*)",
      "Bash(git log*)",
      "Bash(git diff*)",
      "Bash(git status*)",
      "Bash(git branch*)",
      "Bash(git show*)",
      "Bash(git stash*)",
      "Bash(cargo check*)",
      "Bash(cargo clippy*)",
      "Bash(cargo test*)",
      "Bash(journalctl*)",
      "Bash(systemctl status*)",
      "Bash(ls *)",
      "Bash(wc *)",
      "Bash(file *)",
      "Bash(xxd *)",
      "Bash(df *)",
      "Bash(du *)"
    ]
  }
}
```

**NOT auto-approved** (still require confirmation):
- `git push/commit`: Affects remote/creates state
- `cargo build`: Blocked by hook on macOS anyway
- `npm install`: Modifies dependencies
- `./scripts/deploy-*`: Deploys to servers
- `rm`, `mv`, `cp`: Potentially destructive

---

## Phase 5: Merge iso-branding into build-iso

**Problem**: `iso-branding` is a pure design reference, only relevant during ISO builds. Its description consumes skill budget.

**Action**:
1. Move `.claude/skills/iso-branding/SKILL.md` content → `.claude/skills/build-iso/references/branding.md`
2. Update `build-iso/SKILL.md` to reference the branding file
3. Delete `.claude/skills/iso-branding/` directory

**Skill count**: 11 → 10

---

## Phase 6: Add Backend Rule File

**Problem**: No path-scoped rule for Rust backend. 3 backend rules sit in CLAUDE.md (loaded every session even for frontend-only work).

**New file**: `.claude/rules/backend.md`

```markdown
---
globs:
  - "core/**/*.rs"
  - "core/**/Cargo.toml"
---

# Backend Rules (Archipelago, Rust)

- Backend binds `127.0.0.1` only; nginx handles external access
- Validate all input before path construction: reject `..`, `/`, null bytes
- Timeouts on all external operations (10s default, 30s heavy)
- Use `anyhow::Result` for error propagation, not `.unwrap()` in handlers
- Log with `tracing`, never `println!` or `eprintln!` in production paths
- Container commands through `PodmanClient` (core/container/), never raw Command::new("podman")
```

Delete the Backend section from CLAUDE.md (moved here).
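
To make the input-validation rule concrete, here is a minimal sketch of a check that rejects `..`, `/`, and null bytes before a path is built. The function names are hypothetical, the empty-name check is an added assumption, and the error type is a plain `String` only to keep the example free of dependencies (the rule itself calls for `anyhow::Result`).

```rust
use std::path::{Path, PathBuf};

/// Rejects names that could escape the intended directory.
/// Hypothetical helper; the rejected patterns mirror the backend rule.
pub fn validate_path_component(input: &str) -> Result<&str, String> {
    if input.is_empty() {
        // Assumption of this sketch: an empty name is never valid.
        return Err("empty name".to_string());
    }
    if input.contains("..") {
        return Err("contains '..'".to_string());
    }
    if input.contains('/') {
        return Err("contains '/'".to_string());
    }
    if input.contains('\0') {
        return Err("contains a null byte".to_string());
    }
    Ok(input)
}

/// Builds a per-app path only after the identifier has been validated.
pub fn app_data_path(base: &Path, app_id: &str) -> Result<PathBuf, String> {
    let id = validate_path_component(app_id)?;
    Ok(base.join(id))
}
```

Validating first and joining second keeps the unsafe string from ever reaching `Path::join`, which would otherwise accept `../` segments without complaint.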

---

## Phase 7: Tighten prompt-injection-detect.sh

**Problem**: `context_manipulation` pattern matches `IMPORTANT:`, `CRITICAL:`, `<system>`, all of which are normal in code/docs. Creates false positive warnings.

**Action**: Tighten the `context_manipulation` regex to require injection-specific signatures:

```bash
# OLD (too broad):
"IMPORTANT:|CRITICAL:|SYSTEM:|ADMIN:|<system>|</system>|<instructions>"

# NEW (specific):
"(?:^|\s)(?:SYSTEM|ADMIN):\s*(?:you are|ignore|forget|override|new instructions)|<(?:system|instructions)>.*(?:ignore|override|forget)"
```

---

## Phase 8: Add 2 Focused Agents

**Current**: 1 agent (iframe-specialist, 678 lines)

**Add**:

### `.claude/agents/deploy-specialist.md`
```yaml
---
name: deploy-specialist
description: Deploys to all 5 Archipelago nodes. Knows SSH access, build capabilities, post-deploy verification.
tools: Bash, Read, Grep, Glob
model: sonnet
---
```
Body: Node inventory, deploy workflow, IndeedHub multi-node rules, post-deploy checklist.

### `.claude/agents/code-reviewer.md`
```yaml
---
name: code-reviewer
description: Reviews code against Archipelago standards (frontend patterns, Rust safety, container security, crypto rules).
tools: Read, Grep, Glob
model: sonnet
---
```
Body: Frontend rules, backend rules, container rules, security checklist.

**Agent count**: 1 → 3

---

## Phase 9: Skill Frontmatter Audit

**Problem**: Action skills that have side effects should have `disable-model-invocation: true` to prevent Claude from auto-invoking them.

| Skill | Has `disable-model-invocation: true`? | Needs it? |
|-------|--------------------------------------|-----------|
| add-app | Yes | Yes (side effects) |
| add-web-app | Verify | Yes |
| build-iso | Verify | Yes (builds ISO) |
| iso-debug | Verify | Yes (runs diagnostics) |
| podman | Verify | Yes (modifies containers) |
| polish | Verify | Yes (modifies code) |
| sweep | Verify | Yes (runs checks, may fix) |
| mesh | No | No (reference knowledge) |
| design-pixel-retro | No | No (reference knowledge) |
| gamepad-nav | No | No (reference knowledge) |

Action: Verify and add `disable-model-invocation: true` to all 7 action skills.

---

## Summary

| Phase | Impact | Files Changed | Benefit |
|-------|--------|---------------|---------|
| 0. Remove API key | CRITICAL | 1 | Security |
| 1. Trim CLAUDE.md | HIGH | 1 | ~500 tokens/session saved |
| 2. Dedup hooks | HIGH | 2 | ~200ms faster per tool call |
| 3. Memory consolidate | HIGH | ~8 | Cleaner context, no stale data |
| 4. Permissions | MEDIUM | 1 | ~3s saved per safe command |
| 5. Merge iso-branding | LOW | 3 | 1 less skill description |
| 6. Backend rule | MEDIUM | 2 | Path-scoped, not always-loaded |
| 7. Injection hook | LOW | 1 | Fewer false positives |
| 8. New agents | MEDIUM | 2 new | Better delegation |
| 9. Skill frontmatter | LOW | ~5 | Prevents unintended auto-invoke |

**Net changes**: CLAUDE.md 101→~75 lines, skills 11→10, agents 1→3, rules 5→6, hooks 60% smaller

---

## What This Plan Does NOT Change (and why each was evaluated)

- **Global CLAUDE.md** (36 lines): Already optimized, passes the "would removing cause mistakes?" test
- **Global hooks** (8 scripts): Universal baseline, well-tuned, no project overlap
- **Global rules** (api, crypto, bitcoin): Correct glob scoping, concise content
- **Global settings.json**: Plugins, effort level, hook config all justified
- **iframe-specialist agent**: Deep reference, correctly scoped, rarely loaded
- **Skills mesh/gamepad-nav/design-pixel-retro**: Tiny description cost (~120 chars each), valuable on-demand
- **MCP servers**: Not needed (self-hosted infra, no external API integrations)
- **Agent teams**: Experimental, single-developer project doesn't benefit
- **Project .claude/memory/ (26 files)**: Kept as archive with annotation

---

## Verification Checklist

After implementation:
- [ ] `grep -r "sk-ant" .claude/` returns zero results
- [ ] New session auto-loads MEMORY.md with all critical feedback
- [ ] `git status` auto-approves without permission prompt
- [ ] `/sweep` skill loads and executes correctly
- [ ] Project hooks run fast (no duplicate pattern checks)
- [ ] `cd neode-ui && npx vue-tsc -b --noEmit` passes
- [ ] Spawning deploy-specialist agent works
- [ ] CLAUDE.md is ≤80 lines
- [ ] `/context` shows reasonable token budget
174
.claude/plans/rosy-floating-lightning.md
Normal file
@@ -0,0 +1,174 @@
# Plan: Optimize Claude Code Instructions for Maximum Coding Performance

## Context

### The Problem
Research across Anthropic's official docs, engineering blog, GitHub issues, and academic papers converges on one finding: **instruction overload degrades Claude's coding performance**. The more tokens consumed by rules/instructions, the less attention and context remain for actual code generation.

Key evidence:
- Anthropic official docs: *"Bloated CLAUDE.md files cause Claude to ignore your actual instructions!"*
- Boris Cherny (Claude Code creator) uses ~100 lines / ~2,500 tokens for his CLAUDE.md
- Research (Jaroslawicz et al., 2025): instruction compliance decreases linearly as count increases; frontier models plateau at ~150-200 instructions; Claude Code's system prompt already uses ~50
- "Lost in the Middle" (Stanford, 2024): LLMs exhibit U-shaped attention, so middle content gets the least attention
- Anthropic engineering blog: *"Find the smallest possible set of high-signal tokens that maximize the likelihood of some desired outcome"*
- Aggressive language (BANNED, NEVER, CRITICAL, Non-Negotiable) overtriggers on Claude 4.5/4.6; Anthropic explicitly recommends dialing it back
- Multiple GitHub issues (15443, 28158, 16073, 34197) document systematic instruction ignoring with large CLAUDE.md files

### Current State (Archy Project)

**Always-loaded instruction payload:**
| Source | Lines | Chars | Est. Tokens |
|--------|-------|-------|-------------|
| Global CLAUDE.md | 97 | 5,624 | ~1,400 |
| Project CLAUDE.md | 130 | 5,270 | ~1,300 |
| 5 rules files | 119 | 5,123 | ~1,280 |
| MEMORY.md index | 16 | 1,099 | ~275 |
| 33 skill descriptions (system) | ~300 | ~13,200 | ~3,300 |
| **Total always-loaded** | **~662** | **~30,316** | **~7,555** |

Plus ~10 memory files (~290 lines, ~19K chars) loaded on relevance, and 33 skills totaling ~122K chars loaded on demand.
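
The token column appears to follow a rule of thumb of roughly 4 characters per token. That ratio is an assumption inferred from the figures in the table, not something the plan states, and it is not a tokenizer. A minimal sketch of the arithmetic, with hypothetical names:

```rust
/// Always-loaded sources and their character counts, copied from the table.
pub const SOURCES: [(&str, u32); 5] = [
    ("Global CLAUDE.md", 5_624),
    ("Project CLAUDE.md", 5_270),
    ("5 rules files", 5_123),
    ("MEMORY.md index", 1_099),
    ("33 skill descriptions", 13_200),
];

/// Rough estimate at 4 characters per token, rounded to the nearest token.
/// The ratio is an assumption inferred from the table above.
pub fn estimate_tokens(chars: u32) -> u32 {
    (chars + 2) / 4
}

/// Sums the character counts first, then converts once.
pub fn total_estimate(sources: &[(&str, u32)]) -> u32 {
    estimate_tokens(sources.iter().map(|(_, chars)| chars).sum())
}
```

Under this assumption the total comes to 7,579 tokens, close to the table's ~7,555; the small gap comes from the table rounding each row before summing.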

### Key Problems Identified

1. **Global CLAUDE.md is ~60% things Claude already knows**: "Comment WHY not WHAT," "Functions under 50 lines," "Zero compiler warnings" are standard practices Claude follows without being told
2. **Anti-Hallucination section (28 lines) restates built-in behavior**: package verification is in Claude's training
3. **Redundancy across files**: security rules appear in global CLAUDE.md + crypto.md + api.md + project CLAUDE.md (4x)
4. **Aggressive language throughout** ("BANNED," "Non-Negotiable," "MANDATORY," "NEVER"): Anthropic says this causes overtriggering on current models
5. **Project CLAUDE.md duplicates rules files**: Frontend section repeats frontend.md, Security section repeats crypto.md + api.md
6. **Philosophy section is ~30 lines that don't affect code generation**: Claude won't suggest altcoins or proprietary deps regardless

### What We Preserve (per user request)
- All deploy commands, build commands, SSH access, CI/CD info
- All infrastructure keys/addresses/IPs
- Security and quality architecture rules that prevent real mistakes
- All memory files and feedback (operational learnings)
- All skills (they already use progressive disclosure correctly)

---

## The Plan

### Principle: Every line must prevent a specific mistake Claude would otherwise make

If Claude would do the right thing without the instruction -> delete it.
If Claude does the wrong thing even with the instruction -> make it a hook.
If it only matters for specific files -> scope it with globs in rules/.

### Step 1: Rewrite Global CLAUDE.md (~97 -> ~35 lines)

**Remove (Claude already knows these):**
- "Comment WHY not WHAT": standard practice
- "Functions under 50 lines, single responsibility": standard practice
- "Zero compiler warnings, zero linter errors": standard practice
- "Remove dead code entirely": standard practice
- "Deploy and verify changes": project-specific, belongs in project CLAUDE.md
- Entire "Core Principles" enumeration (5 items): the one-line philosophy header covers it
- "Encryption first" details: covered by crypto.md rules file
- Most of "Anti-Hallucination" section (28 lines): Claude already verifies packages; keep only "cross-reference existing deps" which is non-obvious
- "Code Sourcing: What to avoid" items 3-4: too specific, rarely triggered

**Keep (prevents real mistakes):**
- Bitcoin-only stance (1 line): prevents suggesting altcoin libs
- Open source preference (1 line)
- Code sourcing core rules (no vibe-code repos, no vendoring without approval)
- Dependency selection order (rustls > openssl, etc.): non-obvious preferences
- Security standards not in rules files (never commit secrets, pin versions)
- Project ecosystem listing: useful cross-project context
- Atomic commit format

**Rewrite style:** Calm, direct. No MANDATORY, no bold on every line.

### Step 2: Rewrite Project CLAUDE.md (~130 -> ~75 lines)

**Remove (duplicated in scoped rules files):**
- Frontend section (lines 70-77): exact duplicate of .claude/rules/frontend.md
- Security section (lines 87-94): duplicates crypto.md + api.md + containers.md
- "See .claude/rules/ for detailed..." pointer: Claude loads them automatically

**Remove (Claude already knows):**
- "No unwrap()/expect(), use ? with .context()": standard Rust practice
- "tracing for logging, never println!": standard practice
- "tokio runtime": obvious from the codebase

**Keep and tighten (all non-obvious, prevents real mistakes):**
- Overview + Stack (essential context)
- Beta freeze status (active project constraint)
- Quick Reference commands (frequently used, non-guessable)
- Infrastructure table (IPs, keys, remotes; user explicitly wants these)
- Architecture diagram (essential mental model)
- Critical Rules (5 items, all non-obvious)
- Backend: only non-obvious rules (bind 127.0.0.1, path validation, timeouts)
- ISO Build commands (operational knowledge)
- App Integration Checklist (prevents real mistakes)
- Git conventions (one line)

### Step 3: Tone Adjustment (all files)

Per Anthropic's explicit guidance for Claude 4.5/4.6:

| Before | After |
|--------|-------|
| `.gradient-button` is BANNED | Use `.glass-button` for all buttons, not `.gradient-button` |
| Non-Negotiable | _(remove header, rules speak for themselves)_ |
| MANDATORY checks | _(remove, rules are clear)_ |
| NEVER use floating point | Sats are always integers (`u64`/`BigInt`), not floats |
| NEVER build Rust on macOS | Do not build Rust on macOS; the deploy script handles cross-compilation |

This is not cosmetic: Anthropic docs state aggressive language causes overtriggering.
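
The reworded sats rule in the table is easier to follow with the reason in view. The sketch below, with hypothetical function names, keeps amounts as `u64` satoshis, uses checked addition, and formats a BTC string without ever converting to a float.

```rust
/// Number of satoshis in one bitcoin.
pub const SATS_PER_BTC: u64 = 100_000_000;

/// Adds a fee to an amount; returns `None` on overflow instead of wrapping.
pub fn total_with_fee(amount_sats: u64, fee_sats: u64) -> Option<u64> {
    amount_sats.checked_add(fee_sats)
}

/// Formats sats as a BTC decimal string using integer division and remainder.
pub fn format_btc(sats: u64) -> String {
    format!("{}.{:08}", sats / SATS_PER_BTC, sats % SATS_PER_BTC)
}
```

With integers, 10,000,000 sats plus 20,000,000 sats is exactly 30,000,000 sats. The same sum written as `0.1 + 0.2` in `f64` does not compare equal to `0.3`, which is the class of error the rule is meant to prevent.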

### Step 4: Tighten Rules Files

- **frontend.md**: Tone adjustment only (already 8 good rules, glob-scoped)
- **containers.md**: Reorder critical rules to top, tone adjustment. Keep UID table and systemd requirements (genuine lookup references)
- **api.md, bitcoin.md, crypto.md**: Tone adjustment only (already concise and glob-scoped)

### Step 5: Clean Up Memory Index

- Fix duplicate Session 2026-03-28 entry in MEMORY.md
- Add missing entries for untracked files (feedback_asset_workflow.md, project_iso_size_reduction.md, etc.)
- All memory file content preserved as-is

### Step 6: No Changes To

- **Skills**: Load on demand (correct architecture). 33 skill descriptions at ~100 tokens each is the design intent.
- **Hooks**: Already well-structured.
- **Settings**: Good as-is.
- **Rules file glob scoping**: Already correct.

---

## Expected Impact

| Metric | Before | After | Reduction |
|--------|--------|-------|-----------|
| Global CLAUDE.md | 97 lines / 5,624 chars | ~35 lines / ~2,100 chars | 64% |
| Project CLAUDE.md | 130 lines / 5,270 chars | ~75 lines / ~3,200 chars | 42% |
| Rules files | 119 lines / 5,123 chars | ~115 lines / ~5,000 chars | 3% |
| **Total always-loaded** | **346 lines / 16,017 chars** | **~225 lines / ~10,300 chars** | **35%** |

Key outcomes:
- Every remaining line prevents a specific, real mistake
- No redundancy between files
- Calm, direct tone matched to current model behavior
- Critical rules at top/bottom of files (exploits primacy/recency attention bias)
- ~1,400 tokens freed for actual code context per session

## Files to Modify

1. `/Users/dorian/.claude/CLAUDE.md`: Rewrite (97 -> ~35 lines)
2. `/Users/dorian/Projects/archy/CLAUDE.md`: Rewrite (130 -> ~75 lines)
3. `/Users/dorian/Projects/archy/.claude/rules/frontend.md`: Tone adjustment (BANNED -> positive)
4. `/Users/dorian/Projects/archy/.claude/rules/containers.md`: Reorder + tone
5. `/Users/dorian/.claude/rules/bitcoin.md`: Tone adjustment
6. `/Users/dorian/.claude/rules/crypto.md`: Tone adjustment
7. `/Users/dorian/.claude/projects/-Users-dorian-Projects-archy/memory/MEMORY.md`: Fix index

## Verification

1. Start a new Claude Code session on archy
2. Check infrastructure IPs, SSH keys, deploy commands are all accessible
3. Ask Claude to write a Vue component: it should follow glass-button, script setup, style.css
4. Ask Claude to write Rust backend code: it should use ?, bind 127.0.0.1
5. Ask Claude about deploying: it should know deploy-to-target.sh, .228, .198
6. Ask Claude to add a container: it should follow rootless Podman, UID mapping
7. Observe: faster responses, less hedging, more focused output
241
.claude/plans/smooth-roaming-wadler.md
Normal file
@@ -0,0 +1,241 @@
# Container Orchestration Dev Testing Infrastructure

## Context

Container orchestration has been unreliable for months. Every fix requires a full deploy to .228 (5+ minutes), manual SSH debugging, and prayer. There is no way to test orchestration logic locally or catch regressions before deploy. We need three layers of testing so orchestration is bulletproof before it ever touches a server.

## Three Layers

### Layer C: Mock Podman in Rust Unit Tests (runs on macOS, instant)

Tests the orchestration LOGIC without any containers. Runs in `cargo test`, takes seconds.

**What it tests:** Retry backoff timing, restart tracker persistence, tier ordering, stop grace periods, failsafe install flow, health monitor state machine, crash recovery.

**Implementation:**

Create `core/archipelago/src/container/mock_podman.rs`, a fake podman command executor:

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicBool, AtomicU64};
use std::sync::{Arc, Mutex};

use chrono::{DateTime, Utc};

// Shared, thread-safe state so tests can flip failure modes while code runs.
pub struct MockPodman {
    containers: Arc<Mutex<HashMap<String, MockContainer>>>,
    fail_pull: Arc<AtomicBool>,    // simulate registry down
    fail_start: Arc<AtomicBool>,   // simulate container crash on start
    pull_delay_ms: Arc<AtomicU64>, // simulate slow pull
}

// `ContainerState` is the project's own enum (not shown here).
struct MockContainer {
    name: String,
    image: String,
    state: ContainerState, // Created/Running/Exited/Stopped
    exit_code: i32,
    created_at: DateTime<Utc>,
}
```

Key trait to add in `runtime.rs`:
```rust
use async_trait::async_trait;

// `Result` and `CommandOutput` are the project's existing types.
#[async_trait]
pub trait CommandExecutor: Send + Sync {
    async fn execute(&self, program: &str, args: &[&str]) -> Result<CommandOutput>;
}
```

Production uses `RealExecutor` (calls `tokio::process::Command`). Tests use `MockPodman`.
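
The swap between the two executors can be sketched without any async machinery. The version below is a simplified, synchronous stand-in for the trait above, written only to show the injection pattern: the `CommandOutput` fields, the string-based container states, and the `start_container` helper are all assumptions of this sketch, not the project's real definitions.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Simplified output type (assumed shape, for illustration only).
#[derive(Debug, Clone, PartialEq)]
pub struct CommandOutput {
    pub status: i32,
    pub stdout: String,
}

/// Synchronous stand-in for the async trait in the plan.
pub trait CommandExecutor: Send + Sync {
    fn execute(&self, program: &str, args: &[&str]) -> Result<CommandOutput, String>;
}

/// Records container states in memory instead of calling podman.
#[derive(Default)]
pub struct MockPodman {
    containers: Mutex<HashMap<String, String>>, // name -> state
    fail_start: bool,                           // simulate a crash on start
}

impl MockPodman {
    pub fn with_fail_start(fail_start: bool) -> Self {
        MockPodman { fail_start, ..Default::default() }
    }

    pub fn state_of(&self, name: &str) -> Option<String> {
        self.containers.lock().ok()?.get(name).cloned()
    }
}

impl CommandExecutor for MockPodman {
    fn execute(&self, _program: &str, args: &[&str]) -> Result<CommandOutput, String> {
        let mut containers = self.containers.lock().map_err(|e| e.to_string())?;
        match args {
            ["start", name] => {
                if self.fail_start {
                    return Err(format!("simulated start failure for {name}"));
                }
                containers.insert(name.to_string(), "running".to_string());
                Ok(CommandOutput { status: 0, stdout: name.to_string() })
            }
            ["stop", name] => {
                containers.insert(name.to_string(), "exited".to_string());
                Ok(CommandOutput { status: 0, stdout: name.to_string() })
            }
            _ => Err(format!("unsupported mock command: {args:?}")),
        }
    }
}

/// Code under test depends only on the trait, so either executor can be passed.
pub fn start_container(exec: &dyn CommandExecutor, name: &str) -> Result<(), String> {
    let out = exec.execute("podman", &["start", name])?;
    if out.status == 0 {
        Ok(())
    } else {
        Err(format!("exit status {}", out.status))
    }
}
```

Because `start_container` takes `&dyn CommandExecutor`, a test can pass the mock and assert on the recorded state, while production code would pass the real executor unchanged.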

**Test file:** `core/archipelago/tests/orchestration_tests.rs`

Tests to write:
1. `test_stop_grace_periods`: bitcoin gets 600s, lnd 330s, unknown gets 30s
2. `test_pull_retry_backoff`: fail twice, succeed third, verify 5s/15s delays
3. `test_pull_all_attempts_fail`: fail 3x, verify error returned
4. `test_restart_tracker_persistence`: save to disk, reload, verify counters survive
5. `test_restart_tracker_stability_reset`: after 1h, counters clear
6. `test_failsafe_install_rollback`: container exits immediately, verify cleanup
7. `test_failsafe_install_image_missing`: pull succeeds but image not found, verify error
8. `test_health_monitor_tier_ordering`: databases restart before apps
9. `test_health_monitor_skips_user_stopped`: user-stopped containers not restarted
10. `test_health_monitor_max_attempts`: stops after 3 failures
11. `test_crash_recovery_loads_snapshot`: PID file + snapshot → containers restarted
12. `test_crash_recovery_skips_user_stopped`: user-stopped not recovered
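
The first three tests in the list are small enough to sketch. The code below is a hedged illustration, not the project's implementation: the signatures of `stop_timeout_secs` and `pull_with_retry` are assumed, and only the numbers (600s, 330s, 30s, three attempts, 5s and 15s delays) come from the list above. The sleep function is injected so a test can record delays instead of waiting.

```rust
use std::time::Duration;

/// Grace period before a container is force-stopped (assumed signature).
pub fn stop_timeout_secs(container: &str) -> u64 {
    match container {
        "bitcoin" => 600,
        "lnd" => 330,
        _ => 30,
    }
}

/// Delays between pull attempts: 5s after the first failure, 15s after the second.
pub const PULL_BACKOFF: [Duration; 2] = [Duration::from_secs(5), Duration::from_secs(15)];

/// Calls `pull` up to three times and returns the attempt number that succeeded.
/// `sleep` is a parameter so tests can capture the delays without blocking.
pub fn pull_with_retry<P, S>(mut pull: P, mut sleep: S) -> Result<u32, String>
where
    P: FnMut(u32) -> Result<(), String>,
    S: FnMut(Duration),
{
    let max_attempts = PULL_BACKOFF.len() as u32 + 1;
    let mut last_err = String::new();
    for attempt in 1..=max_attempts {
        match pull(attempt) {
            Ok(()) => return Ok(attempt),
            Err(e) => last_err = e,
        }
        // No delay after the final attempt: there is nothing left to wait for.
        if let Some(delay) = PULL_BACKOFF.get(attempt as usize - 1) {
            sleep(*delay);
        }
    }
    Err(format!("pull failed after {max_attempts} attempts: {last_err}"))
}
```

In production the `sleep` argument would wrap a real sleep; in a test it can push each `Duration` into a vector, which makes the 5s/15s assertion instant.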

**Files to modify:**
- `core/archipelago/src/container/mod.rs`: add `pub mod mock_podman;`
- `core/archipelago/src/container/mock_podman.rs`: NEW mock implementation
- `core/archipelago/tests/orchestration_tests.rs`: NEW test file
- `core/archipelago/src/health_monitor.rs`: extract logic into testable functions (pure functions that take data, not functions that call podman)
- `core/archipelago/src/api/rpc/package/runtime.rs`: make `stop_timeout_secs` public for testing

**Key refactors to make code testable:**
- Extract `stop_timeout_secs()` → `pub fn` so tests can call it directly
- Extract health monitor `check_and_restart()` into a function that takes container list + tracker + user_stopped, returns actions to take (restart X, notify Y, skip Z); pure logic, no IO
- Extract `RestartTracker` + `RestartHistory` into own file for independent testing
- Make `pull_image_with_progress` retry logic independent of progress streaming
|
||||
|
||||
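The `check_and_restart()` extraction could take a shape like the following. It is a sketch under stated assumptions: the `ContainerStatus`, `Action` and `plan_actions` names, the numeric `tier` field (lower restarts first) and the plain `HashMap` standing in for `RestartTracker` are all hypothetical; only the inputs, the restart/notify/skip outputs and the limit of 3 come from this plan.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical snapshot of one container as seen by the health monitor.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ContainerStatus {
    pub name: String,
    pub running: bool,
    /// Assumption: lower tiers (databases) restart before higher tiers (apps).
    pub tier: u8,
}

/// What the caller should do; performing it (podman calls) stays outside this function.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Action {
    Restart(String),
    Notify(String),
    Skip(String),
}

/// Limit taken from `test_health_monitor_max_attempts` in this plan.
pub const MAX_RESTART_ATTEMPTS: u32 = 3;

/// Pure decision logic: no IO, so it can be unit-tested with plain data.
pub fn plan_actions(
    containers: &[ContainerStatus],
    restart_counts: &HashMap<String, u32>,
    user_stopped: &HashSet<String>,
) -> Vec<Action> {
    let mut down: Vec<&ContainerStatus> = containers.iter().filter(|c| !c.running).collect();
    // Stable sort keeps the input order within a tier.
    down.sort_by_key(|c| c.tier);
    down.into_iter()
        .map(|c| {
            let attempts = restart_counts.get(&c.name).copied().unwrap_or(0);
            if user_stopped.contains(&c.name) {
                Action::Skip(c.name.clone())
            } else if attempts >= MAX_RESTART_ATTEMPTS {
                Action::Notify(c.name.clone())
            } else {
                Action::Restart(c.name.clone())
            }
        })
        .collect()
}
```

Tests 8 to 10 then reduce to building a few `ContainerStatus` values and comparing the returned `Vec<Action>`.
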
---

### Layer A: SSH Dev Loop in dev-start.sh (real containers on .228)

New option 9 in `dev-start.sh`: "Container orchestration dev (live on .228)"

**What it does:**
1. Rsync code to .228 (2 seconds)
2. Build backend on .228 (incremental: 5-15 seconds)
3. Restart archipelago service
4. Run orchestration smoke tests via RPC
5. Show container status + health monitor logs
6. Loop: edit locally → press Enter → rsync+rebuild+test

**What it tests:** Real podman, real containers, real networking. The actual install/start/stop/restart/health cycle.

**Implementation:**

Add option 9 to `scripts/dev-start.sh`:
```bash
9)
    echo "Container Orchestration Dev (live testing on .228)"
    exec "$SCRIPT_DIR/dev-container-test.sh"
    ;;
```

Create `scripts/dev-container-test.sh` (~150 lines):
```bash
#!/bin/bash
# Fast edit-build-test loop for container orchestration on .228
#
# Usage: ./scripts/dev-container-test.sh [--once]
#
# Syncs code, builds, restarts, runs orchestration smoke tests.
# Press Enter to re-run, Ctrl+C to stop.

SSH="ssh -o StrictHostKeyChecking=no -i ~/.ssh/archipelago-deploy archipelago@192.168.1.228"

sync_and_build() {
    # Outline: steps written as comments still need their real commands.
    # rsync (same excludes as deploy script)
    # ssh: cargo build --release -p archipelago (incremental)
    $SSH "sudo systemctl restart archipelago"
    # ssh: wait for health endpoint (15s timeout)
}

run_smoke_tests() {
    # NOTE: /rpc/v1 is relative to the node's RPC base URL, which this outline leaves out.

    # Test 1: Container list works
    curl -s /rpc/v1 -d '{"method":"container.list"}'

    # Test 2: Install filebrowser (small, fast, no deps)
    curl -s /rpc/v1 -d '{"method":"package.install","params":{"id":"filebrowser","dockerImage":"..."}}'
    # Wait for running state

    # Test 3: Stop with grace period
    curl -s /rpc/v1 -d '{"method":"package.stop","params":{"id":"filebrowser"}}'
    # Verify stopped

    # Test 4: Start
    curl -s /rpc/v1 -d '{"method":"package.start","params":{"id":"filebrowser"}}'
    # Verify running

    # Test 5: Health check
    curl -s /rpc/v1 -d '{"method":"container.health"}'

    # Test 6: Check restart-tracker.json exists
    $SSH "cat /var/lib/archipelago/restart-tracker.json"

    # Test 7: Check health monitor logs for errors
    $SSH 'journalctl -u archipelago --since "2 min ago"' | grep -i "error\|panic\|fail"

    # Test 8: Uninstall
    curl -s /rpc/v1 -d '{"method":"package.uninstall","params":{"id":"filebrowser"}}'
}

# Main loop
while true; do
    sync_and_build
    run_smoke_tests
    echo "Press Enter to re-run, Ctrl+C to stop"
    read -r
done
```

**Files:**
- `scripts/dev-start.sh` — add option 9
- `scripts/dev-container-test.sh` — NEW

---

### Layer B: CI Integration Tests (runs on .228 via Gitea Actions)

Extend the existing CI to run container orchestration tests on every push to dev-iso.

**What it tests:** Full lifecycle on real hardware after every code change. Catches regressions automatically.

**Implementation:**

Create `.gitea/workflows/container-tests.yml`:
```yaml
name: Container Orchestration Tests
on:
  push:
    branches: [dev-iso, main]
    paths:
      - 'core/**'
      - 'scripts/container-*.sh'
      - 'scripts/reconcile-*.sh'

jobs:
  unit-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Rust unit tests (orchestration)
        # --no-fail-fast is a cargo flag, so it goes before the `--` separator
        run: cargo test -p archipelago --no-fail-fast -- orchestration

  integration-tests:
    runs-on: ubuntu-latest
    needs: unit-tests
    steps:
      - uses: actions/checkout@v4
      - name: Deploy to test node
        run: |
          # Rsync + build on .228
          # Run orchestration smoke tests
          bash scripts/run-container-tests.sh
```

Create `scripts/run-container-tests.sh` (~200 lines):
Reuses the smoke test logic from dev-container-test.sh but structured for CI:
- JSON output for CI parsing
- Exit codes for pass/fail
- Timeout handling (5 min max)
- Cleanup after test (remove test containers)
- Tests: install, start, stop, restart, uninstall, health check, restart tracker, reconciliation

**Files:**
- `.gitea/workflows/container-tests.yml` — NEW
- `scripts/run-container-tests.sh` — NEW

---

## Execution Order

1. **Layer C first** (mock tests) — Get the logic tested, runs locally, fast feedback
2. **Layer A second** (dev loop) — Test against real containers with fast iteration
3. **Layer B last** (CI) — Automate regression catching

## Files Summary

| File | Action | Layer |
|------|--------|-------|
| `core/archipelago/src/container/mock_podman.rs` | NEW | C |
| `core/archipelago/src/container/mod.rs` | MODIFY | C |
| `core/archipelago/tests/orchestration_tests.rs` | NEW | C |
| `core/archipelago/src/health_monitor.rs` | REFACTOR (extract pure logic) | C |
| `core/archipelago/src/api/rpc/package/runtime.rs` | MODIFY (pub fn) | C |
| `scripts/dev-start.sh` | MODIFY (add option 9) | A |
| `scripts/dev-container-test.sh` | NEW | A |
| `.gitea/workflows/container-tests.yml` | NEW | B |
| `scripts/run-container-tests.sh` | NEW | B |

## Verification

- Layer C: `cargo test -p archipelago -- orchestration` — all pass on macOS
- Layer A: `./scripts/dev-start.sh` → option 9 → green smoke tests on .228
- Layer B: Push to dev-iso → CI green on container-tests workflow
89
.claude/plans/toasty-inventing-cascade.md
Normal file
@@ -0,0 +1,89 @@
# Plan: ISO Polish — Fix Everything for Beta Release

## Context
Fresh ISO install on .198 revealed 11 issues ranging from critical (app installs, Tor broken) to UX (GRUB scaling, boot splash, kiosk reliability). Goal: next ISO build produces a flawless out-of-box experience.

## Issues & Fixes (priority order)

### 1. CRITICAL: Tor services.json not written (escaping bug)
**Symptom:** `setup-tor.sh: line 12: $ARCHY_TOR_DIR/services.json: No such file or directory`
**Root cause:** In `build-auto-installer-iso.sh`, the setup-tor heredoc escapes `$ARCHY_TOR_DIR` as `\$ARCHY_TOR_DIR`, producing a literal `$` in the output script. The variable never expands at runtime.
**Fix:** In the heredoc that generates setup-tor.sh (~line 1200), use unescaped `$ARCHY_TOR_DIR` so it expands at runtime. The heredoc itself uses `<<TORSCRIPT` (unquoted) so we need to check the quoting carefully.
**File:** `image-recipe/build-auto-installer-iso.sh` (setup-tor heredoc section)

### 2. CRITICAL: App installs failing ("Operation failed")
**Symptom:** Screenshot shows "Failed: Error: Operation failed. Check server logs" + "Downloading..." stuck
**Root cause:** This is the OLD build (pre-CSRF fix). The new build has the fix. However, `sanitize_error_message()` in `middleware.rs` masks ALL real errors. Need to verify the new build works.
**Fix:** Already fixed (auth.ts, rpc-client.ts, mod.rs). Verify on next ISO.
**Also:** Consider allowing "Failed to pull" errors through the sanitizer so users see meaningful install errors.
**File:** `core/archipelago/src/api/rpc/middleware.rs`

### 3. HIGH: Kiosk white screen / never loads on first boot
**Symptom:** First boot: black screen → white screen → kiosk never loads. Second boot works fine.
**Root cause:** The kiosk `ExecStartPre` health check polls 15x with 2s delay (30s max), but on first boot the backend may not be ready within 30s (first-boot-containers, Tor setup, etc. all running). Chromium opens `http://localhost/kiosk` before nginx/backend is fully up → white page. No retry logic in the launcher.
**Fix:** Increase health check to 30 attempts (60s). Add a loading page that Chromium shows while waiting (a simple HTML file served by nginx even when backend is down). Add `--disable-gpu` flag to Chromium (fixes some white screen issues on low-end GPUs).
**File:** `image-recipe/build-auto-installer-iso.sh` (kiosk launcher + ExecStartPre)

### 4. HIGH: GRUB theme text not scaling / cut off on 4:3 monitors
**Symptom:** Screenshot shows "Install (var" cut off, menu items barely readable on 1280x1024 Dell
**Root cause:** GRUB theme uses percentage-based layout but no font size control. GRUB defaults to a small bitmap font. The `item_height = 40` is fixed pixels, too small at some resolutions. No explicit font loaded in theme.txt.
**Fix:** In `grub.cfg`, load a larger font (24px DejaVu or similar). Adjust theme.txt: increase `item_height`, move menu position up, ensure text fits. Add `loadfont` to grub.cfg.
**Files:** `image-recipe/branding/grub-theme/theme.txt`, `image-recipe/build-auto-installer-iso.sh` (grub.cfg generation)

### 5. HIGH: LUKS partition not showing in disk stats
**Symptom:** Server view doesn't show LUKS encryption status or the encrypted partition
**Root cause:** Backend `system.disk-status` uses `df /` or `df /var/lib/archipelago` but doesn't report LUKS status. No `cryptsetup status` call. Frontend only shows used/total/free/percent.
**Fix:** Add LUKS detection to the disk status RPC: check if `/dev/mapper/archipelago*` exists, read `cryptsetup status`. Return `encrypted: true/false` and `encryption_cipher` fields. Frontend: show a lock icon + "LUKS2 Encrypted" badge in disk stats.
**Files:** `core/archipelago/src/api/rpc/system/handlers.rs`, `neode-ui/src/views/Server.vue`

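The parsing half of the LUKS fix could look roughly like this. A minimal sketch only: it assumes `cryptsetup status` prints `key: value` lines that include `type:` and `cipher:`, and the `EncryptionStatus` struct and `parse_cryptsetup_status` function are hypothetical names. Only the `encrypted` and `encryption_cipher` fields come from the fix described above.

```rust
/// Fields the disk status RPC would add (struct name is hypothetical).
#[derive(Debug, Default, PartialEq, Eq)]
pub struct EncryptionStatus {
    pub encrypted: bool,
    pub encryption_cipher: Option<String>,
}

/// Parses the text printed by `cryptsetup status <name>`.
/// Assumption: the output contains indented `type:` and `cipher:` lines.
pub fn parse_cryptsetup_status(output: &str) -> EncryptionStatus {
    let mut status = EncryptionStatus::default();
    for line in output.lines() {
        // Lines without a colon (such as the "is active" banner) are ignored.
        let Some((key, value)) = line.split_once(':') else { continue };
        let value = value.trim();
        match key.trim() {
            "type" => status.encrypted = value.starts_with("LUKS"),
            "cipher" => status.encryption_cipher = Some(value.to_string()),
            _ => {}
        }
    }
    // Do not report a cipher for a mapping that is not LUKS.
    if !status.encrypted {
        status.encryption_cipher = None;
    }
    status
}
```

Keeping the parser separate from the process call means the handler can be tested with captured output strings instead of a real encrypted disk.
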
### 6. MEDIUM: No Plymouth boot splash showing
**Symptom:** No animation between GRUB and login — just black screen with blinking cursor
**Root cause:** Plymouth theme files exist in `image-recipe/branding/plymouth-theme/` but the ISO build doesn't copy the logo.png or install the theme properly. Also kernel cmdline needs `splash quiet` and `plymouth-set-default-theme` must be run.
**Fix:** Verify the ISO build copies plymouth theme + logo.png to rootfs, runs `plymouth-set-default-theme archipelago`, and kernel cmdline includes `splash quiet`.
**File:** `image-recipe/build-auto-installer-iso.sh` (plymouth setup section)

### 7. MEDIUM: No custom MOTD
**Symptom:** Default Debian MOTD on VT1 login
**Fix:** Add custom MOTD to ISO build that shows Archipelago ASCII logo, version, IP address, and useful commands (kiosk toggle, SSH info).
**File:** `image-recipe/build-auto-installer-iso.sh` (add MOTD generation)

### 8. MEDIUM: Onboarding intro needs double press
**Symptom:** Pressing the intro circle/button once resets, need to press twice
**Root cause:** `SplashScreen.vue` has a 48-segment ring animation triggered on hover. The splash → intro transition may have a race condition with animation completion. `OnboardingIntro.vue` auto-focuses CTA after 2100ms delay — if user clicks before that, focus may steal the event.
**Fix:** Investigate SplashScreen.vue transition timing. Add click debounce or ensure single-click always proceeds.
**Files:** `neode-ui/src/components/SplashScreen.vue`, `neode-ui/src/views/OnboardingIntro.vue`

### 9. MEDIUM: No TUI animations in actual installer
**Symptom:** Installer is functional but plain — no bouncing Bitcoin, no glitch effects from demo
**Root cause:** `scripts/install-tui-demo.sh` has elaborate animations but the actual installer in the ISO build script is minimal (basic spinner + typewriter only).
**Fix:** Port key animations from install-tui-demo.sh into the actual installer: logo decrypt reveal, progress bars with percentage, phase transitions. Keep it lightweight but visually distinctive.
**File:** `image-recipe/build-auto-installer-iso.sh` (auto-install.sh section)

### 10. LOW: Container tests CI failing
**Symptom:** `cargo: command not found` in container-tests workflow
**Fix:** Add `source $HOME/.cargo/env` to test steps. Already staged locally.
**File:** `.gitea/workflows/container-tests.yml`

### 11. LOW: Kiosk enable/disable command lacks visual feedback
**Symptom:** User runs command, MOTD changes but no immediate visual confirmation
**Root cause:** The `archipelago-kiosk` script DOES print feedback messages. The issue may be that VT auto-switches and the user doesn't see the output.
**Fix:** Add a brief sleep before VT switch so user sees the confirmation message. Consider adding a `--quiet` flag for scripted use.
**File:** `image-recipe/build-auto-installer-iso.sh` (kiosk toggle script)

## Execution Order
1. Tor fix (#1) — 5 min, critical
2. Kiosk reliability (#3) — 15 min, high impact
3. GRUB text scaling (#4) — 15 min, visible
4. LUKS disk stats (#5) — 20 min, backend + frontend
5. App install error messages (#2) — 10 min, verify + improve
6. Plymouth boot splash (#6) — 15 min
7. Custom MOTD (#7) — 10 min
8. Intro double-press (#8) — 10 min
9. TUI animations (#9) — 30 min (port from demo)
10. CI fix (#10) — 2 min
11. Kiosk feedback (#11) — 5 min

## Verification
- Build new ISO on .228 via CI (push to main)
- Flash to USB, install on .198
- Check: GRUB readable → Plymouth splash → TUI installer animations → MOTD shows → Kiosk loads first time → Tor onion addresses visible → App installs work → Disk shows LUKS → Intro single-click works
205
.claude/plans/twinkly-baking-ladybug.md
Normal file
@@ -0,0 +1,205 @@
# BIP-39 Master Seed — Unified Key Derivation for Archipelago

## Context

Archipelago's current identity system is broken:
- Node key generated randomly at boot, before the user exists
- Each identity creates independent random Ed25519 + secp256k1 keys
- ADR-008 says "both keys derived from same master seed" but code doesn't do this
- Backup only covers the node key, not identity keys
- No seed phrase — backup is an opaque encrypted blob with a user passphrase
- Restore path disabled ("Coming Soon")
- No connection between node identity and Bitcoin/LND wallet keys

**Goal:** One 24-word BIP-39 seed phrase derives ALL keys. User writes down 24 words, can recover everything on a fresh install.

---

## Derivation Scheme

```
BIP-39 Mnemonic (24 words, 256-bit entropy)
  -> PBKDF2-HMAC-SHA512 (2048 rounds, empty passphrase)
  -> Master Seed (64 bytes)
       |
       +-- HKDF-SHA256(seed, info="archipelago/node/ed25519/v1") -> Node Ed25519 key -> did:key
       +-- HKDF-SHA256(seed, info="archipelago/nostr-node/secp256k1/v1") -> Node Nostr key
       +-- HKDF-SHA256(seed, info="archipelago/identity/{i}/ed25519/v1") -> Identity i Ed25519 -> did:key
       +-- BIP-32 m/44'/1237'/0'/0/{i} -> Identity i Nostr key (NIP-06)
       +-- BIP-32 m/84'/0'/0' -> Bitcoin Core wallet (native segwit)
       +-- HKDF-SHA256(seed, info="archipelago/lnd/entropy/v1") -> 16 bytes -> LND aezeed entropy
```

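Each branch of the tree is separated only by its HKDF `info` label or BIP-32 path, so it may help to build those strings in one place. The constants and helper functions below are hypothetical names and this is only a sketch; the string values themselves are copied from the scheme above.

```rust
/// HKDF `info` labels for the fixed (non-indexed) branches.
pub const NODE_ED25519_INFO: &str = "archipelago/node/ed25519/v1";
pub const NODE_NOSTR_INFO: &str = "archipelago/nostr-node/secp256k1/v1";
pub const LND_ENTROPY_INFO: &str = "archipelago/lnd/entropy/v1";

/// BIP-32 account path for the Bitcoin Core wallet (native segwit).
pub const BITCOIN_ACCOUNT_PATH: &str = "m/84'/0'/0'";

/// HKDF `info` label for the Ed25519 key of identity `index`.
pub fn identity_ed25519_info(index: u32) -> String {
    format!("archipelago/identity/{index}/ed25519/v1")
}

/// BIP-32 path for the Nostr key of identity `index`.
pub fn identity_nostr_path(index: u32) -> String {
    format!("m/44'/1237'/0'/0/{index}")
}
```

Centralising the labels makes it harder for two derivations to share an `info` string by accident, which would yield the same key for two purposes.
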
---

## Phase 1: Seed Module (foundation)

### New crates in `core/archipelago/Cargo.toml`
```toml
bip39 = "=2.1.0"
bitcoin = { version = "=0.32.5", features = ["rand-std"] }
```

### New file: `core/archipelago/src/seed.rs`

**`MasterSeed` struct** — wraps `Zeroizing<[u8; 64]>`, implements `ZeroizeOnDrop`

Functions:
- `MasterSeed::generate() -> (Mnemonic, MasterSeed)` — 256-bit entropy, 24 words
- `MasterSeed::from_mnemonic(mnemonic) -> MasterSeed` — for restore
- `MasterSeed::from_mnemonic_words(words: &str) -> Result<(Mnemonic, MasterSeed)>` — parse + validate
- `derive_node_ed25519(&MasterSeed) -> SigningKey` — HKDF with info `archipelago/node/ed25519/v1`
- `derive_identity_ed25519(&MasterSeed, index: u32) -> SigningKey` — HKDF with info `archipelago/identity/{index}/ed25519/v1`
- `derive_nostr_identity_key(&MasterSeed, index: u32) -> nostr_sdk::Keys` — BIP-32 `m/44'/1237'/0'/0/{index}`
- `derive_node_nostr_key(&MasterSeed) -> nostr_sdk::Keys` — HKDF with info `archipelago/nostr-node/secp256k1/v1`
- `derive_bitcoin_xprv(&MasterSeed) -> Xpriv` — BIP-32 `m/84'/0'/0'`
- `derive_lnd_entropy(&MasterSeed) -> [u8; 16]` — HKDF with info `archipelago/lnd/entropy/v1`
- `save_seed_encrypted(data_dir, mnemonic, passphrase)` — Argon2+ChaCha20 to `master_seed.enc`
- `load_seed_encrypted(data_dir, passphrase) -> Mnemonic`
- `seed_exists(data_dir) -> bool`
- `save_identity_index(data_dir, next_index: u32)` / `load_identity_index(data_dir) -> u32`

Security: Never log seed/mnemonic. All seed types implement `ZeroizeOnDrop`. File permissions 0o600.

Existing building blocks to reuse:
- `mesh/crypto.rs:hkdf_sha256()` / `hkdf_sha256_32()` — already implemented
- `backup/identity.rs` encryption pattern — Argon2+ChaCha20 (reuse for `save_seed_encrypted`)
- `ed25519-dalek`, `sha2`, `hmac`, `hkdf`, `zeroize` — all in Cargo.toml already

---

## Phase 2: Onboarding UI

### New Vue views:

**`OnboardingSeedGenerate.vue`** — calls `seed.generate`, displays 24 words in grid, "I wrote these down" checkbox

**`OnboardingSeedVerify.vue`** — picks 4 random word positions, user types them back, calls `seed.verify`, shows DID + npub on success

**`OnboardingSeedRestore.vue`** — 24 input fields with BIP-39 wordlist autocomplete, calls `seed.restore`

### New onboarding flow:
```
Intro -> Options (Fresh / Restore) -> [branch]

FRESH:   SeedGenerate -> SeedVerify -> Identity (name/purpose) -> Done
RESTORE: SeedRestore -> Done
```

### Router changes (`neode-ui/src/router/index.ts`):
- Add routes: `onboarding/seed`, `onboarding/seed-verify`, `onboarding/seed-restore`
- Remove: `onboarding/did`, `onboarding/backup`, `onboarding/verify`
- Enable Restore path in `OnboardingOptions.vue`

### RPC client (`neode-ui/src/api/rpc-client.ts`):
- `generateSeed()`, `verifySeed()`, `restoreSeed()`, `saveSeedEncrypted()`, `seedStatus()`

---

## Phase 3: Backend Integration

### `identity.rs` — add `NodeIdentity::from_seed(identity_dir, &MasterSeed)`
- Derives Ed25519 node key via `seed::derive_node_ed25519()`
- Writes to `node_key` / `node_key.pub` (same format as today)
- Existing `load_or_create()` unchanged (loads from disk, works for both seed-derived and legacy keys)

### `identity_manager.rs` — seed-aware `create()`
- When seed available: derive Ed25519 from `derive_identity_ed25519(seed, index)`, Nostr from `derive_nostr_identity_key(seed, index)`
- Increment and persist `identity_index`
- Add `derivation_index: Option<u32>` to `IdentityFile` (serde default, backward-compatible)
- When no seed (legacy): fall back to current random generation

### `server.rs` — startup flow:
```
seed exists + node_key exists  -> Normal seed-backed operation
no seed     + node_key exists  -> Legacy node, show migration prompt
no seed     + no node_key      -> Fresh install, await onboarding
seed exists + no node_key      -> Re-derive from seed (recovery)
```
- Add `seed_backed: bool` to `ServerInfo`

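The four-row startup table maps directly onto a `match` over two booleans. A minimal sketch, assuming a hypothetical `StartupState` enum and `detect_startup_state` function; how `seed_backed` is computed for `ServerInfo` is likewise an assumption.

```rust
/// The four startup situations from the table above (enum name is hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StartupState {
    /// seed exists + node_key exists
    SeedBacked,
    /// no seed + node_key exists: show migration prompt
    LegacyNode,
    /// no seed + no node_key: await onboarding
    FreshInstall,
    /// seed exists + no node_key: re-derive the node key from the seed
    RecoverFromSeed,
}

/// Pure classification; the caller checks the files and passes in the results.
pub fn detect_startup_state(seed_exists: bool, node_key_exists: bool) -> StartupState {
    match (seed_exists, node_key_exists) {
        (true, true) => StartupState::SeedBacked,
        (false, true) => StartupState::LegacyNode,
        (false, false) => StartupState::FreshInstall,
        (true, false) => StartupState::RecoverFromSeed,
    }
}

impl StartupState {
    /// Assumed value for `ServerInfo.seed_backed`: true whenever a seed is present.
    pub fn seed_backed(self) -> bool {
        matches!(self, StartupState::SeedBacked | StartupState::RecoverFromSeed)
    }
}
```

Because the `match` is exhaustive, adding a fifth state later forces every caller that matches on `StartupState` to handle it.
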
### New RPC endpoints in `api/rpc/seed.rs`:
- `seed.generate` — generates mnemonic, derives & writes node keys, returns words (onboarding only, unauth)
- `seed.verify` — validates user re-entered correct words (onboarding only)
- `seed.restore` — accepts 24 words, derives all keys, writes to disk (onboarding only, unauth)
- `seed.save-encrypted` — encrypts mnemonic to `master_seed.enc` (optional convenience)
- `seed.status` — returns `{ has_seed, is_legacy, identity_count, next_index }`
- `seed.derive-lnd-entropy` — password-protected, returns 16 bytes for LND wallet init
- `seed.derive-bitcoin-xprv` — password-protected, returns xprv for Bitcoin Core import

In-memory mnemonic between `seed.generate` and `seed.verify`: held in `Mutex<Option<Zeroizing<String>>>` with 10-minute auto-clear timeout.

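The 10-minute auto-clear can be done lazily, by checking the age of the stored mnemonic whenever it is read. The sketch below uses only the standard library, so the `String` is not wrapped in `Zeroizing` as the plan requires; `PendingMnemonic`, `PENDING_TTL` and passing `now` as a parameter (so tests do not wait ten minutes) are assumptions.

```rust
use std::sync::Mutex;
use std::time::{Duration, Instant};

/// Lifetime of the mnemonic held between `seed.generate` and `seed.verify`.
pub const PENDING_TTL: Duration = Duration::from_secs(10 * 60);

/// Hypothetical holder; real code would store `Zeroizing<String>` instead of `String`.
#[derive(Default)]
pub struct PendingMnemonic {
    slot: Mutex<Option<(String, Instant)>>,
}

impl PendingMnemonic {
    /// Stores the mnemonic together with the time it was generated.
    pub fn store(&self, words: String, now: Instant) {
        *self.slot.lock().unwrap() = Some((words, now));
    }

    /// Returns true if `candidate` matches an unexpired mnemonic.
    /// An expired entry is cleared as a side effect.
    pub fn verify(&self, candidate: &str, now: Instant) -> bool {
        let mut slot = self.slot.lock().unwrap();
        let expired = matches!(
            &*slot,
            Some((_, stored_at)) if now.duration_since(*stored_at) >= PENDING_TTL
        );
        if expired {
            *slot = None;
        }
        matches!(&*slot, Some((words, _)) if words.as_str() == candidate)
    }
}
```

A background timer would clear the slot even if `seed.verify` is never called; the lazy check shown here only guarantees that an expired mnemonic is never accepted.
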
---

## Phase 4: Bitcoin/LND Integration

### LND wallet from seed:
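- `lnd.init-wallet-from-seed` handler — derives 16-byte entropy, passes it as `seed_entropy` to LND REST `GET /v1/genseed` to obtain the aezeed mnemonic, then calls `POST /v1/initwallet` with that mnemonic (`cipher_seed_mnemonic`)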
- Triggered during LND first-install flow

### Bitcoin Core wallet from seed:
- `bitcoin.init-wallet-from-seed` handler — derives BIP-84 xprv, calls `createwallet` + `importdescriptors` via Bitcoin Core RPC
- Triggered during Bitcoin Core first-install flow

Both endpoints require password re-verification.

---

## Phase 5: Migration & Polish

### Legacy node migration:
- Detect legacy nodes (node_key exists, no master_seed.enc)
- Settings page shows prompt: "Set up seed phrase to protect future identities"
- Existing keys preserved — only NEW identities use seed derivation
- Optional full migration (`seed.migrate-legacy`) can be added later

### Cleanup:
- Remove old `OnboardingDid.vue`, `OnboardingBackup.vue`, `OnboardingVerify.vue`
- Update Settings backup section to show seed status
- Update ADR-008 to reflect implementation matches description

---

## File Layout After Implementation

```
{data_dir}/identity/
  node_key            # 32 bytes Ed25519 secret (derived from seed or legacy)
  node_key.pub        # 32 bytes Ed25519 public
  master_seed.enc     # NEW: encrypted mnemonic (optional convenience backup)
  identity_index      # NEW: next derivation index (plain text integer)
{data_dir}/identities/
  {uuid}.json         # Same format + optional derivation_index field
```

---

## Critical Files to Modify

| File | Change |
|------|--------|
| `core/archipelago/Cargo.toml` | Add `bip39`, `bitcoin` crates |
| `core/archipelago/src/seed.rs` | **NEW** — all seed logic |
| `core/archipelago/src/identity.rs` | Add `from_seed()` constructor |
| `core/archipelago/src/identity_manager.rs` | Seed-aware `create()`, add `derivation_index` |
| `core/archipelago/src/server.rs` | Startup state detection (seed/legacy/fresh) |
| `core/archipelago/src/api/rpc/seed.rs` | **NEW** — seed RPC handlers |
| `core/archipelago/src/api/rpc/dispatcher.rs` | Register seed.* endpoints |
| `neode-ui/src/views/OnboardingSeedGenerate.vue` | **NEW** — show 24 words |
| `neode-ui/src/views/OnboardingSeedVerify.vue` | **NEW** — verify written words |
| `neode-ui/src/views/OnboardingSeedRestore.vue` | **NEW** — enter 24 words to restore |
| `neode-ui/src/views/OnboardingOptions.vue` | Enable Restore path |
| `neode-ui/src/router/index.ts` | Update onboarding routes |
| `neode-ui/src/api/rpc-client.ts` | Add seed RPC methods |

---

## Verification

1. **Unit tests**: Deterministic derivation (same mnemonic -> same keys), invalid mnemonic rejection, index increment, zeroization
2. **Integration**: Fresh install flow end-to-end, restore flow (generate on node A, enter words on node B, verify same DID/npub)
3. **Security**: Grep seed.rs for tracing macros that interpolate seed vars, verify file permissions
4. **LND**: Derive entropy, init wallet, verify deterministic aezeed
5. **Bitcoin Core**: Derive xprv, import descriptors, verify addresses match
6. **Legacy**: Existing node without seed starts normally, can still create identities
7. **Type check**: `cd neode-ui && npx vue-tsc -b --noEmit`