feat(orchestrator): drift-sync existing Quadlet units on each reconcile

When a Quadlet unit file already exists for an orchestrator-managed
backend, sync its on-disk bytes against what the current renderer
produces. write_if_changed makes this idempotent: when the bytes match,
nothing is written; when they differ (e.g. after a renderer change is
deployed), the file is rewritten and systemctl --user daemon-reload
runs once.
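The compare-then-write pattern above can be sketched with synchronous std::fs instead of tokio. This is a minimal sketch under stated assumptions. The `write_if_changed` signature shown here, `drift_sync`, and the reload counter are hypothetical stand-ins, not the orchestrator's real `quadlet` API. The counter takes the place of the actual `systemctl --user daemon-reload` call.

```rust
use std::fs;
use std::io::{self, ErrorKind};
use std::path::Path;

/// Hypothetical stand-in for `quadlet::write_if_changed`: returns Ok(true)
/// only when the on-disk bytes differed (or were missing) and got rewritten.
fn write_if_changed(path: &Path, rendered: &[u8]) -> io::Result<bool> {
    match fs::read(path) {
        // Steady state: bytes match, so the only IO is this read.
        Ok(existing) if existing == rendered => return Ok(false),
        // Drift: on-disk bytes differ from the current render.
        Ok(_) => {}
        // Nothing on disk yet: write it.
        Err(e) if e.kind() == ErrorKind::NotFound => {}
        Err(e) => return Err(e),
    }
    // Write a sibling temp file, then rename it over the target so readers
    // never see a half-written unit (same directory, so same filesystem).
    let tmp = path.with_extension("container.tmp");
    fs::write(&tmp, rendered)?;
    fs::rename(&tmp, path)?;
    Ok(true)
}

/// Hypothetical drift-sync step: skip when no unit exists (the first write is
/// the install path's job) and "reload" only when the bytes actually changed.
fn drift_sync(path: &Path, rendered: &[u8], reloads: &mut u32) -> io::Result<bool> {
    if !path.try_exists().unwrap_or(false) {
        return Ok(false);
    }
    let changed = write_if_changed(path, rendered)?;
    if changed {
        // Stand-in for `systemctl --user daemon-reload`; counted, not executed.
        *reloads += 1;
    }
    Ok(changed)
}
```

Because the steady-state path only reads, running this on every reconcile tick costs little. The temp-file-plus-rename step is an assumption modeled on the atomic write that the install path's doc comment describes.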

We deliberately do NOT restart the .service when the file changes:
running containers keep their current config until the operator
restarts them. That's the right tradeoff: file updates are cheap and
non-destructive, while service restarts are the SIGKILL cascade we're
trying to eliminate.
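The tradeoff can be written as a small plan function. This is a hypothetical model: the `SystemdAction` enum and the three plan functions are illustrative and not orchestrator code. The install path reloads and then starts the service, as its doc comment says. Drift sync at most reloads and never emits a restart. Only an operator-initiated restart applies the new file to a running container.

```rust
/// Hypothetical model of the systemd side effects a reconcile path may request.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SystemdAction {
    /// `systemctl --user daemon-reload`: re-runs generators (including Quadlet)
    /// so a rewritten file is picked up; running containers are untouched.
    DaemonReload,
    /// Start the generated .service (first install only).
    StartService,
    /// Restart the .service: stops the running container, so it is destructive.
    RestartService,
}

impl SystemdAction {
    /// Only a restart disturbs a running container.
    fn is_destructive(self) -> bool {
        matches!(self, SystemdAction::RestartService)
    }
}

/// First write of a unit: reload so systemd sees it, then start the service.
fn install_plan() -> Vec<SystemdAction> {
    vec![SystemdAction::DaemonReload, SystemdAction::StartService]
}

/// Drift sync: reload only if the file changed; the new config takes effect
/// at the next operator-initiated restart, never here.
fn drift_sync_plan(changed: bool) -> Vec<SystemdAction> {
    if changed {
        vec![SystemdAction::DaemonReload]
    } else {
        Vec::new()
    }
}

/// Hypothetical operator-initiated restart: the only path that applies the
/// rewritten unit to a running container.
fn operator_restart_plan() -> Vec<SystemdAction> {
    vec![SystemdAction::RestartService]
}
```

With this split, the destructive action sits in exactly one place, and that place is outside the reconcile loop.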

Why this matters: before this commit, every renderer change required
a fresh package.install RPC per app to take effect. Observed live on
.228 on 2026-05-02: the TimeoutStartSec=600 fix shipped in code, but
existing units stayed on the old format because nothing triggered a
re-render. Combined with state.json being empty (so the reconciler's
auto-install path didn't fire either), the fix stayed invisible until
the units were deleted manually.

Companions (UI_APP_IDS) are skipped: companion.rs renders those units
with a different shape, so syncing here would clobber them.
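The early returns in sync_quadlet_unit can be collapsed into a single predicate, sketched below. The contents of `UI_APP_IDS` are hypothetical placeholders, since the real list lives in the orchestrator. `quadlet_enabled` models the flag that the function's doc comment mentions, and its exact meaning is assumed here.

```rust
/// Hypothetical companion list; the real UI_APP_IDS lives in the orchestrator.
const UI_APP_IDS: &[&str] = &["example-ui", "example-dashboard"];

/// Sketch of sync_quadlet_unit's no-op conditions folded into one predicate.
fn should_drift_sync(quadlet_enabled: bool, app_id: &str, unit_exists: bool) -> bool {
    // Quadlet flag off (assumed meaning): nothing to sync.
    if !quadlet_enabled {
        return false;
    }
    // Companions: companion.rs renders these with a different shape.
    if UI_APP_IDS.contains(&app_id) {
        return false;
    }
    // No file yet: the first write belongs to install_via_quadlet.
    unit_exists
}
```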

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
archipelago
2026-05-02 11:43:18 -04:00
parent 44f275eda4
commit 0889367dbf

@@ -382,6 +382,14 @@ impl ProdContainerOrchestrator {
            if let Some(action) = self.migrate_to_quadlet_if_needed(lm, &name).await? {
                return Ok(action);
            }
            // Sync drift: keep an existing Quadlet unit's bytes in step
            // with what the renderer produces today, even when nothing
            // else triggers an install. Without this, every renderer change
            // (new directive, fixed bug) requires a fresh package.install
            // RPC per app to take effect. Observed live on .228 on 2026-05-02,
            // where the TimeoutStartSec=600 fix shipped in code but no
            // existing units picked it up.
            self.sync_quadlet_unit(lm, &name).await?;
        }
        match self.runtime.get_container_status(&name).await {
@@ -596,6 +604,54 @@ impl ProdContainerOrchestrator {
        Ok(Some(ReconcileAction::Installed))
    }

    /// Drift-sync an existing Quadlet unit file's bytes against what the
    /// current renderer produces. No-op when the flag is off, when the
    /// app is a companion (companion.rs owns those units), or when no
    /// unit file exists yet (install_via_quadlet handles first-write).
    ///
    /// We DON'T restart the .service when content changes: running
    /// containers keep their current config until an operator-initiated
    /// restart picks up the new file. That's the right tradeoff: file
    /// updates are cheap and non-destructive; service restarts are
    /// destructive (the SIGKILL cascade we're trying to eliminate).
    /// systemctl --user daemon-reload runs only when content actually
    /// changed, so steady-state reconcile ticks pay just one fs read.
    async fn sync_quadlet_unit(&self, lm: &LoadedManifest, name: &str) -> Result<()> {
        // Companions: same reasoning as migrate_to_quadlet_if_needed.
        // companion.rs renders these units with a different shape, so
        // syncing here would clobber them.
        let app_id = lm.manifest.app.id.as_str();
        if UI_APP_IDS.contains(&app_id) {
            return Ok(());
        }
        let unit_dir = quadlet::unit_dir()
            .await
            .context("locate user quadlet unit dir for drift sync")?;
        let unit_path = unit_dir.join(format!("{name}.container"));
        // Only sync when an existing file is on disk; otherwise this is
        // a fresh install and install_via_quadlet will write it anyway.
        if !tokio::fs::try_exists(&unit_path).await.unwrap_or(false) {
            return Ok(());
        }
        let mut resolved = lm.manifest.clone();
        self.resolve_dynamic_env(&mut resolved)?;
        let unit = quadlet::QuadletUnit::from_manifest(&resolved, name);
        let changed = quadlet::write_if_changed(&unit, &unit_dir)
            .await
            .with_context(|| format!("drift-sync quadlet unit for {name}"))?;
        if changed {
            quadlet::daemon_reload_user()
                .await
                .context("systemctl --user daemon-reload after drift-syncing quadlet unit")?;
            tracing::info!(
                app_id = %lm.manifest.app.id,
                container = %name,
                "Quadlet unit drift-synced: file rewritten, .service NOT restarted (operator restart picks up new config)"
            );
        }
        Ok(())
    }

    /// Phase 3.2 install path. Renders the manifest as a Quadlet unit,
    /// writes it atomically into ~/.config/containers/systemd/, asks
    /// systemd to reload, and starts the generated service. Errors at