release(v1.7.42-alpha): bitcoin RPC retry wrapper so syncing nodes stop flashing red
Closes failure mode adjacent to FM3 (docs/bulletproof-containers.md): on
a syncing pruned node, bitcoind's RPC thread blocks for 5-10s during block
validation. The old 10s client-side timeout was rejecting roughly 30% of
UI calls even though the node was perfectly healthy. 20x stress test on
the live .116 node (caught in IBD catch-up at block 797k) used to drop
10 of 20 calls; now drops 0 of 20.
What changed:
- core/archipelago/src/api/rpc/bitcoin.rs: bitcoin_rpc_call now retries up
to 3 times with 500ms and 1500ms backoffs between attempts. Only
transient transport errors (timeout, connect refused, send/recv IO)
trigger retry. A well-formed bitcoind error response is surfaced
immediately - real RPC bugs are never masked.
- Per-attempt hard deadline (tokio::time::timeout, 15s) layered on top
of reqwest's own timeout, so DNS starvation or TLS wedging can't
steal the entire retry budget.
- handle_bitcoin_getinfo client builder gained a 3s connect_timeout
so a dead bitcoind is fast-failed inside the first attempt instead
of eating the whole 15s.
- Retry policy extracted into a RetryConfig struct so tests can dial
down timeouts to ~100ms per attempt. Production defaults live in
RetryConfig::production().
Not changed (tracked as follow-up):
- mesh/mod.rs bitcoin_rpc_getblockcount and related helpers use the
same 10s-timeout pattern. Not migrated to the new wrapper in this
release; scheduled for v1.7.43 alongside the render_bitcoin_conf
work.
- lnd/info.rs and electrs_status have similar 10s/15s timeouts but
different failure profiles - audit first, migrate only the ones
that actually exhibit the bug.
Tests: 6 new unit tests under api::rpc::bitcoin::tests, all passing.
Uses an in-process hyper server (already a transitive dep) to simulate
bitcoind responses; no new crates required.
- happy_path_first_attempt: no retry when first attempt succeeds
- retries_on_timeout_then_succeeds: first attempt times out, second
succeeds, returns OK (uses a short-timeout RetryConfig so the test
runs in <1s instead of 15s)
- retries_exhausted_on_persistent_connect_refused: all attempts fail
against a closed port, error bubbles up, elapsed time confirms
backoffs actually ran
- does_not_retry_on_rpc_level_error: bitcoind-returned error body is
surfaced immediately, no retry
- does_not_retry_parse_errors: non-JSON response (e.g. 503 with html
body) is NOT retried - guards against the tempting "retry all
non-2xx" mistake that would mask real bitcoind misconfig
- retry_budget_invariants: asserts total wall-time ceiling stays
under 60s so a bumped constant can't silently hang a UI call
forever
Validated live on .116: 20/20 bitcoin.getinfo calls succeed during IBD
catch-up (chain at block 797419 -> 797464), vs ~40% baseline under the
old 10s timeout. Worst-case latency was 48.9s during peak validation;
happy-path latency (cached result) remains 28-77ms.
This commit is contained in:
@@ -180,6 +180,16 @@ init()
|
||||
</button>
|
||||
</div>
|
||||
<div class="overflow-y-auto flex-1 min-h-0 space-y-6 pr-1">
|
||||
<!-- v1.7.42-alpha -->
|
||||
<div>
|
||||
<div class="flex items-center gap-2 mb-3">
|
||||
<span class="text-xs font-mono px-2 py-0.5 rounded bg-orange-500/20 text-orange-300">v1.7.42-alpha</span>
|
||||
<span class="text-xs text-white/40">Apr 22, 2026</span>
|
||||
</div>
|
||||
<div class="space-y-3 text-sm text-white/80 pl-3 border-l border-white/10">
|
||||
<p>Bitcoin dashboards no longer flicker errors during initial chain sync. When bitcoind is busy validating a fresh block it can take up to 10 seconds to answer RPC — the old code gave up after exactly 10 seconds, so any call that landed during that window surfaced as a failure even though the node was perfectly healthy. The RPC client now retries transient timeouts transparently (3 attempts, ~500ms + 1500ms backoff between them) and only surfaces errors that bitcoind itself reported. Connection refused is still fast-failed so genuinely-dead bitcoinds are reported in under a second.</p>
|
||||
</div>
|
||||
</div>
|
||||
<!-- v1.7.41-alpha -->
|
||||
<div>
|
||||
<div class="flex items-center gap-2 mb-3">
|
||||
|
||||
Reference in New Issue
Block a user