HydraIssues

hydrabody should periodically re-fetch body config (currently only startup/reprovision)
Status: open · Type: improvement · Project: hydrabody · Created: 22 Apr 2026 17:42

Description

## Context

On 2026-04-22 we shipped hydracluster v2.0.25 (one-line change: `resp.EnableDebug = false`). The expectation: bodies pick up the new value on their next heartbeat cycle and stop passing `--debug` to kioskoverlay. Reality: nothing changed until we manually cleared each body's `config_cache.yaml` and restarted the HydraBody scheduled task.

Root cause in `pkg/provider/provider.go`:

```go
if p.Config == nil {
	// fetchAndCacheConfig()
}
```

`loadCachedConfig()` is called on startup (line 62) and populates `p.Config` from the on-disk YAML. The tick loop's fetch block only runs when `p.Config == nil`, so once the cache is populated, hydrabody never refreshes. The one exception is when hydracluster returns `Reprovision: true` in the body/status response. That triggers the full `provisionForce` path, which is heavy and re-downloads Sunshine.
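The failure mode can be reproduced in isolation. The sketch below is a hypothetical model, not hydrabody code: `Provider`, `BodyConfig`, and the `server` field are stand-ins, and `fetchAndCacheConfig` here simply copies the server's current value. It assumes, as described above, that startup has already filled `p.Config` from the cache.

```go
package main

import "fmt"

// BodyConfig is a hypothetical stand-in for hydrabody's cached body config.
type BodyConfig struct {
	EnableDebug bool
}

// Provider models only the fields relevant to this bug (hypothetical).
type Provider struct {
	Config     *BodyConfig
	fetchCount int
	server     BodyConfig // what hydracluster currently serves
}

// fetchAndCacheConfig is a stand-in: it copies the server value into p.Config.
func (p *Provider) fetchAndCacheConfig() {
	p.fetchCount++
	cfg := p.server
	p.Config = &cfg
}

// tick mirrors the current guard: fetch only when no config is loaded.
func (p *Provider) tick() {
	if p.Config == nil {
		p.fetchAndCacheConfig()
	}
}

func main() {
	// Startup: the cache already holds enable_debug=true.
	p := &Provider{Config: &BodyConfig{EnableDebug: true}}
	// hydracluster ships enable_debug=false.
	p.server.EnableDebug = false
	for i := 0; i < 100; i++ {
		p.tick()
	}
	// No fetch ever happens; the stale value survives every tick.
	fmt.Println(p.fetchCount, p.Config.EnableDebug)
}
```

Running it prints `0 true`: a hundred ticks, zero fetches, and the body still believes debug is enabled.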

## Impact

- Changes to BodyConfig fields (enable_debug, virtual_display_enabled, kiosk_mode_enabled, district/venue reclassifications, provider version, experience library token, etc.) do not propagate to bodies until someone manually forces a reprovision or clears the cache.
- Masked the cause during a two-day debugging loop on 2026-04-21/22: we assumed shipping a hydracluster change would propagate, but nothing did.
- Boom-pickle's cache showed `district: bxl1-test` even though hydracluster had reclassified it to `district: bxl1 / venue: rupelmonde` — the stale cache lasted days.

## Proposal

Add a lightweight periodic config refresh in the main tick loop:

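```go
// Periodically refresh body config from server to pick up admin changes.
// Cheap: single HTTP GET, no side-effects unless fields changed.
// Needs a new `lastConfigFetch time.Time` field on the provider; its zero
// value makes the first tick fetch immediately.
if time.Since(p.lastConfigFetch) > 5*time.Minute {
	if err := p.fetchAndCacheConfig(); err != nil {
		// lastConfigFetch is not advanced, so the next tick retries.
		log.Printf("config refresh failed: %v", err)
	} else {
		p.lastConfigFetch = time.Now()
	}
}
```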

Diff-and-apply logic (restart the overlay if `enable_debug` changed, restart the provider only if `provider_version` changed, etc.) can be added incrementally. For v1, just keep `p.Config` fresh so that tick-side setters (`SetEnableDebug`, `SetBackdropPath`) see the current server state.
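For a later iteration, the diff step could be a pure function that compares the cached config with the freshly fetched one and returns the restarts needed. This is a sketch under stated assumptions: `BodyConfig` here is a hypothetical three-field subset whose names are guessed from the YAML keys in this issue, and `planActions`, `Action`, and the action names are illustrative. Only the two mappings named above are encoded; any other field change just updates `p.Config`.

```go
package main

import "fmt"

// BodyConfig is a hypothetical subset of the real struct (field names assumed).
type BodyConfig struct {
	EnableDebug     bool
	ProviderVersion string
	District        string
}

// Action is a hypothetical follow-up step for the tick loop.
type Action string

const (
	RestartOverlay  Action = "restart-overlay"
	RestartProvider Action = "restart-provider"
)

// planActions returns the restarts implied by moving from old to cur.
// Fields not checked here (e.g. District) need no restart.
func planActions(old, cur BodyConfig) []Action {
	var actions []Action
	if old.EnableDebug != cur.EnableDebug {
		actions = append(actions, RestartOverlay)
	}
	if old.ProviderVersion != cur.ProviderVersion {
		actions = append(actions, RestartProvider)
	}
	return actions
}

func main() {
	// Example values only: the v2.0.25 change plus a district reclassification.
	old := BodyConfig{EnableDebug: true, ProviderVersion: "1.0", District: "bxl1-test"}
	cur := BodyConfig{EnableDebug: false, ProviderVersion: "1.0", District: "bxl1"}
	fmt.Println(planActions(old, cur))
}
```

With these example values it prints `[restart-overlay]`: the debug flip restarts kioskoverlay, while the district change alone triggers nothing. Keeping the function free of side-effects makes each new field mapping a one-line change with an easy unit test.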

## Why this matters

Structural. Without periodic refresh, every server-side config change requires a fleet-wide manual cache-clear + restart dance (done three times today already). Every new hydracluster release that changes a default suffers this.

## References

- 2026-04-22 manual cache clears on wobbly, boom-pickle-38, cosmic-pretzel-98.
- hydracluster commit `ad7e134` (v2.0.25) — the trigger that exposed this.