HydraIssues

hydrapipelinerunnerapple: runner stuck after BrokerServer long-poll drops -- needs watchdog restart
open unclassified Project: hydrapipelinerunnerapple Reporter: 20 May 2026 16:34

Description

On 2026-05-20 the GitHub Actions runner on cederikmini (PID 31241, running since 2026-05-18) stopped picking up new jobs. v0.2.122 stayed queued for 10 min even though the GitHub API reported the runner as online.

The problem recurred the same day for v0.2.123 (PID 51343), roughly 2.5 hours after the previous restart. Same pattern: Runner.Listener alive, broker session broken, no jobs dispatched.

Root cause: the runner's long-poll HTTP connection to broker.actions.githubusercontent.com drops after extended uptime (network hiccup or GitHub-side idle timeout). The Runner.Listener process stays alive, but the broker session is broken and the retry loop eventually gives up. The GitHub runner registration does not expire, so the API keeps reporting the runner as online.

Manual fix (both times): `hydracluster exec node-38f14c8b kill <pid>`. After the kill, run.sh respawns the process, which reconnects and picks up the queued job within ~30s.

Desired fix: add a watchdog on cederikmini that monitors the Runner.Listener log (or broker session state) and auto-restarts the process when the broker session has been down for more than ~5 min. This prevents builds from silently queuing indefinitely and avoids manual intervention.
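
A minimal sketch of the watchdog decision logic, in Python, to make the proposal concrete. It is not a tested implementation. Assumptions: the Runner.Listener log lines look like `[2026-05-20 14:02:11Z INFO Component] message`, and the marker strings in `HEALTHY_MARKERS` and `BROKEN_MARKERS` are hypothetical placeholders that would need to be replaced with the text the real log emits when the broker session is created or lost. The function names are also illustrative.

```python
import os
import re
import signal
from datetime import datetime, timedelta, timezone

# Assumed log line shape: "[2026-05-20 14:02:11Z INFO Component] message".
LINE_RE = re.compile(
    r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})Z (\w+) (\w+)\] (.*)$"
)

# Hypothetical markers; replace with strings observed in the real log.
HEALTHY_MARKERS = ("Session created", "Message received")
BROKEN_MARKERS = ("Broker session lost",)


def broker_down_since(lines):
    """Return when the current outage started, or None if the session looks healthy."""
    down_since = None
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # skip lines that do not fit the assumed format
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
        stamp = stamp.replace(tzinfo=timezone.utc)
        text = match.group(4)
        if any(marker in text for marker in HEALTHY_MARKERS):
            down_since = None  # a healthy event ends any outage
        elif down_since is None and any(m in text for m in BROKEN_MARKERS):
            down_since = stamp  # first failure of a new outage
    return down_since


def should_restart(lines, now, threshold=timedelta(minutes=5)):
    """True when the broker session has been down for longer than the threshold."""
    down_since = broker_down_since(lines)
    return down_since is not None and now - down_since > threshold


def restart_listener(pid):
    """Terminate Runner.Listener; per the manual fix above, run.sh respawns it."""
    os.kill(pid, signal.SIGTERM)
```

A periodic job on cederikmini (for example every minute) could read the tail of the log, call `should_restart(lines, datetime.now(timezone.utc))`, and call `restart_listener(pid)` when it returns true. Tracking the start of the outage rather than the latest error keeps repeated retry errors from resetting the 5 min timer.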