#256 Video decode queue overflow storm + /cancel not terminating session promptly

Description

## Stream context

Session: hydraheadipad v0.2.92, fluffy-dumpling-87 at 10.110.15.197, experience: rupelmonde-castle-viewer, 1920x1080 at 25000 kbps, H.264.

---

## Anomaly 1 — Video decode unit queue overflow storm (11:48:22–11:48:32)

### What the log shows

About 1 minute into the stream, the decoder queue began overflowing repeatedly. Each overflow immediately triggered an IDR frame request. New IDR frames arrived into an already-full queue, causing another overflow, which triggered another IDR request — a self-reinforcing loop lasting about 10 seconds across at least 11 overflow events.

### What the code shows

**Queue size limit:** In VideoDepacketizer.c, the decode unit queue is initialized with a hard-coded bound of 15:

```c
LbqInitializeLinkedBlockingQueue(&decodeUnitQueue, 15);
```

When LbqOfferQueueItem() finds currentSize == sizeBound it returns LBQ_BOUND_EXCEEDED.

**Overflow recovery path (reassembleFrame in VideoDepacketizer.c, lines 506–525):**

When LBQ_BOUND_EXCEEDED is returned, the code:
1. Sets waitingForIdrFrame = true
2. Drops the frame that just failed to enqueue
3. Flushes all frames already in the queue (LbqFlushQueueItems)
4. Calls LiRequestIdrFrame()

Flushing the queue should free space for the incoming IDR frame. However, the IDR frame arrives asynchronously, some time after the flush, so the queue can already be partially refilled with P-frames by the time it arrives. If the decoder is too slow to consume frames (e.g. because iOS VideoToolbox is falling behind at 25 Mbps), the queue fills again immediately after the IDR arrives, producing another overflow, another flush, and another IDR request: the loop observed in the log.
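The feedback loop can be reproduced with a toy discrete-time model. This is a minimal sketch, not code from VideoDepacketizer.c: `count_overflows` and its parameters are hypothetical, the 60 fps arrival rate is taken from the stream description in this report, and the decode rates passed in are assumptions. The model keeps only the 15-slot bound and the flush-on-overflow step; it ignores the time spent waiting for the IDR frame.

```c
#define QUEUE_BOUND 15  /* same bound as the decode unit queue in this report */

/*
 * Toy model (hypothetical helper): one frame arrives per tick at arrival_fps,
 * and the decoder drains the queue at decode_fps. When a frame arrives at a
 * full queue, the frame is dropped, the queue is flushed, and the overflow is
 * counted. Each counted overflow stands for one IDR request in the real code.
 * Returns the number of overflows over duration_s seconds.
 */
int count_overflows(int arrival_fps, int decode_fps, int duration_s)
{
    int queued = 0;
    int overflows = 0;
    int decode_credit = 0;  /* fractional decode progress, in units of 1/arrival_fps */
    int total_frames = arrival_fps * duration_s;

    for (int i = 0; i < total_frames; i++) {
        /* Decoder progress during one frame interval. */
        decode_credit += decode_fps;
        while (decode_credit >= arrival_fps) {
            decode_credit -= arrival_fps;
            if (queued > 0) {
                queued--;
            }
        }

        /* Frame arrival: enqueue, or overflow and flush. */
        if (queued == QUEUE_BOUND) {
            overflows++;
            queued = 0;
        } else {
            queued++;
        }
    }
    return overflows;
}
```

In this model, a decoder that keeps pace with arrival never overflows. Any sustained deficit refills the queue after each flush, so overflows recur for as long as the deficit lasts.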

**Bitrate hardcoded at 25000 kbps regardless of routing:**

ContentView.swift lines 89–91 always return 25000:

```swift
private func bitrateFor(host: String) -> Int {
    25000
}
```

HeadConfig.swift lines 51–55 show that StreamConfig.bitrateKbps would return 150000 for a LAN host (any host outside 10.10.x.x). The body host 10.110.15.197 is on the LAN, not WireGuard. ContentView.swift overrides this logic and fixes all streams at 25000 kbps. At 1920x1080 H.264 60fps, 25000 kbps is a heavy sustained load for iOS VideoToolbox. If the decoder falls behind transiently, frames accumulate in the 15-slot queue faster than they are consumed, and the overflow loop triggers.
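The routing distinction described above can be sketched as a single lookup, shown here in C for consistency with the depacketizer code. The real logic lives in HeadConfig.swift and is not reproduced in this report, so every name below is hypothetical. Treating 10.10.x.x as the WireGuard range is inferred from the description above, and the WireGuard bitrate is left as a caller-supplied value because the report does not state it.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical policy holder; field values come from the caller. */
typedef struct {
    int lan_kbps;        /* 150000 according to the HeadConfig.swift description */
    int wireguard_kbps;  /* not stated in this report; assumed by the caller */
} BitratePolicy;

/* Returns nonzero when host is in 10.10.x.x, assumed here to mean WireGuard. */
int is_wireguard_host(const char *host)
{
    /* The trailing dot keeps 10.100.x.x and 10.110.x.x out of the match. */
    return host != NULL && strncmp(host, "10.10.", 6) == 0;
}

/* Single authoritative bitrate choice based on routing. */
int bitrate_kbps_for_host(const BitratePolicy *policy, const char *host)
{
    return is_wireguard_host(host) ? policy->wireguard_kbps : policy->lan_kbps;
}
```

With this shape, 10.110.15.197 is classified as LAN, which matches the observation that the session in this report was not routed over WireGuard.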

### Possible fixes to consider

- After an overflow and flush, back off IDR requests (e.g. wait one RTT or 200 ms before requesting again) rather than requesting immediately on every subsequent overflow. This would break the tight loop.
- Investigate whether 25000 kbps at 1080p60 H.264 is sustainable on the target iPad model. Reducing to 15000–20000 kbps may eliminate decoder stalls under normal load.
- The bitrateFor() override in ContentView.swift shadows the per-routing logic in HeadConfig.swift. Decide which one is authoritative and remove the other.
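The IDR back-off in the first item could look roughly like the following. This is a minimal sketch under stated assumptions: `IdrLimiter`, `idr_limiter_allow`, and `IDR_MIN_INTERVAL_MS` are hypothetical names, not part of VideoDepacketizer.c, and the caller is assumed to supply a monotonic millisecond timestamp.

```c
#include <stdbool.h>
#include <stdint.h>

#define IDR_MIN_INTERVAL_MS 200  /* assumed value; the suggestion above is one RTT or 200 ms */

/* Hypothetical limiter state; zero-initialize before first use. */
typedef struct {
    bool has_requested;
    uint64_t last_request_ms;
} IdrLimiter;

/*
 * Returns true if an IDR request may be sent at now_ms and records the
 * request time. Returns false if the previous request was too recent.
 */
bool idr_limiter_allow(IdrLimiter *lim, uint64_t now_ms)
{
    if (lim->has_requested &&
        now_ms - lim->last_request_ms < IDR_MIN_INTERVAL_MS) {
        return false;
    }
    lim->has_requested = true;
    lim->last_request_ms = now_ms;
    return true;
}
```

The overflow path would call `idr_limiter_allow` before `LiRequestIdrFrame()` and skip the request when it returns false. The interval is a tunable: the log shows roughly one overflow per second, so the chosen value would need to be checked against the observed spacing.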

---

## Anomaly 2 — 95-second gap between stop() (11:48:47) and server termination (11:50:22)

### What the log shows

HydraStreamSession.stop() was called at 11:48:47 and logged "sending /cancel". The server did not terminate until 11:50:22 — 95 seconds later. In that window the ENet control channel logged: "Failed to send ENet control packet", "Loss Stats: Transaction failed: 60", "ENet peer is already disconnected".

### What the code shows

**sendCancelToSunshine in HydraStreamSession.m (lines 205–220):**

The /cancel request is fire-and-forget. It is sent with a 3-second timeout. The completion block only logs the response at LOG_D level — it takes no action on failure (no retry, no fallback, no error propagation to the caller):

```objc
req.timeoutInterval = 3;
[[session dataTaskWithRequest:req completionHandler:^(NSData *data, NSURLResponse *resp, NSError *error) {
    Log(LOG_D, @"Sunshine /cancel: %@ %@", resp, error);
    [session invalidateAndCancel];
}] resume];
```

**The URL used:**

```
https://<host>:47984/cancel?uniqueid=0123456789ABCDEF
```

The uniqueid is the hardcoded string "0123456789ABCDEF". If Sunshine tracks sessions by the uniqueid negotiated during pairing, a /cancel with a mismatched hardcoded uniqueid may be silently ignored, leaving the session alive until Sunshine fires its own inactivity timeout.

**What the 95-second gap indicates:**

The ENet errors show the client-side control channel was torn down immediately when stop() was called — the iPad stopped communicating. Sunshine terminating at 11:50:22 with reason 0x80030023 is consistent with Sunshine detecting client inactivity and timing out the session on its own schedule, not in direct response to the /cancel HTTP request.

### Possible fixes to consider

- Verify that the uniqueid sent to /cancel matches the uniqueid negotiated during pairing. The pairing flow in HydraPairSession likely assigns a uniqueid; that value should be stored and reused in sendCancelToSunshine rather than the hardcoded "0123456789ABCDEF".
- Log the HTTP response status code from /cancel so future logs confirm whether Sunshine acknowledged the cancel. Currently only the raw NSURLResponse and NSError objects are logged, and only at LOG_D, a level that does not appear in production session logs.
- If /cancel fails (non-2xx or NSError), log a warning at a level visible in production logs so operators can determine whether the 95-second tail is a regular occurrence.
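The fixes above reduce to two small pieces of logic: build the URL from a stored uniqueid, and classify the outcome so failures can be logged visibly. The sketch below is written in C for consistency with the depacketizer code. The actual change would belong in the Objective-C completion handler in HydraStreamSession.m. All function and type names are hypothetical, and the assumption that a uniqueid is available from pairing has not been verified against HydraPairSession.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical outcome categories for the /cancel request. */
typedef enum {
    CANCEL_OK,              /* 2xx response: Sunshine acknowledged the request */
    CANCEL_HTTP_ERROR,      /* response received, but not 2xx */
    CANCEL_TRANSPORT_ERROR  /* timeout or connection failure (NSError in the real code) */
} CancelOutcome;

/*
 * Builds the /cancel URL from the uniqueid stored at pairing time
 * (assumed to exist) instead of a hardcoded string.
 * Returns false if the URL does not fit in the buffer.
 */
bool build_cancel_url(char *out, size_t out_len,
                      const char *host, const char *unique_id)
{
    int n = snprintf(out, out_len,
                     "https://%s:47984/cancel?uniqueid=%s", host, unique_id);
    return n > 0 && (size_t)n < out_len;
}

/* Maps the request result to an outcome the caller can log or act on. */
CancelOutcome classify_cancel(int http_status, bool transport_error)
{
    if (transport_error) {
        return CANCEL_TRANSPORT_ERROR;
    }
    if (http_status < 200 || http_status > 299) {
        return CANCEL_HTTP_ERROR;
    }
    return CANCEL_OK;
}
```

Anything other than `CANCEL_OK` would be logged as a warning at a production-visible level, together with the status code, so later logs show whether the long tail follows a failed cancel.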
