Research 2026-05-11
Regulating Branch Parallelism in LLM Serving
Source: Arxiv CS.AI
arXiv:2605.06914v1 Announce Type: cross
Abstract: Recent methods expose intra-request parallelism in LLM outputs, allowing independent branches to decode concurrently. Existing serving systems execute these branches either eagerly or under fixed caps. We show that both are brittle: eager admission...
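The two baseline policies the abstract contrasts can be sketched in a small simulation. This is a hypothetical illustration, not the paper's system: `decode_branch`, `serve`, and the `cap` parameter are invented names, and the per-token decode step is reduced to a cooperative yield. Eager admission puts every branch in flight at once; a fixed cap gates admission with a semaphore.

```python
import asyncio

async def decode_branch(branch_id: int, steps: int, active: list, peak: list) -> str:
    """Simulate decoding one branch; track concurrent-branch counts."""
    active[0] += 1
    peak[0] = max(peak[0], active[0])
    for _ in range(steps):      # stand-in for per-token decode steps
        await asyncio.sleep(0)  # yield so other branches can interleave
    active[0] -= 1
    return f"branch-{branch_id}"

async def serve(branch_steps, cap=None):
    """Decode all branches; if `cap` is set, gate admission with a semaphore."""
    active, peak = [0], [0]
    sem = asyncio.Semaphore(cap) if cap is not None else None

    async def run(i, steps):
        if sem is None:          # eager: admit every branch immediately
            return await decode_branch(i, steps, active, peak)
        async with sem:          # fixed cap: wait for a free slot
            return await decode_branch(i, steps, active, peak)

    results = await asyncio.gather(*(run(i, s) for i, s in enumerate(branch_steps)))
    return results, peak[0]

# Eager admission: all 8 branches decode concurrently (peak concurrency 8).
_, peak_eager = asyncio.run(serve([3] * 8))
# Fixed cap of 2: at most 2 branches are in flight at any time.
_, peak_capped = asyncio.run(serve([3] * 8, cap=2))
```

Under a fixed cap, peak concurrency never exceeds the cap regardless of how many branches a request exposes; under eager admission it scales with the branch count, which is the brittleness the abstract points at.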