Research 2026-05-11

Regulating Branch Parallelism in LLM Serving

Source: arXiv cs.AI

arXiv:2605.06914v1 Announce Type: cross Abstract: Recent methods expose intra-request parallelism in LLM outputs, allowing independent branches of a single request to decode concurrently. Existing serving systems execute these branches either eagerly or under fixed caps. We show that both approaches are brittle: eager admission...
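The abstract contrasts eager branch admission with fixed caps. As a rough illustration of the alternative it hints at, here is a minimal sketch of budget-aware branch admission; all names (`BranchScheduler`, `kv_budget`) are hypothetical and not taken from the paper:

```python
from dataclasses import dataclass, field
from collections import deque

@dataclass
class BranchScheduler:
    """Admit decode branches under a cap tied to a resource budget.

    Hypothetical sketch: `kv_budget` is the number of KV-cache slots
    available; each active branch is assumed to hold one slot.
    """
    kv_budget: int
    active: set = field(default_factory=set)
    pending: deque = field(default_factory=deque)

    def submit(self, branch_id: str) -> None:
        # Unlike eager admission, queue the branch and let _admit() decide.
        self.pending.append(branch_id)
        self._admit()

    def finish(self, branch_id: str) -> None:
        # Branch completed decoding; free its slot and admit waiters.
        self.active.discard(branch_id)
        self._admit()

    def _admit(self) -> None:
        # Admit waiting branches only while free budget remains.
        while self.pending and len(self.active) < self.kv_budget:
            self.active.add(self.pending.popleft())


scheduler = BranchScheduler(kv_budget=2)
for b in ("b0", "b1", "b2"):
    scheduler.submit(b)
# Only two branches decode concurrently; the third waits.
scheduler.finish("b0")
# Freeing a slot admits the queued branch.
```

Eager admission would add every branch to `active` immediately; a fixed cap would hard-code the limit regardless of load, which is the brittleness the abstract points at.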
