Research 2026-05-12
SPECTRE: Hybrid Ordinary-Parallel Speculative Serving for Resource-Efficient LLM Inference
Source: arXiv cs.AI
arXiv:2605.08151v1 Announce Type: cross Abstract: LLM serving platforms are increasingly deployed as multi-model cloud systems, where user demand is often long-tailed: a few popular large models receive most requests, while many smaller tail models remain underutilized. We propose SPECTRE...