Research 2026-05-12
SPECTRE: Hybrid Ordinary-Parallel Speculative Serving for Resource-Efficient LLM Inference
Source: arXiv cs.AI
arXiv:2605.08151v1 Announce Type: cross Abstract: LLM serving platforms are increasingly deployed as multi-model cloud systems, where user demand is often long-tailed: a few popular large models receive most requests, while many smaller tail models remain underutilized. We propose SPECTRE...