Partnership | 2026-04-17
Sandwich: Joint Configuration Search and Hot-Switching for Efficient CPU LLM Serving
Source: arXiv cs.AI
arXiv:2507.18454v2
Announce Type: replace-cross
Abstract: CPUs are critical for LLM serving due to their availability, cost efficiency, and edge applicability. However, efficient CPU serving is hindered by conflicting prefill/decode resource demands under non-disaggregated deployment...
Tags: arxiv, papers