BeClaude
Partnership | 2026-04-17

Sandwich: Joint Configuration Search and Hot-Switching for Efficient CPU LLM Serving

Source: Arxiv CS.AI

arXiv:2507.18454v2

Abstract: CPUs are critical for LLM serving due to their availability, cost efficiency, and edge applicability. However, efficient CPU serving is hindered by conflicting prefill/decode resource demands under non-disaggregated deployment...

arxiv, papers