Research · 2026-05-08
Theoretically Optimal Attention/FFN Ratios in Disaggregated LLM Serving
Source: Arxiv CS.AI
arXiv:2601.21351v2 Announce Type: replace-cross Abstract: Attention-FFN disaggregation (AFD) is an emerging architecture for LLM decoding that separates state-heavy, KV-cache-dominated attention computation from stateless, compute-intensive FFN computation, connected by per-step communication. While...
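The split the abstract describes can be sketched as two workers exchanging an activation once per decode step. This is a minimal illustrative toy, not the paper's method: the function names, shapes, and the plain-Python math are all assumptions.

```python
# Toy sketch of Attention-FFN disaggregation (AFD) for one decode step.
# All names and shapes here are illustrative assumptions, not the paper's API.

def attention_worker(query, kv_cache):
    # State-heavy side: owns the KV cache. Here, a simple dot-product
    # attention over cached (key, value) pairs.
    scores = [sum(q * k for q, k in zip(query, key)) for key, _ in kv_cache]
    total = sum(scores) or 1.0
    weights = [s / total for s in scores]
    dim = len(query)
    return [sum(w * value[i] for w, (_, value) in zip(weights, kv_cache))
            for i in range(dim)]  # activation shipped to the FFN worker

def ffn_worker(activation, w1, w2):
    # Stateless, compute-intensive side: a two-layer MLP with ReLU.
    hidden = [max(0.0, sum(a * w for a, w in zip(activation, row))) for row in w1]
    return [sum(h * w for h, w in zip(hidden, row)) for row in w2]

def decode_step(query, kv_cache, w1, w2):
    # The per-step communication the abstract mentions: the attention
    # output crosses from the attention worker to the FFN worker.
    activation = attention_worker(query, kv_cache)
    return ffn_worker(activation, w1, w2)
```

In a real deployment the two workers would live on separate devices and the hand-off would be a network transfer, which is why the attention/FFN resource ratio the paper studies matters.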
arxivpapers