Research · 2026-05-08
Theoretically Optimal Attention/FFN Ratios in Disaggregated LLM Serving
Source: Arxiv CS.AI
arXiv:2601.21351v2 Announce Type: replace-cross Abstract: Attention-FFN disaggregation (AFD) is an emerging architecture for LLM decoding that separates state-heavy, KV-cache-dominated attention computation from stateless, compute-intensive FFN computation, connected by per-step communication. While...
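The split the abstract describes can be sketched as two workers exchanging an activation once per decode step. This is a minimal illustrative toy, not the paper's method: the function names, shapes, and the plain-Python math are all assumptions.

```python
# Toy sketch of Attention-FFN disaggregation (AFD) for one decode step.
# All names and shapes here are illustrative assumptions, not the paper's API.

def attention_worker(query, kv_cache):
    # State-heavy side: owns the KV cache. Here, a simple dot-product
    # attention over cached (key, value) pairs.
    scores = [sum(q * k for q, k in zip(query, key)) for key, _ in kv_cache]
    total = sum(scores) or 1.0
    weights = [s / total for s in scores]
    dim = len(query)
    return [sum(w * value[i] for w, (_, value) in zip(weights, kv_cache))
            for i in range(dim)]  # activation shipped to the FFN worker

def ffn_worker(activation, w1, w2):
    # Stateless, compute-intensive side: a two-layer MLP with ReLU.
    hidden = [max(0.0, sum(a * w for a, w in zip(activation, row))) for row in w1]
    return [sum(h * w for h, w in zip(hidden, row)) for row in w2]

def decode_step(query, kv_cache, w1, w2):
    # The per-step communication the abstract mentions: the attention
    # output crosses from the attention worker to the FFN worker.
    activation = attention_worker(query, kv_cache)
    return ffn_worker(activation, w1, w2)
```

In a real deployment the two workers would live on separate devices and the hand-off would be a network transfer, which is why the attention/FFN resource ratio the paper studies matters.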
arxivpapers