Research 2026-04-28

MTServe: Efficient Serving for Generative Recommendation Models with Hierarchical Caches

arXiv:2604.22881v1. Announce type: cross. Abstract: Generative recommendation (GR) offers superior modeling capabilities but suffers from prohibitive inference costs due to the repeated encoding of long user histories. While cross-request Key-Value (KV) cache reuse presents a significant optimization...

Read Original Article on arXiv cs.AI
