BeClaude
Research · 2026-04-23

Compressing Sequences in the Latent Embedding Space: $K$-Token Merging for Large Language Models

Source: arXiv cs.AI

arXiv:2604.15153v2 Announce Type: replace-cross

Abstract: Large Language Models (LLMs) incur significant computational and memory costs when processing long prompts, as full self-attention scales quadratically with input length. Token compression aims to address this challenge by reducing the...
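The abstract is truncated here, but the core idea of token merging can be illustrated with a minimal sketch: repeatedly average the most similar adjacent token embeddings so the sequence shrinks from $n$ to $n - k$ tokens, which cuts the quadratic attention cost from $O(n^2)$ toward $O((n-k)^2)$. The greedy similarity-based rule, the function name `merge_k_tokens`, and the shapes below are illustrative assumptions, not the paper's actual algorithm.

```python
import numpy as np

def merge_k_tokens(embeddings: np.ndarray, k: int) -> np.ndarray:
    """Merge the k most similar adjacent token pairs by averaging them.

    embeddings: (seq_len, hidden_dim) array of token embeddings.
    Returns an array of shape (seq_len - k, hidden_dim).
    Hypothetical merging rule (greedy, similarity-based) for illustration only;
    the paper's K-token merging method may differ.
    """
    x = embeddings.astype(np.float64)
    for _ in range(k):
        # Cosine similarity between each pair of adjacent tokens.
        a, b = x[:-1], x[1:]
        sims = np.sum(a * b, axis=1) / (
            np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1) + 1e-8
        )
        i = int(np.argmax(sims))            # most redundant adjacent pair
        merged = (x[i] + x[i + 1]) / 2.0    # collapse the pair into one token
        x = np.concatenate([x[:i], merged[None, :], x[i + 2:]], axis=0)
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    seq = rng.normal(size=(128, 64))          # 128 tokens, 64-dim embeddings
    compressed = merge_k_tokens(seq, k=32)    # compress to 96 tokens
    print(seq.shape, "->", compressed.shape)  # (128, 64) -> (96, 64)
```

Merging in the embedding space, rather than dropping tokens outright, lets the compressed sequence retain a blended summary of the removed positions before it is fed to downstream attention layers.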

arxivpapers