BeClaude
Research · 2026-04-23

Compressing Sequences in the Latent Embedding Space: $K$-Token Merging for Large Language Models

Source: arXiv cs.AI

arXiv:2604.15153v2 Announce Type: replace-cross

Abstract: Large Language Models (LLMs) incur significant computational and memory costs when processing long prompts, as full self-attention scales quadratically with input length. Token compression aims to address this challenge by reducing the...
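The abstract is truncated here, but the core idea of token merging can be illustrated with a minimal sketch: repeatedly average the most similar adjacent token embeddings so the sequence shrinks from $n$ to $n - k$ tokens, which cuts the quadratic attention cost from $O(n^2)$ toward $O((n-k)^2)$. The greedy similarity-based rule, the function name `merge_k_tokens`, and the shapes below are illustrative assumptions, not the paper's actual algorithm.

```python
import numpy as np

def merge_k_tokens(embeddings: np.ndarray, k: int) -> np.ndarray:
    """Merge the k most similar adjacent token pairs by averaging them.

    embeddings: (seq_len, hidden_dim) array of token embeddings.
    Returns an array of shape (seq_len - k, hidden_dim).
    Hypothetical merging rule (greedy, similarity-based) for illustration only;
    the paper's K-token merging method may differ.
    """
    x = embeddings.astype(np.float64)
    for _ in range(k):
        # Cosine similarity between each pair of adjacent tokens.
        a, b = x[:-1], x[1:]
        sims = np.sum(a * b, axis=1) / (
            np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1) + 1e-8
        )
        i = int(np.argmax(sims))            # most redundant adjacent pair
        merged = (x[i] + x[i + 1]) / 2.0    # collapse the pair into one token
        x = np.concatenate([x[:i], merged[None, :], x[i + 2:]], axis=0)
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    seq = rng.normal(size=(128, 64))          # 128 tokens, 64-dim embeddings
    compressed = merge_k_tokens(seq, k=32)    # compress to 96 tokens
    print(seq.shape, "->", compressed.shape)  # (128, 64) -> (96, 64)
```

Merging in the embedding space, rather than dropping tokens outright, lets the compressed sequence retain a blended summary of the removed positions before it is fed to downstream attention layers.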

arxivpapers