BeClaude Research, 2026-04-20

Sequential KV Cache Compression via Probabilistic Language Tries: Beyond the Per-Vector Shannon Limit

Source: Arxiv CS.AI

arXiv:2604.15356v1 (Announce Type: cross)

Abstract: Recent work on KV cache quantization, culminating in TurboQuant, has approached the Shannon entropy limit for per-vector compression of transformer key-value caches. We observe that this limit applies to a strictly weaker problem than the one that...
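The abstract's distinction can be illustrated with a toy example (not the paper's method, and using discrete symbols rather than KV vectors): the per-item Shannon limit bounds codes that treat each element independently, but a sequence with strong temporal structure has a much lower conditional entropy, which a sequential (context-conditioned) coder can exploit. The sketch below compares the marginal entropy of a highly predictable sequence with its first-order conditional entropy H(X_t | X_{t-1}).

```python
from collections import Counter
from math import log2

# Toy illustration: an alternating sequence whose per-symbol (marginal)
# entropy overstates its true compressibility. A coder that conditions
# on the previous symbol can approach the conditional entropy instead.
seq = "AB" * 50  # perfectly predictable alternation, 100 symbols

# Marginal entropy: treats each symbol as drawn independently.
n = len(seq)
counts = Counter(seq)
h_marginal = -sum(c / n * log2(c / n) for c in counts.values())

# Conditional entropy via H(X_t | X_{t-1}) = H(X_{t-1}, X_t) - H(X_{t-1}),
# estimated from adjacent-pair and prefix frequencies.
m = len(seq) - 1
pair_counts = Counter(zip(seq, seq[1:]))
h_joint = -sum(c / m * log2(c / m) for c in pair_counts.values())
prev_counts = Counter(seq[:-1])
h_prev = -sum(c / m * log2(c / m) for c in prev_counts.values())
h_cond = h_joint - h_prev

print(f"marginal entropy:    {h_marginal:.3f} bits/symbol")  # 1.000
print(f"conditional entropy: {h_cond:.3f} bits/symbol")      # ~0.000
```

Here the per-symbol limit is 1 bit/symbol, yet the conditional entropy is essentially zero: a sequential model recovers the structure that per-item coding cannot, which is the gap the abstract's title ("beyond the per-vector Shannon limit") refers to.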

arxivpapers