BeClaude
Research 2026-05-14

FlashSampling: Fast and Memory-Efficient Exact Sampling

Source: arXiv cs.AI

arXiv:2603.15854v2 (announce type: replace-cross)

Abstract: Sampling from a categorical distribution is mathematically simple, but in large-vocabulary decoding it often triggers extra memory traffic and extra kernels after the LM head. We present FlashSampling, an exact sampling primitive that fuses...
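For readers unfamiliar with the problem setup, here is a minimal sketch of what "sampling from a categorical distribution" over logits looks like. The abstract is truncated before it says what FlashSampling fuses, so nothing below should be read as the paper's method. It only contrasts the usual baseline (normalize, then invert the CDF) with the well-known Gumbel-max trick, which draws an exact sample using a single argmax over perturbed logits. The function names `sample_softmax`, `sample_gumbel_max`, and `empirical` are hypothetical and chosen for illustration.

```python
import math
import random


def sample_softmax(logits, rng):
    """Baseline: turn logits into (unnormalized) probabilities, then invert the CDF."""
    m = max(logits)                          # subtract the max for numerical stability
    weights = [math.exp(x - m) for x in logits]
    u = rng.random() * sum(weights)          # uniform draw scaled to the total mass
    acc = 0.0
    for i, w in enumerate(weights):
        acc += w
        if u < acc:
            return i
    return len(logits) - 1                   # guard against floating-point round-off


def sample_gumbel_max(logits, rng):
    """Gumbel-max trick: argmax of logits plus i.i.d. Gumbel(0, 1) noise."""
    best_i, best_v = -1, -math.inf
    for i, x in enumerate(logits):
        u = rng.random()
        while u == 0.0:                      # avoid log(0)
            u = rng.random()
        g = -math.log(-math.log(u))          # one Gumbel(0, 1) sample
        if x + g > best_v:
            best_i, best_v = i, x + g
    return best_i


def empirical(sampler, logits, n, seed):
    """Empirical frequencies of each index over n draws (illustrative helper)."""
    rng = random.Random(seed)
    counts = [0] * len(logits)
    for _ in range(n):
        counts[sampler(logits, rng)] += 1
    return [c / n for c in counts]


if __name__ == "__main__":
    logits = [0.0, 1.0, 2.0]
    print(empirical(sample_softmax, logits, 20000, seed=0))
    print(empirical(sample_gumbel_max, logits, 20000, seed=1))
```

Both samplers target the same distribution, softmax of the logits. The baseline needs a pass to normalize and another to scan the CDF, while the Gumbel-max version needs only a running maximum, which is one reason it is often considered when sampling has to happen close to where the logits are produced.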

arxiv, papers