Research 2026-05-14
FlashSampling: Fast and Memory-Efficient Exact Sampling
Source: arXiv cs.AI
arXiv:2603.15854v2
Announce Type: replace-cross
Abstract: Sampling from a categorical distribution is mathematically simple, but in large-vocabulary decoding, it often triggers extra memory traffic and extra kernels after the LM head. We present FlashSampling, an exact sampling primitive that fuses...
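For context on why the abstract calls categorical sampling "mathematically simple", here is a minimal sketch of a conventional, unfused baseline: softmax over the logits followed by inverse-CDF sampling. It is not FlashSampling and does not reproduce the paper's method, which the truncated abstract does not describe. The function name `sample_categorical` and the 4-token logits are hypothetical, chosen only for illustration. Note that even this simple version sweeps over the full vocabulary several times, which hints at the memory traffic the abstract mentions.

```python
import math
import random


def sample_categorical(logits, rng):
    """Draw one index from softmax(logits) by inverse-CDF sampling.

    Illustrative baseline only (an assumption, not the paper's algorithm).
    """
    # Subtract the max logit before exponentiating so exp() cannot overflow.
    m = max(logits)
    weights = [math.exp(x - m) for x in logits]
    # Scale the uniform draw by the total weight instead of normalising
    # every weight into a probability.
    total = sum(weights)
    u = rng.random() * total
    # Walk the running sum until it exceeds the scaled uniform draw.
    acc = 0.0
    for i, w in enumerate(weights):
        acc += w
        if u < acc:
            return i
    return len(weights) - 1  # guard against floating-point round-off


rng = random.Random(0)
logits = [2.0, 0.5, -1.0, 0.0]  # hypothetical 4-token vocabulary
counts = [0] * len(logits)
for _ in range(20000):
    counts[sample_categorical(logits, rng)] += 1
```

With enough draws, the empirical frequencies in `counts` should approach the softmax probabilities of `logits`, which is what "exact" sampling means here: the draw follows the target distribution itself, not an approximation of it.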