Research 2026-05-12
LoKA: Low-precision Kernel Applications for Recommendation Models At Scale
Source: arXiv cs.AI
arXiv:2605.10886v1 Announce Type: cross Abstract: Recent GPU generations deliver significantly higher FLOPs using lower-precision arithmetic, such as FP8. While FP8 has been applied successfully to large language models (LLMs), its adoption in large recommendation models (LRMs) has been limited. This is because...
arxiv papers