Research · 2026-05-12
FlashSVD v1.5: Making Low-Rank Transformers Inference Actually Fast
Source: arXiv cs.AI
arXiv:2605.08314v1 Announce Type: cross Abstract: SVD-based low-rank compression reduces transformer parameters and nominal FLOPs, but these savings often translate poorly into real LLM serving speedups. We show that this gap is largely a runtime problem: factorized checkpoints fragment execution...
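To make the "parameters and nominal FLOPs" claim concrete, here is a minimal sketch of the standard arithmetic for replacing one dense weight matrix with two rank-r factors. It is not taken from the paper: the function names and the 4096 x 4096 layer with rank 512 are illustrative assumptions, and the counts ignore biases and activation memory.

```python
# Illustrative arithmetic only; names and sizes are assumptions, not from the paper.

def dense_cost(d_in: int, d_out: int) -> int:
    """Parameters (and multiply-accumulates per token) of a dense d_in x d_out layer."""
    return d_in * d_out


def low_rank_cost(d_in: int, d_out: int, rank: int) -> int:
    """Same count when W is approximated by two factors: (d_in x rank) and (rank x d_out)."""
    return rank * (d_in + d_out)


def break_even_rank(d_in: int, d_out: int) -> float:
    """The factorized layer is nominally cheaper only for ranks below this value."""
    return (d_in * d_out) / (d_in + d_out)


d_in, d_out, rank = 4096, 4096, 512  # hypothetical layer shape and rank

dense = dense_cost(d_in, d_out)
factored = low_rank_cost(d_in, d_out, rank)
ratio = factored / dense

print(f"dense:      {dense:,} params / MACs per token")
print(f"factorized: {factored:,} params / MACs per token")
print(f"ratio:      {ratio:.2f}")
print(f"break-even rank: {break_even_rank(d_in, d_out):.0f}")
```

Under these assumptions the factorized layer holds a quarter of the parameters, but it runs as two smaller matrix multiplications instead of one, so the nominal saving need not show up as a proportional wall-clock speedup.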