Research · 2026-05-12
Fitting Is Not Enough: Smoothness in Extremely Quantized LLMs
Source: arXiv cs.AI
arXiv:2605.08894v1 · Announce Type: cross
Abstract: Large language models (LLMs) achieve strong performance but incur high deployment costs, motivating extremely low-bit but lossy quantization. Existing quantization algorithms mainly focus on improving the numerical accuracy of forward computation to...
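The abstract above is truncated and does not describe the paper's smoothness-based method, so the sketch below only illustrates the conventional objective it says existing algorithms focus on: how closely a quantized layer's forward output matches the full-precision output. It assumes a common baseline, asymmetric min-max round-to-nearest quantization applied to contiguous weight groups. It is not taken from the paper, and the helper names (`quantize_group`, `quantize_matrix`, `forward_mse`) are hypothetical.

```python
import random


def quantize_group(weights, bits):
    """Hypothetical helper: asymmetric min-max round-to-nearest quantization
    of one weight group. Returns the dequantized (float) values."""
    lo, hi = min(weights), max(weights)
    levels = 2 ** bits - 1                      # number of steps between lo and hi
    scale = (hi - lo) / levels if hi > lo else 1.0
    deq = []
    for w in weights:
        q = round((w - lo) / scale)             # integer code
        q = max(0, min(levels, q))              # clamp to the representable range
        deq.append(q * scale + lo)              # map the code back to a float
    return deq


def quantize_matrix(W, bits, group_size):
    """Quantize each row of W in contiguous groups of `group_size` weights."""
    out = []
    for row in W:
        qrow = []
        for start in range(0, len(row), group_size):
            qrow.extend(quantize_group(row[start:start + group_size], bits))
        out.append(qrow)
    return out


def matvec(W, x):
    """Forward computation of a bias-free linear layer: y = W x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def forward_mse(W, W_hat, X):
    """Mean squared error between full-precision and quantized layer outputs
    over a set of inputs X: the 'numerical accuracy' target of fitting-based methods."""
    total, count = 0.0, 0
    for x in X:
        for y, y_hat in zip(matvec(W, x), matvec(W_hat, x)):
            total += (y - y_hat) ** 2
            count += 1
    return total / count


if __name__ == "__main__":
    rng = random.Random(0)
    W = [[rng.gauss(0, 1) for _ in range(64)] for _ in range(16)]
    X = [[rng.gauss(0, 1) for _ in range(64)] for _ in range(32)]
    for bits in (8, 4, 3, 2):
        W_hat = quantize_matrix(W, bits, group_size=32)
        print(f"{bits}-bit forward MSE: {forward_mse(W, W_hat, X):.5f}")
```

Running it on random Gaussian weights shows the forward error growing sharply as the bit width drops toward 2 bits. This is the lossy regime the abstract calls "extremely low-bit". The title's claim is that minimizing this kind of fitting error alone is not enough; what the paper adds beyond it is not recoverable from the truncated abstract.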