Research 2026-05-08
Saliency-Aware Regularized Quantization Calibration for Large Language Models
Source: Arxiv CS.AI
arXiv:2605.05693v1 Announce Type: new

Abstract: Post-training quantization (PTQ) is an effective approach for deploying large language models (LLMs) under memory and latency constraints. Most existing PTQ methods determine quantization parameters by minimizing a layer-wise reconstruction error on a...
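The layer-wise reconstruction objective mentioned in the abstract can be illustrated with a minimal sketch: for a layer with weights W and calibration inputs X, choose the quantization scale that minimizes ||WX - Q(W)X||^2. The uniform symmetric quantizer and grid search below are illustrative assumptions, not the paper's method:

```python
import numpy as np

def quantize(w, scale, bits=4):
    # Uniform symmetric quantization: round to a signed integer grid,
    # clip to the representable range, then dequantize back to floats.
    qmax = 2 ** (bits - 1) - 1
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale

def calibrate_scale(w, x, bits=4, grid=100):
    # Grid-search the scale that minimizes the layer-wise
    # reconstruction error ||w @ x - Q(w) @ x||^2 on calibration data x.
    ref = w @ x
    qmax = 2 ** (bits - 1) - 1
    max_abs = np.abs(w).max()
    best_scale, best_err = None, np.inf
    for k in range(1, grid + 1):
        scale = (max_abs * k / grid) / qmax
        err = np.sum((ref - quantize(w, scale, bits) @ x) ** 2)
        if err < best_err:
            best_scale, best_err = scale, err
    return best_scale, best_err
```

Note that the best scale is often smaller than the naive full-range choice (k = grid), since clipping a few outliers can reduce the error on the bulk of the weights; saliency-aware methods refine this further by weighting the objective toward important weights.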