Research 2026-05-14
GLASS: Global-Local Aggregation for Inference-time Sparsification of LLMs
Source: arXiv cs.AI
arXiv:2508.14302v2 (announce type: replace-cross)
Abstract: Inference-time sparsification is a promising path to deploying large language models (LLMs) on resource-constrained devices, yet existing training-free methods typically estimate feedforward network (FFN) neuron importance from the input prompt...
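To make the baseline mentioned in the abstract concrete, here is a minimal sketch of prompt-based FFN neuron importance estimation. It is not the GLASS method, which the truncated abstract does not describe. The scoring rule (mean absolute hidden activation over prompt tokens), the toy ReLU layer, and all names (`ffn_hidden`, `prompt_importance`, `top_k_mask`, `W_IN`, `PROMPT`) are assumptions chosen for illustration.

```python
def relu(x):
    return x if x > 0.0 else 0.0


def ffn_hidden(token, w_in):
    """Hidden activations of a toy FFN layer: one ReLU unit per row of w_in."""
    return [relu(sum(w * t for w, t in zip(row, token))) for row in w_in]


def prompt_importance(prompt_tokens, w_in):
    """Assumed scoring rule: mean absolute activation per neuron over the prompt."""
    totals = [0.0] * len(w_in)
    for token in prompt_tokens:
        for i, h in enumerate(ffn_hidden(token, w_in)):
            totals[i] += abs(h)
    return [t / len(prompt_tokens) for t in totals]


def top_k_mask(scores, k):
    """Boolean mask keeping the k highest-scoring neurons."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = set(order[:k])
    return [i in keep for i in range(len(scores))]


# Hypothetical toy data: 4 hidden neurons, 2-dimensional token embeddings.
W_IN = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0], [0.5, 0.5]]
PROMPT = [[1.0, 0.0], [2.0, 0.0]]

scores = prompt_importance(PROMPT, W_IN)
mask = top_k_mask(scores, k=2)
print(scores)  # [1.5, 0.0, 0.0, 0.75]
print(mask)    # [True, False, False, True]
```

In this sketch the mask is computed once from the prompt and would then be reused for every generated token, so neurons the prompt never activates are dropped for the whole generation.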
arxiv, papers