Research 2026-05-14

GLASS: Global-Local Aggregation for Inference-time Sparsification of LLMs

arXiv:2508.14302v2

Abstract: Inference-time sparsification is a promising path to deploy large language models (LLMs) on resource-constrained devices, yet existing training-free methods typically estimate feedforward network (FFN) neuron importance from the input prompt...

Read Original Article on Arxiv CS.AI
