BeClaude
Research · 2026-05-12

Rethinking Layer Redundancy in Large Language Models: Calibration Objectives and Search for Depth Pruning

Source: arXiv cs.AI

arXiv:2604.24938v2 (announce type: replace-cross)

Abstract: Depth pruning improves the inference efficiency of large language models by removing Transformer blocks. Prior work has largely treated layer redundancy as an inherent structural property of pretrained networks, emphasizing importance...
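The truncated abstract describes depth pruning in general terms: entire Transformer blocks are removed, typically after ranking them by some importance or redundancy score on calibration data. Below is a minimal, self-contained sketch of that general idea, not the paper's specific method: it scores each block by how little it transforms its input (a similarity-based redundancy heuristic common in earlier depth-pruning work) and keeps only the highest-scoring blocks. The toy model and all names (`Block`, `layer_importance`, `prune_depth`) are hypothetical.

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    """Stand-in for a pre-norm Transformer block (toy, feed-forward only)."""
    def __init__(self, dim):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.ff = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
    def forward(self, x):
        return x + self.ff(self.norm(x))  # residual stream, as in Transformers

def layer_importance(blocks, x):
    """Score each block as 1 - cos(input, output) on calibration activations.

    A score near 0 means the block barely changes the residual stream,
    marking it as redundant under this similarity criterion.
    """
    scores, h = [], x
    for blk in blocks:
        out = blk(h)
        cos = torch.nn.functional.cosine_similarity(
            h.flatten(1), out.flatten(1), dim=1).mean()
        scores.append(1.0 - cos.item())
        h = out
    return scores

def prune_depth(blocks, x, keep):
    """Drop the lowest-scoring blocks, keeping `keep` blocks in original order."""
    scores = layer_importance(blocks, x)
    keep_idx = sorted(sorted(range(len(blocks)), key=lambda i: -scores[i])[:keep])
    return nn.ModuleList(blocks[i] for i in keep_idx)

if __name__ == "__main__":
    torch.manual_seed(0)
    dim, n_layers = 64, 8
    blocks = nn.ModuleList(Block(dim) for _ in range(n_layers))
    calib = torch.randn(16, dim)  # stand-in for calibration activations
    pruned = prune_depth(blocks, calib, keep=6)
    print(f"kept {len(pruned)} of {n_layers} blocks")
```

The paper's framing suggests the calibration objective used to compute such scores is itself a design choice rather than a fixed property of the network; the sketch above hard-codes one particular (similarity-based) objective purely for illustration.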

Tags: arxivpapers