BeClaude
Research · 2026-05-12

Rethinking Layer Redundancy in Large Language Models: Calibration Objectives and Search for Depth Pruning

Source: arXiv cs.AI

arXiv:2604.24938v2 (announce type: replace-cross)

Abstract: Depth pruning improves the inference efficiency of large language models by removing Transformer blocks. Prior work has largely treated layer redundancy as an inherent structural property of pretrained networks, emphasizing importance...
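The truncated abstract describes depth pruning in general terms: entire Transformer blocks are removed, typically after ranking them by some importance or redundancy score on calibration data. Below is a minimal, self-contained sketch of that general idea, not the paper's specific method: it scores each block by how little it transforms its input (a similarity-based redundancy heuristic common in earlier depth-pruning work) and keeps only the highest-scoring blocks. The toy model and all names (`Block`, `layer_importance`, `prune_depth`) are hypothetical.

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    """Stand-in for a pre-norm Transformer block (toy, feed-forward only)."""
    def __init__(self, dim):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.ff = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
    def forward(self, x):
        return x + self.ff(self.norm(x))  # residual stream, as in Transformers

def layer_importance(blocks, x):
    """Score each block as 1 - cos(input, output) on calibration activations.

    A score near 0 means the block barely changes the residual stream,
    marking it as redundant under this similarity criterion.
    """
    scores, h = [], x
    for blk in blocks:
        out = blk(h)
        cos = torch.nn.functional.cosine_similarity(
            h.flatten(1), out.flatten(1), dim=1).mean()
        scores.append(1.0 - cos.item())
        h = out
    return scores

def prune_depth(blocks, x, keep):
    """Drop the lowest-scoring blocks, keeping `keep` blocks in original order."""
    scores = layer_importance(blocks, x)
    keep_idx = sorted(sorted(range(len(blocks)), key=lambda i: -scores[i])[:keep])
    return nn.ModuleList(blocks[i] for i in keep_idx)

if __name__ == "__main__":
    torch.manual_seed(0)
    dim, n_layers = 64, 8
    blocks = nn.ModuleList(Block(dim) for _ in range(n_layers))
    calib = torch.randn(16, dim)  # stand-in for calibration activations
    pruned = prune_depth(blocks, calib, keep=6)
    print(f"kept {len(pruned)} of {n_layers} blocks")
```

The paper's framing suggests the calibration objective used to compute such scores is itself a design choice rather than a fixed property of the network; the sketch above hard-codes one particular (similarity-based) objective purely for illustration.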

Tags: arxivpapers