Research | 2026-05-06
LEAP: Layer-wise Exit-Aware Pretraining for Efficient Transformer Inference
Source: arXiv cs.AI
arXiv:2605.01058v1 Announce Type: cross Abstract: Layer-aligned distillation and convergence-based early exit are two predominant computational-efficiency paradigms for transformer inference, yet we establish that they exhibit systematic incompatibility under standard deployment conditions...
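To make the second paradigm concrete, here is a minimal sketch of convergence-based early exit: run transformer layers until the hidden state stops changing, then classify from the current state. This is a generic illustration of the paradigm the abstract names, not the paper's LEAP method; the `threshold` value and the toy contraction "layers" are assumptions for demonstration only.

```python
import numpy as np

def early_exit_forward(layers, x, classify, threshold=0.02):
    """Convergence-based early exit (generic sketch, not LEAP itself):
    stop once consecutive hidden states differ by less than `threshold`
    in relative L2 norm, then classify from the current state."""
    prev = x
    for depth, layer in enumerate(layers, start=1):
        h = layer(prev)
        # Relative change between consecutive layer outputs.
        delta = np.linalg.norm(h - prev) / (np.linalg.norm(prev) + 1e-8)
        prev = h
        if delta < threshold:  # hidden state has converged: exit early
            return classify(prev), depth
    return classify(prev), len(layers)

# Toy "layers": each pulls the state halfway toward a fixed point,
# mimicking the representation saturation that convergence-based
# exits exploit in deep transformers.
target = np.ones(4)
layers = [lambda v: 0.5 * v + 0.5 * target for _ in range(12)]
pred, depth = early_exit_forward(
    layers, 10.0 * np.ones(4), classify=lambda h: float(h.mean())
)
```

In this toy run the state converges before the final layer, so `depth` comes back smaller than the network's 12 layers, which is exactly the compute saving the paradigm targets.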