Research 2026-05-06

BaldWhisper: Faster Whisper with Head Shearing and Layer Merging

Source: arXiv cs.AI

arXiv:2510.08599v2 Announce Type: replace-cross Abstract: Pruning large pre-trained transformers in a data-scarce scenario is challenging, as it often requires massive retraining data to recover performance. For instance, Distill-Whisper prunes Whisper by 40% and retrains on 21,000 hours of speech,...
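The title names two compression moves: shearing attention heads and merging layers. The abstract excerpt does not spell out how BaldWhisper implements either, so the sketch below is only a generic illustration of the two ideas, not the paper's method: heads are scored by a toy L2-norm importance proxy and the weakest ones dropped, and two adjacent layers are collapsed by convex-combining their weights. All function names, the importance score, and the merge rule are assumptions for illustration.

```python
import numpy as np

def shear_heads(W, num_heads, keep):
    # Toy head pruning (not the paper's criterion): split the columns of a
    # (d_model, d_model) projection into per-head blocks, score each head by
    # its L2 norm, and keep only the `keep` highest-scoring heads.
    d_model = W.shape[1]
    d_head = d_model // num_heads
    blocks = W.reshape(d_model, num_heads, d_head)
    scores = np.linalg.norm(blocks, axis=(0, 2))   # one importance score per head
    top = np.sort(np.argsort(scores)[-keep:])      # surviving heads, original order
    return blocks[:, top, :].reshape(d_model, keep * d_head)

def merge_layers(layer_a, layer_b, alpha=0.5):
    # Toy layer merging: collapse two adjacent layers into one by averaging
    # their parameter tensors (a simple stand-in for learned merging).
    return {name: alpha * layer_a[name] + (1 - alpha) * layer_b[name]
            for name in layer_a}

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64))
pruned = shear_heads(W, num_heads=8, keep=6)
print(pruned.shape)  # (64, 48): 6 of 8 heads kept

a = {"w": rng.standard_normal((64, 64))}
b = {"w": rng.standard_normal((64, 64))}
merged = merge_layers(a, b)
print(merged["w"].shape)  # (64, 64): one layer where there were two
```

Both steps shrink the model without retraining; the paper's contribution presumably lies in doing this while recovering accuracy from far less data than the 21,000 hours Distill-Whisper uses.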

arxivpapers