Research · 2026-05-12
Hiding in Plain Sight: Detectability-Aware Antidistillation of Reasoning Models
Source: Arxiv CS.AI
arXiv:2604.23238v2 (announce type: replace-cross)

Abstract: Distillation via sampling reasoning traces exposes closed-source frontier models to adversarial third parties, who can bypass their guardrails and misappropriate their capabilities. Antidistillation methods aim to address this by poisoning...
Tags: arxiv, papers, reasoning