Research · 2026-05-12
Hiding in Plain Sight: Detectability-Aware Antidistillation of Reasoning Models
Source: Arxiv CS.AI
arXiv:2604.23238v2 (announce type: replace-cross)

Abstract: Distillation via sampling reasoning traces exposes closed-source frontier models to adversarial third parties, who can bypass their guardrails and misappropriate their capabilities. Antidistillation methods aim to address this by poisoning...
Tags: arxiv, papers, reasoning