BeClaude
Research 2026-05-12

Hiding in Plain Sight: Detectability-Aware Antidistillation of Reasoning Models

Source: arXiv cs.AI

arXiv:2604.23238v2 (announce type: replace-cross)

Abstract: Distillation via sampling reasoning traces exposes closed-source frontier models to adversarial third parties who can bypass their guardrails and misappropriate their capabilities. Antidistillation methods aim to address this by poisoning...

Tags: arxiv, papers, reasoning