Research | 2026-04-28
Protecting the Trace: A Principled Black-Box Approach Against Distillation Attacks
Source: arXiv cs.AI
arXiv:2604.23238v1 | Announce Type: cross
Abstract: Frontier models push the boundaries of what is learnable at extreme computational costs, yet distillation via sampling reasoning traces exposes closed-source frontier models to adversarial third parties who can bypass their guardrails and...
arxivpapers