Research, 2026-04-28

Protecting the Trace: A Principled Black-Box Approach Against Distillation Attacks

arXiv:2604.23238v1 (cross-listed). Abstract: Frontier models push the boundaries of what is learnable at extreme computational costs, yet distillation via sampling reasoning traces exposes closed-source frontier models to adversarial third parties who can bypass their guardrails and...

Read Original Article on Arxiv CS.AI
