Research 2026-05-12
Acceptance Cards: A Four-Diagnostic Standard for Safe Fine-Tuning Defense Claims
Source: arXiv cs.AI
arXiv:2605.10575v1 Announce Type: cross Abstract: Safe fine-tuning defenses are often endorsed on the basis of a held-out gap reduction, but the same reduction can come from sampling noise, subject artifacts, capability loss, or a mechanism that does not transfer. We introduce Acceptance Cards: an...
arxiv, papers, fine-tuning