BeClaude
Research | 2026-04-30

Test-Time Safety Alignment

Source: arXiv cs.AI

arXiv:2604.26167v1 (Announce Type: cross)

Abstract: Recent work has shown that a model's input word embeddings can serve as effective control variables for steering its behavior toward outputs that satisfy desired properties. However, this has only been demonstrated for pretrained text-completion...
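The abstract is truncated and does not describe the paper's own algorithm. As a hedged illustration of the general idea it names (treating input embeddings as control variables), the toy sketch below freezes a hypothetical linear "model" `W` and runs gradient ascent on the input embedding alone to raise the probability of a desired output token. The toy model, the helper names `target_prob` and `steer_embedding`, and the learning rate are all assumptions for illustration, not the authors' method.

```python
import math

# Hypothetical toy setup: a frozen linear "model" W (V rows of length d)
# maps an input embedding e (length d) to V output logits.

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [v / s for v in exps]

def logits(W, e):
    return [sum(w * x for w, x in zip(row, e)) for row in W]

def target_prob(W, e, target):
    """Probability the toy model assigns to the desired output index."""
    return softmax(logits(W, e))[target]

def steer_embedding(W, e0, target, lr=0.5, steps=50):
    """Gradient ascent on log p(target | e); only e changes, W stays frozen."""
    e = list(e0)
    for _ in range(steps):
        p = softmax(logits(W, e))
        # d/de log p[target] = sum_v (1[v == target] - p[v]) * W[v]
        grad = [0.0] * len(e)
        for v, row in enumerate(W):
            coef = (1.0 if v == target else 0.0) - p[v]
            for j, w in enumerate(row):
                grad[j] += coef * w
        e = [x + lr * g for x, g in zip(e, grad)]
    return e

if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]  # hypothetical 3-token, 2-dim model
    e0 = [0.0, 0.0]
    before = target_prob(W, e0, target=2)
    after = target_prob(W, steer_embedding(W, e0, target=2), target=2)
    print(f"p(target) before: {before:.3f}, after: {after:.3f}")
```

Because the model weights never change, the "control" lives entirely in the optimized input vector; in this toy case the objective is concave in `e`, so small steps increase the target probability steadily.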

arxiv, papers, safety