Research · 2026-05-06
Model Spec Midtraining: Improving How Alignment Training Generalizes
Source: arXiv cs.AI
arXiv:2605.02087v1 (Announce Type: new)
Abstract: Some frontier AI developers aim to align language models to a Model Spec or Constitution that describes the intended model behavior. However, standard alignment fine-tuning (training on demonstrations of spec-aligned behavior) can produce shallow...