Research 2026-04-28
Evaluating whether AI models would sabotage AI safety research
Source: arXiv cs.AI
arXiv:2604.24618v1 Announce Type: new
Abstract: We evaluate the propensity of frontier models to sabotage or refuse to assist with safety research when deployed as AI research agents within a frontier AI company. We apply two complementary evaluations to four Claude models (Mythos Preview, Opus 4.7...
arxiv, papers, safety