Research | 2026-04-28

Evaluating whether AI models would sabotage AI safety research

arXiv:2604.24618v1 (Announce Type: new)

Abstract: We evaluate the propensity of frontier models to sabotage or refuse to assist with safety research when deployed as AI research agents within a frontier AI company. We apply two complementary evaluations to four Claude models (Mythos Preview, Opus 4.7...

Read the original article on arXiv (cs.AI).

Tags: arxiv, papers, safety