Research 2026-05-06

Model Organisms Are Leaky: Perplexity Differencing Often Reveals Finetuning Objectives

arXiv:2605.00994v1 (announce type: cross)

Abstract: Finetuning can significantly modify the behavior of large language models, including introducing harmful or unsafe behaviors. To study these risks, researchers develop model organisms: models finetuned to exhibit specific known behaviors for...

Read Original Article on Arxiv CS.AI
