Research 2026-05-06
Model Organisms Are Leaky: Perplexity Differencing Often Reveals Finetuning Objectives
Source: arXiv cs.AI
arXiv:2605.00994v1 (announce type: cross)
Abstract: Finetuning can significantly modify the behavior of large language models, including introducing harmful or unsafe behaviors. To study these risks, researchers develop model organisms: models finetuned to exhibit specific known behaviors for...