BeClaude
Research · 2026-04-24

SafeMERGE: Preserving Safety Alignment in Fine-Tuned Large Language Models via Selective Layer-Wise Model Merging

Source: arXiv cs.AI

arXiv:2503.17239v3 (Announce Type: replace-cross)

Abstract: Fine-tuning large language models (LLMs) is a common practice to adapt generalist models to specialized domains. However, recent studies show that fine-tuning can erode safety alignment, causing LLMs to respond to harmful or unethical...

arxiv · papers · safety · fine-tuning