Research · 2026-04-24
SafeMERGE: Preserving Safety Alignment in Fine-Tuned Large Language Models via Selective Layer-Wise Model Merging
Source: arXiv cs.AI
arXiv:2503.17239v3 · Announce Type: replace-cross
Abstract: Fine-tuning large language models (LLMs) is a common practice to adapt generalist models to specialized domains. However, recent studies show that fine-tuning can erode safety alignment, causing LLMs to respond to harmful or unethical...
Tags: arxiv, papers, safety, fine-tuning