Research · 2026-04-24
Secure LLM Fine-Tuning via Safety-Aware Probing
Source: Arxiv CS.AI
arXiv:2505.16737v2 Announce Type: replace-cross Abstract: Large language models (LLMs) have achieved remarkable success across many applications, but their ability to generate harmful content raises serious safety concerns. Although safety alignment techniques are often applied during pre-training...
Tags: arxiv, papers, safety, fine-tuning