Research 2026-05-12
Team-Based Self-Play With Dual Adaptive Weighting for Fine-Tuning LLMs
Source: arXiv cs.AI
arXiv:2605.09922v1 Announce Type: cross Abstract: While recent self-training approaches have reduced reliance on human-labeled data for aligning LLMs, they still face critical limitations: (i) sensitivity to synthetic data quality, leading to instability and bias amplification in iterative...
arxiv papers fine-tuning