Research | 2026-04-17
Pareto-Optimal Offline Reinforcement Learning via Smooth Tchebysheff Scalarization
Source: arXiv cs.AI
arXiv:2604.13175v1 | Announce Type: cross
Abstract: Large language models can be aligned with human preferences through offline reinforcement learning (RL) on small labeled datasets. While single-objective alignment is well-studied, many real-world applications demand the simultaneous optimization of...
Tags: arxiv, papers, rl