Research | 2026-04-17

Pareto-Optimal Offline Reinforcement Learning via Smooth Tchebysheff Scalarization

arXiv:2604.13175v1 (Announce Type: cross)

Abstract: Large language models can be aligned with human preferences through offline reinforcement learning (RL) on small labeled datasets. While single-objective alignment is well-studied, many real-world applications demand the simultaneous optimization of...
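The abstract is cut off before it describes the method. For orientation, below is a minimal sketch of smooth Tchebysheff scalarization as it is commonly defined in the multi-objective optimization literature: a log-sum-exp smoothing of the weighted worst-case gap to an ideal point. This is an assumption about the general technique, not the paper's own formulation. The function names, the smoothing parameter `mu`, and the convention that objectives are losses to minimize are all illustrative; a reward r_i to maximize can be plugged in as the shortfall f_i = z_i - r_i against an ideal reward z_i.

```python
import math
from typing import List, Sequence


def _scaled_terms(losses: Sequence[float], weights: Sequence[float],
                  ideal: Sequence[float], mu: float) -> List[float]:
    # Hypothetical helper: w_i * (f_i - z_i) / mu for each objective i.
    if mu <= 0:
        raise ValueError("mu must be positive")
    if not (len(losses) == len(weights) == len(ideal)):
        raise ValueError("losses, weights and ideal must have the same length")
    return [w * (f - z) / mu for f, w, z in zip(losses, weights, ideal)]


def tchebysheff(losses: Sequence[float], weights: Sequence[float],
                ideal: Sequence[float]) -> float:
    # Classic (non-smooth) Tchebysheff scalarization: max_i w_i * (f_i - z_i).
    return max(w * (f - z) for f, w, z in zip(losses, weights, ideal))


def smooth_tchebysheff(losses: Sequence[float], weights: Sequence[float],
                       ideal: Sequence[float], mu: float = 0.1) -> float:
    # mu * log(sum_i exp(w_i * (f_i - z_i) / mu)), using a numerically
    # stable log-sum-exp (subtract the largest term before exponentiating).
    terms = _scaled_terms(losses, weights, ideal, mu)
    m = max(terms)
    return mu * (m + math.log(sum(math.exp(t - m) for t in terms)))


def smooth_tchebysheff_grad(losses: Sequence[float], weights: Sequence[float],
                            ideal: Sequence[float], mu: float = 0.1) -> List[float]:
    # Partial derivatives w.r.t. each f_i: w_i * softmax(terms)_i.
    # Every objective gets a nonzero share of the gradient, with most of
    # it going to the currently worst (largest weighted gap) objective.
    terms = _scaled_terms(losses, weights, ideal, mu)
    m = max(terms)
    exps = [math.exp(t - m) for t in terms]
    total = sum(exps)
    return [w * e / total for w, e in zip(weights, exps)]


if __name__ == "__main__":
    # Two hypothetical objectives with equal weights and an ideal point at 0.
    losses, weights, ideal = [0.5, 0.2], [0.5, 0.5], [0.0, 0.0]
    print(tchebysheff(losses, weights, ideal))
    print(smooth_tchebysheff(losses, weights, ideal, mu=0.01))
    print(smooth_tchebysheff_grad(losses, weights, ideal, mu=0.01))
```

The smoothing trades exactness for differentiability: for m objectives the smooth value always lies between the hard maximum and the maximum plus mu·log m, so it recovers the classic Tchebysheff objective as mu shrinks, while the softmax-weighted gradient avoids the hard max's habit of updating only one objective at a time.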

Source: arXiv cs.AI

arxiv papers rl