BeClaude
Policy | 2026-04-17

Soft $Q(\lambda)$: A multi-step off-policy method for entropy regularised reinforcement learning using eligibility traces

Source: arXiv cs.AI

arXiv:2604.13780v1 (announce type: cross)

Abstract: Soft Q-learning has emerged as a versatile model-free method for entropy-regularised reinforcement learning, optimising for returns augmented with a penalty on the divergence from a reference policy. Despite its success, the multi-step extensions of...

arxiv, papers, rl