Policy 2026-04-17

Soft $Q(\lambda)$: A multi-step off-policy method for entropy regularised reinforcement learning using eligibility traces

arXiv:2604.13780v1 (announce type: cross)

Abstract: Soft Q-learning has emerged as a versatile model-free method for entropy-regularised reinforcement learning, optimising for returns augmented with a penalty on the divergence from a reference policy. Despite its success, the multi-step extensions of...
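To make the abstract's description of soft Q-learning concrete, here is a minimal sketch of the standard one-step tabular update under a KL penalty towards a reference policy. It is not the paper's Soft Q(λ) method, which the truncated abstract does not spell out. The function names, the inverse-temperature parameter `beta`, and the tabular `q` layout are all assumptions made for illustration; the paper may parameterise the regulariser differently.

```python
import math


def soft_value(q_values, ref_probs, beta):
    """Soft state value under a KL penalty to a reference policy (assumed form):
    V(s) = (1 / beta) * log sum_a ref(a|s) * exp(beta * Q(s, a)), with beta > 0.
    """
    # Shift by the max Q-value so the exponentials cannot overflow.
    m = max(q_values)
    total = sum(p * math.exp(beta * (q - m)) for q, p in zip(q_values, ref_probs))
    return m + math.log(total) / beta


def soft_q_update(q, s, a, r, s_next, ref_probs, beta,
                  gamma=0.99, lr=0.1, done=False):
    """One-step tabular soft Q-learning update (illustrative, not Soft Q(lambda)).

    q is a list of per-state lists of action values and is modified in place.
    """
    # Bootstrap from the soft value of the next state unless the episode ended.
    bootstrap = 0.0 if done else soft_value(q[s_next], ref_probs, beta)
    target = r + gamma * bootstrap
    q[s][a] += lr * (target - q[s][a])
    return q[s][a]
```

With a large `beta` the soft value approaches the maximum action value, recovering ordinary Q-learning; as `beta` approaches zero it approaches the expected action value under the reference policy.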

Read Original Article on Arxiv CS.AI

arxiv, papers, rl