Research 2026-05-08

Frictional Q-Learning

arXiv:2509.19771v4

Abstract: Off-policy reinforcement learning suffers from extrapolation errors when a learned policy selects actions that are weakly supported in the replay buffer. In this study, we address this issue by drawing an analogy to static friction. From...

Source: arXiv cs.AI
