BeClaude
Research 2026-04-28

K-Score: Kalman Filter as a Principled Alternative to Reward Normalization in Reinforcement Learning

Source: arXiv cs.AI

arXiv:2604.23056v1 (announce type: cross)

Abstract: We propose a simple yet effective alternative to reward normalization in policy gradient reinforcement learning by integrating a 1D Kalman filter for online reward estimation. Instead of relying on fixed heuristics, our method recursively estimates...
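The abstract is cut off before it describes the estimator, so the following is only a generic sketch of how a 1D Kalman filter can track a reward stream online. It is not the paper's algorithm. The class name `ScalarKalman`, the random-walk state model, the parameters `process_var` and `obs_var`, and the use of the standardized innovation as a normalized reward signal are all assumptions made for illustration.

```python
import math


class ScalarKalman:
    """Generic 1D Kalman filter with a random-walk state model.

    Hypothetical illustration: the state is the latent mean reward,
    and each observed reward is that mean plus Gaussian noise.
    """

    def __init__(self, mean=0.0, var=1.0, process_var=1e-3, obs_var=1.0):
        self.mean = mean                # current estimate of the mean reward
        self.var = var                  # uncertainty of that estimate
        self.process_var = process_var  # assumed drift of the true mean per step
        self.obs_var = obs_var          # assumed noise variance of a single reward

    def update(self, reward):
        """Fold in one reward and return its standardized innovation."""
        # Predict: under a random walk the mean is unchanged, uncertainty grows.
        prior_var = self.var + self.process_var

        # Innovation: how far the reward is from the prediction.
        innovation = reward - self.mean
        innovation_var = prior_var + self.obs_var

        # Correct: move the estimate by the Kalman gain times the innovation.
        gain = prior_var / innovation_var
        self.mean += gain * innovation
        self.var = (1.0 - gain) * prior_var

        # Assumed stand-in for a normalized reward: innovation in units of
        # its predicted standard deviation.
        return innovation / math.sqrt(innovation_var)


if __name__ == "__main__":
    kf = ScalarKalman()
    for r in [1.0, 1.2, 0.8, 1.1, 5.0]:
        score = kf.update(r)
        print(f"reward={r:.1f} estimate={kf.mean:.3f} score={score:.3f}")
```

In a policy gradient loop, a signal like the returned score could take the place of a reward standardized by running mean and standard deviation. Whether the paper uses this exact quantity is not stated in the excerpt.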

arxiv, papers, rl