Research · 2026-05-12
PowerStep: Memory-Efficient Adaptive Optimization via $\ell_p$-Norm Steepest Descent
Source: Arxiv CS.AI
arXiv:2605.10335v1 (Announce Type: cross)
Abstract: Adaptive optimizers, most notably Adam, have become the default standard for training large-scale neural networks such as Transformers. These methods maintain running estimates of gradient first and second moments, incurring substantial memory...
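The per-parameter memory overhead the abstract refers to comes from the two moment buffers Adam keeps alongside each parameter tensor. The paper's PowerStep method itself is not shown here; as context, a minimal sketch of a standard Adam update (the baseline the abstract describes) makes the two buffers explicit:

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One standard Adam update (baseline sketch, not the paper's PowerStep).

    The buffers m and v each have the same shape as param; maintaining
    them is the memory cost the abstract mentions.
    """
    m = beta1 * m + (1 - beta1) * grad        # running first moment (mean)
    v = beta2 * v + (1 - beta2) * grad ** 2   # running second moment (uncentered variance)
    m_hat = m / (1 - beta1 ** t)              # bias correction for zero init
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v
```

Running a few steps on a toy quadratic loss (gradient `2 * p`) shows the parameter moving toward the minimum while `m` and `v` track the gradient statistics.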