Research · 2026-05-12

PowerStep: Memory-Efficient Adaptive Optimization via $\ell_p$-Norm Steepest Descent

Source: arXiv cs.AI

arXiv:2605.10335v1 (announce type: cross)

Abstract: Adaptive optimizers, most notably Adam, have become the de facto standard for training large-scale neural networks such as Transformers. These methods maintain running estimates of the gradient's first and second moments, incurring substantial memory...
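To make the abstract's premise concrete: vanilla Adam keeps a first-moment and a second-moment buffer per parameter, each the same shape as the weights, so the optimizer state alone is roughly twice the model size. The NumPy sketch below illustrates standard Adam only, not PowerStep; the function name `adam_step` and the state layout are my own for illustration.

```python
import numpy as np

def adam_step(params, grad, state, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update (Kingma & Ba, 2015).

    The two state buffers, m and v, each match the parameter shape --
    this is the ~2x-parameter memory overhead the abstract refers to.
    """
    t = state["t"] + 1
    m = beta1 * state["m"] + (1 - beta1) * grad        # running first moment
    v = beta2 * state["v"] + (1 - beta2) * grad ** 2   # running second moment
    m_hat = m / (1 - beta1 ** t)                       # bias correction
    v_hat = v / (1 - beta2 ** t)
    new_params = params - lr * m_hat / (np.sqrt(v_hat) + eps)
    return new_params, {"m": m, "v": v, "t": t}

# Usage: the optimizer carries two full-size buffers on top of the weights.
params = np.zeros(4)
state = {"m": np.zeros_like(params), "v": np.zeros_like(params), "t": 0}
params, state = adam_step(params, np.array([0.1, -0.2, 0.3, -0.4]), state)
print(params)
```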
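For context on the title: "$\ell_p$-norm steepest descent" has a standard closed form via Hölder duality. The steepest ascent direction under $\|\cdot\|_p$ is

$$
d \;=\; \arg\max_{\|u\|_p \le 1} g^{\top} u
\quad\Longrightarrow\quad
d_i = \operatorname{sign}(g_i)\,\frac{|g_i|^{\,q-1}}{\|g\|_q^{\,q-1}},
\qquad \frac{1}{p} + \frac{1}{q} = 1,
$$

and the descent step moves along $-d$. Taking $p = 2$ recovers the normalized gradient, and $p = \infty$ recovers the sign of the gradient (as in signSGD). This is the generic mathematical notion the title invokes; the truncated abstract does not say whether PowerStep's update takes exactly this form.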
