How AI settled the complexity of the oldest SGD algorithm
arXiv:2606.29593v1 Announce Type: cross Abstract: In 1937, Stefan Kaczmarz proposed a simple algorithm for solving systems of linear equations. This algorithm turned out to be the earliest known example of stochastic gradient descent, a ubiquitous computing paradigm that drives the training of...
In a remarkable piece of intellectual archaeology, researchers have formally proven the computational complexity of the 1937 Kaczmarz algorithm, confirming it as the earliest known instance of stochastic gradient descent (SGD). The paper, posted on arXiv, settles a long-standing theoretical question about an algorithm that predates the modern AI era by decades. Stefan Kaczmarz originally devised this iterative method for solving systems of linear equations, but it has since been recognized as a foundational precursor to the optimization techniques that now power deep learning.
What Happened
The research provides a rigorous complexity analysis of the randomized Kaczmarz algorithm, demonstrating its convergence properties and establishing its place in the lineage of SGD. While practitioners have long used variants of this method—particularly in computerized tomography and signal processing—its theoretical underpinnings as a stochastic optimization algorithm were not fully formalized until now. The analysis shows that the algorithm’s per-iteration complexity and convergence rate align with modern SGD theory, effectively retrofitting a 1930s idea into the contemporary machine learning framework.
Why It Matters
This finding is significant for several reasons. First, it closes a historical gap in optimization theory, showing that the core mechanism of SGD—randomly selecting a single equation (or data point) to update parameters—was operational decades before the term “stochastic gradient descent” was coined. Second, it validates the intuition that simple, iterative methods can be surprisingly powerful. The Kaczmarz algorithm’s resilience to noise and its ability to converge without full-batch computations prefigured the efficiency gains that make modern large-scale training feasible.
For the AI community, this is a reminder that many “new” ideas in machine learning have deep roots. The algorithm’s success in medical imaging (e.g., CT scan reconstruction) already demonstrated its practical utility, but the complexity proof now gives theoreticians a clean model to study. It also reinforces the importance of randomized algorithms in high-dimensional spaces—a lesson that remains central to training large language models and neural networks today.
Implications for AI Practitioners
While this result is primarily theoretical, it has practical resonance. Understanding that SGD’s lineage includes a deterministic-looking solver for linear systems can help practitioners appreciate why stochastic methods work so well on overdetermined problems. The Kaczmarz algorithm’s structure—cycling through rows of a matrix—maps directly to mini-batch training in deep learning, where each batch is a random subset of the data. The complexity proof offers guarantees that such randomized updates will converge under specific conditions, which can inform learning rate scheduling and batch size selection.
Moreover, this research highlights the value of revisiting classical algorithms with modern tools. As AI scales, the efficiency of SGD and its variants becomes paramount. Knowing that a 1937 algorithm already captured the essential trade-off between computation per step and convergence speed may inspire new hybrid approaches that blend classical numerical methods with deep learning.
Key Takeaways
- The 1937 Kaczmarz algorithm has been formally proven to be the earliest known instance of stochastic gradient descent, with a complete complexity analysis now available.
- This result bridges a historical gap in optimization theory, showing that SGD’s core mechanism predates modern AI by over 50 years.
- For practitioners, the proof reinforces the theoretical validity of randomized update schemes, which underpin efficient training in large-scale machine learning.
- The work encourages a deeper appreciation of classical algorithms, suggesting that revisiting old methods can yield insights for contemporary AI challenges.