Research2026-06-26

Algorithmic Foundations of Deep Learning: Complexity-Theoretic Rates and a Characterization of Universal Approximation

arXiv:2606.26705v1 Announce Type: cross Abstract: Feedforward neural network (NN) expressivity is typically studied by emulating optimal basis-expansion schemes. While powerful, this perspective is incomplete: it primarily captures complexity through regularity, and therefore does not distinguish...

This new paper from arXiv, titled Algorithmic Foundations of Deep Learning: Complexity-Theoretic Rates and a Characterization of Universal Approximation, represents a significant theoretical shift in how we understand why deep neural networks work so well. The core argument is that the standard way of measuring a network’s expressive power—by comparing it to how well it can approximate smooth mathematical functions—is fundamentally incomplete.

What Happened

The authors challenge the dominant "basis-expansion" paradigm. Traditionally, researchers have shown that neural networks can approximate any continuous function (the Universal Approximation Theorem) and that deeper networks are "better" because they can represent smoother, more regular functions with fewer parameters. This paper argues that this perspective misses the point. It primarily captures regularity (how smooth a function is), but fails to distinguish between different types of complexity that matter for learning.

The paper proposes a new framework rooted in complexity theory and algorithmic information. Instead of just asking "how many parameters are needed to fit this function?", the researchers ask: "What is the minimal computational cost of learning this function from data, given the network's architecture?" They derive new rates that characterize this cost, effectively showing that the true power of depth is not just about representation, but about computational efficiency—how quickly and reliably a network can find a good solution during training.

Why It Matters

This is a foundational result that bridges a long-standing gap in deep learning theory. For years, practitioners have known empirically that deeper networks generalize better, but the theory often lagged behind, focusing on shallow networks or overly simplistic measures of complexity (like VC dimension or Rademacher complexity).

The key insight is that depth changes the nature of the learning problem itself. A deep network does not just represent a more complex function; it represents a function that is algorithmically simpler to discover via gradient descent. This provides a rigorous justification for the "depth advantage" that practitioners have observed for a decade. It suggests that the inductive bias of deep architectures is not just about smoothness, but about a preference for functions that are computationally tractable to learn.

Implications for AI Practitioners

While highly theoretical, this work has concrete implications for how we think about model design and training:

Architecture as an Algorithmic Prior: The paper implies that choosing a specific architecture (e.g., a ResNet vs. a Transformer) is not just a matter of capacity, but of encoding a specific "algorithm" for solving the problem. A deeper network is not just a "bigger" network; it is a network that is easier to train on certain types of problems.

Rethinking Overparameterization: The findings support the modern view that overparameterization is not a bug but a feature. The complexity-theoretic rates suggest that larger models can actually be easier to train because they provide more "algorithmic shortcuts" for gradient descent to find a good solution.

A New Lens for Generalization: This work offers a more nuanced explanation for why deep networks generalize. It is not just about memorizing training data; it is about the network's inherent ability to find simple, algorithmic solutions that happen to match the underlying structure of the data.

Key Takeaways

The standard "basis-expansion" view of neural network expressivity is incomplete. It fails to account for the computational complexity of learning, focusing only on representation capacity.
Depth provides an "algorithmic advantage." Deeper networks can represent functions that are computationally cheaper to learn via gradient descent, not just functions that are smoother.
This work provides a rigorous theoretical foundation for the empirical success of deep learning. It explains why overparameterized deep networks are trainable and generalize well, despite classical statistical wisdom suggesting otherwise.
For practitioners, architecture choice is an algorithmic decision. The structure of a network should be viewed as a prior over the learning process itself, not just the function space.

Read Original Article on Arxiv CS.AI

arxivpapers