Informational Frustration in Neural Manifolds: Shannon Bottlenecks and the Limits of Learnability
arXiv:2606.30512v1 (announce type: cross)
Abstract: Why overparameterised deep networks generalise so remarkably well remains one of the most stubborn open questions in machine learning theory. Classical frameworks like VC dimension and Rademacher complexity predict catastrophic overfitting in modern...
A New Lens on Deep Learning’s Central Mystery
A recent preprint on arXiv (2606.30512v1) tackles the enduring puzzle of why overparameterised deep networks generalise so effectively, despite classical statistical learning theory predicting they should catastrophically overfit. The authors introduce the concept of “informational frustration” within neural manifolds—the idea that the geometric structure of learned representations in high-dimensional space is constrained by Shannon information bottlenecks. This reframes generalisation not as a miracle of optimisation, but as a necessary consequence of information-theoretic limits on what a network can encode.
What the Research Proposes
The paper argues that as networks grow in width and depth, the neural manifold (the low-dimensional surface on which learned data representations lie) becomes increasingly “frustrated.” This frustration arises because the amount of information the manifold can store is bounded by the Shannon capacity of the network’s weights and activations. Even when the model has more parameters than it needs to fit the training data, the paper contends, this bottleneck prevents it from memorising noise or spurious correlations. The network is instead forced to compress its inputs into a simpler, more generalisable representation. The argument is in line with earlier work on the “lottery ticket hypothesis” and the “information bottleneck” theory of deep learning, and the paper offers a geometric formalism intended to unify them.
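To make the compression idea concrete, here is a minimal sketch of the general information bottleneck intuition on a toy discrete problem. It is not the paper's formalism: the four-valued input, the label rule `y = x // 2`, and the helper names `mutual_information_bits` and `joint_table` are all assumptions made for illustration. It compares a representation that copies the input with one that merges inputs sharing a label.

```python
import numpy as np

def mutual_information_bits(joint):
    """Mutual information I(A;B) in bits from a joint probability table p(a, b)."""
    joint = np.asarray(joint, dtype=float)
    pa = joint.sum(axis=1, keepdims=True)   # marginal p(a), shape (n, 1)
    pb = joint.sum(axis=0, keepdims=True)   # marginal p(b), shape (1, m)
    mask = joint > 0                        # skip zero cells to avoid log(0)
    return float(np.sum(joint[mask] * np.log2(joint[mask] / (pa @ pb)[mask])))

def joint_table(a_values, b_values, probs):
    """Build p(a, b) from paired outcomes (a, b) and their probabilities."""
    table = np.zeros((max(a_values) + 1, max(b_values) + 1))
    for a, b, p in zip(a_values, b_values, probs):
        table[a, b] += p
    return table

# Hypothetical toy problem: four equally likely inputs, label = x // 2.
x = [0, 1, 2, 3]
p_x = [0.25] * 4
y = [v // 2 for v in x]

t_identity = list(x)                  # representation that keeps everything
t_compressed = [v // 2 for v in x]    # representation that merges same-label inputs

for name, t in [("identity", t_identity), ("compressed", t_compressed)]:
    i_xt = mutual_information_bits(joint_table(x, t, p_x))
    i_ty = mutual_information_bits(joint_table(t, y, p_x))
    print(f"{name}: I(X;T) = {i_xt:.2f} bits, I(T;Y) = {i_ty:.2f} bits")
# identity:   I(X;T) = 2.00 bits, I(T;Y) = 1.00 bits
# compressed: I(X;T) = 1.00 bits, I(T;Y) = 1.00 bits
```

In this toy case the compressed representation discards one bit about the input while keeping the full bit about the label, which is the kind of trade-off the information bottleneck view describes.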
Why This Matters for the Field
If validated, this framework could resolve a major theoretical rift. Classical bounds based on VC dimension scale with the richness of the hypothesis class, which for neural networks grows with parameter count, and Rademacher-complexity bounds, although data-dependent, are typically too loose at modern scale to be informative. Neither explains why a 100-million-parameter transformer can generalise better on natural language than a model a tenth of its size. The informational frustration perspective suggests that effective capacity depends not on parameter count alone but on the interaction between parameter count, data manifold geometry, and the Shannon bottleneck imposed by finite precision and finite sample sizes. This could lead to new, more predictive complexity measures that account for information flow.
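A back-of-the-envelope counting argument shows why finite precision on its own gives only a crude ceiling. The sketch below is an illustration, not a calculation from the paper: the function names and the example figures (16-bit weights, one million examples, 1,000 classes) are hypothetical. It compares the raw number of bits a set of weights can hold with the bits needed to memorise an arbitrary labelling.

```python
import math

def weight_capacity_bits(num_params, bits_per_param):
    """Upper bound on stored information: P weights at b bits each hold at most P * b bits."""
    return num_params * bits_per_param

def bits_to_memorise_labels(num_samples, num_classes):
    """Bits needed to pick one of num_classes ** num_samples possible labellings."""
    return num_samples * math.log2(num_classes)

# Hypothetical figures, chosen only for illustration.
capacity = weight_capacity_bits(num_params=100_000_000, bits_per_param=16)
needed = bits_to_memorise_labels(num_samples=1_000_000, num_classes=1000)

print(f"raw weight capacity: {capacity:.3e} bits")
print(f"bits to memorise labels: {needed:.3e} bits")
print(f"capacity / needed: {capacity / needed:.1f}")
```

With these assumed numbers the raw capacity exceeds the memorisation requirement by roughly two orders of magnitude, so a bound based on parameter count and precision alone cannot rule out memorisation. Any tighter limit would have to come from the other ingredients named above, such as the geometry of the data manifold.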
Implications for AI Practitioners
For engineers and researchers, this work offers a practical intuition: overparameterisation is not a bug but a feature. On this account, a large model is not prone to overfitting simply because it is large; it becomes prone to overfitting when the data manifold is too simple or the information bottleneck is too loose. In practice, this means:
- Scaling laws may be more nuanced: Simply adding parameters may improve generalisation only if the data manifold is sufficiently complex to “frustrate” memorisation.
- Regularisation strategies can be reinterpreted: Techniques like dropout, weight decay, and early stopping may work by tightening the Shannon bottleneck, forcing the manifold into a more frustrated state.
- Architecture design should consider manifold geometry: Models that encourage low-rank or sparse representations (e.g., the attention heads in transformers) may naturally induce informational frustration, which could help explain their success.
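One way to probe whether a layer's representations are low-rank is the participation ratio of the activation covariance spectrum, a standard effective-dimensionality measure. This is a stand-in chosen for illustration, not the frustration measure from the paper, and the synthetic activations, layer width, and latent dimension below are assumptions.

```python
import numpy as np

def participation_ratio(activations):
    """Effective dimensionality (sum of eigenvalues)^2 / (sum of squared eigenvalues)."""
    centered = activations - activations.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / (activations.shape[0] - 1)
    # eigvalsh handles symmetric matrices; clip tiny negative values from round-off
    eigvals = np.clip(np.linalg.eigvalsh(cov), 0.0, None)
    return float(eigvals.sum() ** 2 / np.sum(eigvals ** 2))

rng = np.random.default_rng(0)
n_samples, width, latent_dim = 2000, 64, 3

# Synthetic "layer activations": 3 latent factors linearly embedded in 64 units.
latent = rng.normal(size=(n_samples, latent_dim))
mixing = rng.normal(size=(latent_dim, width))
low_rank = latent @ mixing

# Baseline: activations with no low-dimensional structure.
full_rank = rng.normal(size=(n_samples, width))

print("low-rank activations:", participation_ratio(low_rank))
print("unstructured activations:", participation_ratio(full_rank))
```

The low-rank case yields a value of at most 3, the number of latent factors, while the unstructured case comes out close to the layer width. In practice the same function could be applied to activations recorded from a trained model, for example before and after adding dropout or weight decay.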
Key Takeaways
- The paper proposes that generalisation in overparameterised networks arises from “informational frustration”—a geometric and information-theoretic constraint that prevents memorisation.
- This challenges classical complexity measures like VC dimension, offering a new lens for understanding why large models generalise.
- For practitioners, it suggests that overparameterisation is beneficial when the data manifold is rich enough to saturate the network’s information bottleneck.
- Future work may yield practical tools to measure and tune informational frustration, enabling more principled model scaling and regularisation.