BeClaude
Research · 2026-07-01

OTCache: Optimal Transport for Geometry-Aware Caching in Diffusion Models

Originally published by arXiv cs.AI

arXiv:2606.31026v1 Announce Type: cross Abstract: We propose OTCache, a training-free framework for accelerating diffusion sampling via caching schedule prediction. Existing graph-based caching methods reduce redundant computation by optimizing shortest-path objectives, but rely on an additive...

What Happened

Researchers have introduced OTCache, a training-free framework that accelerates diffusion model sampling by predicting optimal caching schedules. The core innovation lies in replacing traditional graph-based shortest-path approaches with optimal transport theory, which naturally accounts for the geometric structure of latent representations in diffusion processes. Unlike prior caching methods that treat computation reduction as a path optimization problem on additive cost graphs, OTCache reframes it as a transport problem between probability distributions over the diffusion trajectory. This allows the system to identify which intermediate steps can be safely skipped or approximated without degrading output quality, by measuring the "cost" of moving computation across the geometric manifold of the latent space.
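The summary above does not spell out OTCache's actual formulation, so the following is only a generic illustration of what "a transport problem between probability distributions" looks like in code. It is a minimal sketch of entropic-regularized optimal transport (Sinkhorn iterations) between a uniform distribution over all sampling steps and a distribution over a hypothetical subset of steps that are fully computed. The step count, the kept indices, the squared-time cost, and the function name sinkhorn_plan are all assumptions made for illustration, not details from the paper.

```python
import numpy as np

def sinkhorn_plan(a, b, cost, reg=0.1, n_iters=1000):
    """Entropic-regularized transport plan between histograms a and b.

    Generic Sinkhorn iterations; NOT the OTCache algorithm itself.
    """
    K = np.exp(-cost / reg)          # Gibbs kernel derived from the cost matrix
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iters):
        u = a / (K @ v)              # rescale rows toward marginal a
        v = b / (K.T @ u)            # rescale columns toward marginal b
    return u[:, None] * K * v[None, :]

# Toy setup (all values are assumptions for illustration).
n_steps = 8
t_all = np.linspace(0.0, 1.0, n_steps)      # normalized sampling times
kept = np.array([0, 3, 7])                  # hypothetical fully computed steps
t_kept = t_all[kept]

a = np.full(n_steps, 1.0 / n_steps)         # mass spread over every step
b = np.full(len(kept), 1.0 / len(kept))     # mass concentrated on kept steps
cost = (t_all[:, None] - t_kept[None, :]) ** 2

plan = sinkhorn_plan(a, b, cost)
transport_cost = float((plan * cost).sum())
print(f"transport cost of this schedule: {transport_cost:.4f}")
```

In a sketch like this, a lower transport cost would indicate that the kept steps cover the trajectory more evenly under the chosen cost. A geometry-aware method would presumably replace the squared-time cost with one measured in the latent space.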

Why It Matters

Diffusion models remain computationally expensive despite their dominance in image, video, and 3D generation. Each sampling step requires a full forward pass through the denoising network, making inference costs prohibitive for real-time applications and edge deployment. OTCache addresses this bottleneck without requiring retraining or architectural changes—a significant practical advantage. By leveraging optimal transport, the method captures non-additive dependencies between steps that graph-based approaches miss. For example, skipping two consecutive steps might be cheaper than skipping two non-consecutive ones due to the curvature of the latent space, a nuance that additive cost models cannot represent. This geometric awareness translates to better caching decisions: OTCache reportedly achieves comparable or superior acceleration ratios (up to 2-3x speedups) while maintaining image fidelity metrics like FID and CLIP scores. The training-free nature also means it can be applied to any pre-trained diffusion model, from Stable Diffusion to Imagen variants, without additional GPU hours.
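The non-additivity point can be made concrete with a toy example. The sketch below assumes a hypothetical latent trajectory along a quarter circle and a simple caching rule in which every skipped step reuses the most recently computed latent. Neither assumption comes from the paper. In this particular toy, consecutive skips turn out more expensive than an additive model predicts. The direction of the effect depends on the geometry, and the only point illustrated is that the cost of a set of skips is not the sum of per-step costs.

```python
import numpy as np

# Hypothetical latent trajectory: evenly spaced points on a quarter circle.
n_steps = 8
theta = np.linspace(0.0, np.pi / 2, n_steps)
traj = np.stack([np.cos(theta), np.sin(theta)], axis=1)

def additive_cost(skips):
    """Additive model: each skipped step is charged its distance to the previous step."""
    return sum(np.linalg.norm(traj[t] - traj[t - 1]) for t in skips)

def stale_cache_cost(skips):
    """Error when each skipped step reuses the most recently computed latent."""
    skips = set(skips)
    total, last_computed = 0.0, 0            # step 0 is always computed
    for t in range(1, n_steps):
        if t in skips:
            total += np.linalg.norm(traj[t] - traj[last_computed])
        else:
            last_computed = t
    return total

for skips in ([2, 5], [2, 3]):
    print(skips, round(additive_cost(skips), 4), round(stale_cache_cost(skips), 4))
```

For the isolated skips [2, 5] the two costs agree, while for the consecutive skips [2, 3] the stale-cache error is larger because the second skipped step is compared against a latent two steps back along the curve.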

Implications for AI Practitioners

For developers deploying diffusion models, OTCache offers a drop-in acceleration layer. The framework's reliance on optimal transport rather than learned heuristics makes it robust across different model architectures and generation tasks. Practitioners should note that the caching schedule is computed per input, so each generation pays an up-front cost for solving the transport problem, though this is negligible compared to the saved compute. The method is particularly valuable for:

  • Real-time generation: Applications like interactive image editing or video generation where latency matters.
  • Resource-constrained environments: Mobile devices, web browsers, or API servers with limited GPU budgets.
  • Batch processing: Large-scale content creation where cumulative compute savings are substantial.

However, the paper does not extensively address edge cases like highly structured outputs (e.g., text, faces) where caching might introduce artifacts. Practitioners should validate OTCache on their specific use cases, especially where perceptual quality is critical.
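One lightweight way to start such a validation is to generate the same image with and without caching (same prompt and seed) and compare the two outputs pixel by pixel. The sketch below uses PSNR as a simple stand-in. It is not FID or CLIP score and does not replace them, and the arrays here are synthetic placeholders rather than real model outputs.

```python
import numpy as np

def psnr(reference, candidate, max_val=1.0):
    """Peak signal-to-noise ratio in dB between two images scaled to [0, max_val]."""
    mse = np.mean((reference - candidate) ** 2)
    if mse == 0:
        return float("inf")                  # identical images
    return float(10.0 * np.log10(max_val ** 2 / mse))

# Placeholder data standing in for a baseline output and a cached-schedule output.
baseline = np.zeros((64, 64, 3))
accelerated = np.full((64, 64, 3), 0.1)      # uniform error of 0.1 per pixel

print(f"PSNR: {psnr(baseline, accelerated):.1f} dB")
```

A per-image score like this is most useful for spotting individual failures, for example garbled text or distorted faces, which distribution-level metrics can average away.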

Key Takeaways

  • OTCache replaces graph-based caching with optimal transport, achieving better acceleration by accounting for the geometric structure of diffusion latent spaces.
  • The framework is training-free and model-agnostic, enabling immediate deployment on existing diffusion models without fine-tuning.
  • OTCache reportedly delivers inference speedups of up to 2-3x with minimal quality degradation, making it suitable for real-time and resource-limited applications.
  • Validation on domain-specific tasks (e.g., faces, text) is recommended before production use, as caching may introduce subtle artifacts in structured outputs.