Research, 2026-06-19

ParaScale: Scale-Calibrated Camera-Motion Transfer via a Gauge-Invariant Parallax Number

arXiv:2606.19805v1 (announce type: cross). Abstract: Transferring the camera motion of a reference video to a freshly generated one lets creators reuse cinematic moves. Yet reference and target often live at incompatible scales -- a sweep across a galaxy versus a nudge across a desk -- and naively...

What Happened

The paper introduces ParaScale, a method for transferring camera motion from a reference video to a newly generated video while calibrating that motion to the scale of the target scene. The core problem is that naive camera-motion transfer fails when the reference and target scenes operate at vastly different scales: a sweeping orbit around a planet cannot simply be copied onto a macro shot of a coin. The researchers formalize this using a "gauge-invariant parallax number," a quantity that decouples camera motion from absolute scene scale, enabling scale-calibrated transfer without requiring explicit 3D reconstruction or depth estimation.

Why It Matters

This work addresses a gap in video generation and AI-assisted cinematography. Current models like Stable Video Diffusion or Sora can generate impressive clips, but precise control of camera motion remains brittle, especially across scale disparities. A director might want to apply a dramatic crane shot from The Lord of the Rings to a miniature model; existing methods would either distort the motion or produce physically implausible results.

The gauge-invariant parallax number is the key insight. By treating parallax (the apparent shift in object positions due to camera movement) as a scale-invariant quantity, ParaScale can map motion from a galaxy-spanning shot to a tabletop scene without requiring the model to "understand" 3D geometry in a traditional sense. This bypasses the need for expensive depth sensors or multi-view training data, making the technique practical for real-world production pipelines.
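
The scale invariance described above can be illustrated with a pinhole camera. The following is a minimal sketch, not the paper's definition of the parallax number: it only shows the standard fact that image-space parallax depends on the ratio of camera translation to scene depth, so scaling both by the same factor leaves the projected shift unchanged. The function names (project, image_shift, scale_scene) and all numeric values are hypothetical and chosen for illustration.

```python
def project(point, cam_x, focal=1.0):
    """Pinhole projection of a 3D point (x, y, z) for a camera that has been
    translated by cam_x along the x-axis and looks down +z."""
    x, y, z = point
    return (focal * (x - cam_x) / z, focal * y / z)


def image_shift(point, baseline, focal=1.0):
    """Horizontal image displacement of `point` when the camera moves
    sideways by `baseline` (the parallax of that point)."""
    u_before, _ = project(point, 0.0, focal)
    u_after, _ = project(point, baseline, focal)
    return u_after - u_before


def scale_scene(point, s):
    """Apply a global scale factor s to a 3D point."""
    return tuple(s * c for c in point)


# Hypothetical desk-scale setup: a point 0.5 m away, camera nudged 2 cm.
desk_point = (0.1, 0.05, 0.5)
desk_baseline = 0.02

# Blow the whole scene up by a factor of one million.
s = 1e6
big_point = scale_scene(desk_point, s)

shift_small = image_shift(desk_point, desk_baseline)      # desk scene
shift_large = image_shift(big_point, s * desk_baseline)   # motion scaled with scene
shift_naive = image_shift(big_point, desk_baseline)       # motion copied unchanged

print(shift_small, shift_large, shift_naive)
```

Copying the desk-scale translation onto the enlarged scene unchanged (shift_naive) produces almost no visible parallax, which is the failure mode of naive transfer. Scaling the translation together with the scene (shift_large) reproduces the original image shift.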

Implications for AI Practitioners

For video generation researchers: This provides a principled way to inject camera control into diffusion-based video models. Current approaches often rely on camera pose embeddings that are scale-dependent; ParaScale offers a more robust alternative that could be integrated into existing architectures like VideoLDM or AnimateDiff.

For computer graphics engineers: The method reduces the gap between procedural camera control and learned video synthesis. Practitioners working on virtual production tools could use this to automate camera motion transfer between scenes of radically different scales, a common pain point in previsualization and VFX.

For content creators and filmmakers: This lowers the barrier to reusing cinematic language. A creator could capture a handheld camera motion on a smartphone, then apply that same motion to a generated scene of a city or a microscopic environment, with the algorithm automatically adjusting for scale. The result is more natural than simple linear interpolation.

Limitations to watch for: The paper likely relies on having a clean reference video with unambiguous parallax cues. Noisy or highly stylized footage (e.g., heavy motion blur, abstract animation) may degrade performance. Additionally, the gauge-invariant formulation may struggle with scenes containing extreme perspective distortion or non-rigid motion.
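
To make the scale-dependence of raw camera translations concrete, here is a hedged sketch of the smartphone-to-city scenario mentioned for content creators. It is not ParaScale's algorithm: it uses explicit depth samples for both scenes, which the paper is described as avoiding, and it simply rescales translations so that the ratio of camera travel to median scene depth is preserved. Rotations are left out because they carry no length unit. The helper rescale_trajectory and all values are hypothetical.

```python
from statistics import median


def rescale_trajectory(translations, ref_depths, tgt_depths):
    """Scale per-frame camera translations so that translation divided by
    median scene depth is the same in the target scene as in the reference.

    Assumption (not from the paper): median depth is used as the
    characteristic length of each scene.
    """
    factor = median(tgt_depths) / median(ref_depths)
    scaled = [tuple(factor * c for c in t) for t in translations]
    return scaled, factor


# Hypothetical handheld smartphone path (metres) in a room about 1 m deep.
ref_translations = [(0.0, 0.0, 0.0), (0.01, 0.0, 0.0), (0.02, 0.0, 0.005)]
ref_depths = [0.8, 1.0, 1.2]

# Hypothetical generated city scene roughly 500 m deep.
tgt_depths = [400.0, 500.0, 650.0]

scaled, factor = rescale_trajectory(ref_translations, ref_depths, tgt_depths)
print(factor)  # scale factor applied to every translation
print(scaled)  # camera path expressed in the target scene's units
```

With these numbers, a 1 cm hand movement in the room corresponds to a 5 m camera move over the city, so the relative parallax seen by the viewer stays comparable.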

Key Takeaways

ParaScale solves the scale-mismatch problem in camera-motion transfer using a novel gauge-invariant parallax number, avoiding explicit 3D reconstruction.
The technique enables filmmakers and AI practitioners to reuse cinematic camera motions across scenes of vastly different physical scales (e.g., galaxy vs. desk).
For practitioners, this offers a pluggable component for video generation models and virtual production tools, reducing the need for manual camera keyframing.
The approach is limited by reference video quality and may not generalize well to non-rigid or highly stylized content.

