Research 2026-05-07

Geometry Forcing: Marrying Video Diffusion and 3D Representation for Consistent World Modeling

arXiv:2507.07982v2. Abstract: Videos inherently represent 2D projections of a dynamic 3D world. However, our analysis suggests that video diffusion models trained solely on raw video data often fail to capture meaningful geometric-aware structure in their learned...

Read Original Article on Arxiv CS.AI

arxiv, papers, image-generation