Research | 2026-05-07
Geometry Forcing: Marrying Video Diffusion and 3D Representation for Consistent World Modeling
Source: arXiv cs.AI
arXiv:2507.07982v2 (Announce Type: replace-cross)

Abstract: Videos inherently represent 2D projections of a dynamic 3D world. However, our analysis suggests that video diffusion models trained solely on raw video data often fail to capture meaningful geometric-aware structure in their learned...
arxiv, papers, image-generation