Research | 2026-06-29

CoIn: Comprehensive 2D-3D Inpainting with Gaussian Splatting Guidance

Originally published by Arxiv CS.AI

arXiv:2606.27584v1 Announce Type: cross

Abstract: 3D scene inpainting is essential for reconstructing areas corrupted by occlusions or limited viewpoints. While recent methods leverage Gaussian Splatting (GS) for efficient 3D editing, they often depend on precise multi-view segmentation masks and...

The paper introduces CoIn, a framework for 3D scene inpainting that uses Gaussian Splatting (GS) as a guiding mechanism. It targets a persistent bottleneck in 3D editing: the reliance on precise multi-view segmentation masks. Existing methods often require users to label, in each of several camera views, which parts of the scene should be removed or filled. That process is labor-intensive and breaks down with complex occlusions or sparse viewpoints.

CoIn proposes a dual-stream architecture that combines 2D and 3D inpainting. The key idea is to use Gaussian Splatting, a representation that models a scene as a collection of 3D Gaussians, as a "guidance" signal. Rather than relying solely on pixel-level masks, the inpainting draws on the GS representation as a geometric and photometric prior. This gives the system information about the underlying 3D structure of the region to be inpainted, so it can generate coherent textures and geometry even when the 2D segmentation is noisy or incomplete. The result is a more robust and more automated pipeline for tasks such as removing unwanted objects, filling in occluded backgrounds, or repairing holes in reconstructed scenes.
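To make the "collection of 3D Gaussians" idea concrete, here is a minimal sketch of the front-to-back alpha compositing that Gaussian Splatting renderers use to turn depth-sorted splats into a pixel colour. It is not CoIn's code. The Splat class, the isotropic sigma, and the function names are simplifications assumed for illustration, and the returned coverage value is shown only as one plausible signal that a guidance mechanism could read.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Color = Tuple[float, float, float]


@dataclass
class Splat:
    """One Gaussian after projection to the image plane (hypothetical, simplified)."""
    center: Tuple[float, float]  # pixel coordinates of the projected mean
    sigma: float                 # isotropic std dev in pixels (real splats are anisotropic)
    opacity: float               # learned opacity in [0, 1]
    color: Color                 # RGB in [0, 1]
    depth: float                 # distance from the camera, used for sorting


def splat_alpha(s: Splat, px: float, py: float) -> float:
    """Opacity that splat s contributes at pixel (px, py)."""
    dx, dy = px - s.center[0], py - s.center[1]
    return s.opacity * math.exp(-(dx * dx + dy * dy) / (2.0 * s.sigma ** 2))


def composite(splats: List[Splat], px: float, py: float) -> Tuple[Color, float]:
    """Front-to-back compositing: C = sum_i c_i * a_i * prod_{j<i} (1 - a_j)."""
    color = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light not yet absorbed by nearer splats
    for s in sorted(splats, key=lambda s: s.depth):
        a = splat_alpha(s, px, py)
        for k in range(3):
            color[k] += transmittance * a * s.color[k]
        transmittance *= 1.0 - a
    # Coverage near 0 means almost nothing was rendered at this pixel.
    return (color[0], color[1], color[2]), 1.0 - transmittance


# A half-transparent red splat in front of an opaque blue one.
near = Splat(center=(0.0, 0.0), sigma=1.0, opacity=0.5, color=(1.0, 0.0, 0.0), depth=1.0)
far = Splat(center=(0.0, 0.0), sigma=1.0, opacity=1.0, color=(0.0, 0.0, 1.0), depth=2.0)
pixel_color, coverage = composite([far, near], 0.0, 0.0)
print(pixel_color, coverage)  # (0.5, 0.0, 0.5) 1.0
```

Pixels with low coverage are ones the reconstruction does not explain, which is the kind of geometric cue that a pixel-level mask alone cannot supply.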

Why It Matters

The significance here is practical efficiency. Gaussian Splatting has already gained traction for rendering high-quality novel views in real time, but its use in editing has been hindered by manual annotation requirements. CoIn shifts the workflow from manual curation toward semi-automated repair. For AI practitioners working with 3D reconstruction, whether in robotics, autonomous driving, or virtual production, this reduces the friction of cleaning up captured data.

Consider a typical scenario: a drone captures a building, but a tree branch occludes a window. Current GS editing tools would typically require the operator to draw a mask on every frame where the branch appears. By using the GS representation as a geometric guide, CoIn's approach can infer the missing window structure from the surrounding context, requiring only a rough initial mask or even a text prompt. That translates to lower labor costs and faster iteration cycles in 3D content pipelines.
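One way to see why a 3D representation can stand in for per-frame mask drawing: once a region is flagged in 3D, a mask for every view follows from projection. The sketch below assumes a rotation-free pinhole camera looking along +z, and the function names, focal length, and image size are hypothetical. It illustrates the general principle, not how CoIn derives its masks.

```python
from typing import List, Optional, Sequence, Tuple

Point3 = Tuple[float, float, float]


def project(point: Point3, cam_pos: Point3, focal: float,
            width: int, height: int) -> Optional[Tuple[int, int]]:
    """Pinhole projection for a camera at cam_pos looking along +z (no rotation)."""
    x, y, z = (p - c for p, c in zip(point, cam_pos))
    if z <= 0:
        return None  # point is behind the camera
    u = int(round(focal * x / z + width / 2))
    v = int(round(focal * y / z + height / 2))
    return (u, v) if 0 <= u < width and 0 <= v < height else None


def masks_from_3d(flagged_points: Sequence[Point3], cameras: Sequence[Point3],
                  focal: float = 10.0, width: int = 8,
                  height: int = 8) -> List[List[List[int]]]:
    """Build one binary mask per camera by projecting the flagged 3D points."""
    masks = []
    for cam_pos in cameras:
        mask = [[0] * width for _ in range(height)]
        for pt in flagged_points:
            uv = project(pt, cam_pos, focal, width, height)
            if uv is not None:
                mask[uv[1]][uv[0]] = 1  # row index is v, column index is u
        masks.append(mask)
    return masks


# One flagged point (say, part of the branch) seen from two camera positions.
branch = [(0.0, 0.0, 5.0)]
cams = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
masks = masks_from_3d(branch, cams)
```

The same 3D point lands on pixel (4, 4) in the first view and (2, 4) in the second, so the two masks stay consistent without anyone drawing either of them.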

Implications for AI Practitioners

Reduced Dependency on Data Quality: Practitioners may be able to work with imperfect capture data. The GS guidance mechanism acts as a regularizer, making the inpainting more robust to errors in segmentation or camera pose estimation. This is particularly valuable for real-world datasets, where clean ground-truth masks are rare.

Architecture Design Insight: The dual-stream approach, in which a 2D inpainter (e.g., a diffusion model) is guided by a 3D representation, is a pattern likely to proliferate. It suggests that future editing tools will treat 2D and 3D not as separate domains but as coupled systems, where the 3D geometry informs the 2D generation and vice versa.

Computational Trade-offs: CoIn reduces manual effort, but it likely adds computational overhead from the GS optimization loop. Practitioners should benchmark whether the annotation time saved outweighs the extra GPU runtime for their specific use case (e.g., batch processing vs. interactive editing).
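A back-of-the-envelope version of that benchmark might look like the sketch below. Every number in it (view count, seconds per mask, hourly rates, GPU hours) is a made-up placeholder rather than a measurement of CoIn, and the function names are hypothetical.

```python
def annotation_hours(num_views: int, seconds_per_mask: float) -> float:
    """Hours a person spends drawing one mask per view."""
    return num_views * seconds_per_mask / 3600.0


def net_saving(num_views: int, seconds_per_mask: float, labor_rate: float,
               gpu_hours: float, gpu_rate: float) -> float:
    """Manual annotation cost minus extra GPU cost; positive favors automation."""
    manual_cost = annotation_hours(num_views, seconds_per_mask) * labor_rate
    gpu_cost = gpu_hours * gpu_rate
    return manual_cost - gpu_cost


# Placeholder inputs: 200 views at 90 s per mask and 40/hour labor,
# against 2 extra GPU hours at 3/hour.
saving = net_saving(num_views=200, seconds_per_mask=90.0, labor_rate=40.0,
                    gpu_hours=2.0, gpu_rate=3.0)
print(saving)  # 194.0
```

Cost is only half of the comparison. For interactive editing, the wall-clock latency of the optimization loop may matter more than its price, so it is worth timing that separately.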

Key Takeaways

CoIn uses Gaussian Splatting as a geometric guide for 3D inpainting, reducing the need for precise multi-view segmentation masks.
The framework addresses a key bottleneck in 3D editing: the manual labor required to clean up occlusions and reconstruction artifacts.
For AI practitioners, this signals a shift toward more automated, robust 3D repair tools that can handle imperfect input data.
The dual 2D-3D architecture pattern is likely to influence future work in scene editing, blending generative inpainting with explicit 3D priors.
