IGFuse: Interactive 3D Gaussian Scene Reconstruction via Multi-Scans Fusion

Computer Vision & MultiModal AI
Published: arXiv:2508.13153v1
Authors

Wenhao Hu Zesheng Li Haonan Zhou Liu Liu Xuexiang Wen Zhizhong Su Xi Li Gaoang Wang

Abstract

Reconstructing complete and interactive 3D scenes remains a fundamental challenge in computer vision and robotics, particularly due to persistent object occlusions and limited sensor coverage. Multi-view observations from a single scene scan often fail to capture the full structural details. Existing approaches typically rely on multi-stage pipelines, such as segmentation, background completion, and inpainting, or require per-object dense scanning, both of which are error-prone and not easily scalable. We propose IGFuse, a novel framework that reconstructs interactive Gaussian scenes by fusing observations from multiple scans, where natural object rearrangement between captures reveals previously occluded regions. Our method constructs segmentation-aware Gaussian fields and enforces bi-directional photometric and semantic consistency across scans. To handle spatial misalignments, we introduce a pseudo-intermediate scene state for unified alignment, alongside collaborative co-pruning strategies to refine geometry. IGFuse enables high-fidelity rendering and object-level scene manipulation without dense observations or complex pipelines. Extensive experiments validate the framework's strong generalization to novel scene configurations, demonstrating its effectiveness for real-world 3D reconstruction and real-to-simulation transfer. Our project page is available online.

Paper Summary

Problem
Reconstructing complete and interactive 3D scenes from partially observed environments is a fundamental challenge in computer vision and robotics. Current approaches often rely on multi-stage pipelines or require dense scanning, which can be error-prone and not easily scalable.
Key Innovation
IGFuse is a novel framework that reconstructs interactive Gaussian scenes by fusing observations from multiple scans. This approach leverages natural object rearrangements between captures to reveal previously occluded regions and refine geometry.
Practical Impact
IGFuse enables high-fidelity rendering and object-level scene manipulation without dense observations or complex pipelines. Its effectiveness for real-world 3D reconstruction and real-to-simulation transfer makes it a valuable tool for various applications, such as robotics, gaming, and architecture.
Analogy / Intuitive Explanation
Imagine taking multiple photos of the same room from different angles. Each photo captures some parts of the scene, but not everything. IGFuse is like combining those photos to create a single, detailed picture of the entire scene, while also correcting for any gaps or misalignments between them. This allows you to see the whole scene in high quality and even manipulate individual objects within it. Note: The analogy is not perfect, as IGFuse works with 3D Gaussian fields rather than 2D photos, but it conveys the idea of combining multiple partial views to create a complete and accurate representation of the scene.
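The fusion intuition above can be sketched with a toy example. This is purely illustrative (IGFuse operates on 3D Gaussian fields, not 2D grids): each "view" is a 2D array in which occluded cells are marked with NaN, and fusion fills each scan's gaps with the other's observations.

```python
import numpy as np

def fuse_partial_views(view1, view2):
    """Fuse two partial observations of the same scene (toy 2D analogy).

    Each view is a 2D float array where np.nan marks occluded cells.
    Cells seen in only one view are copied from it; cells seen in both
    are averaged. Cells seen in neither remain NaN.
    """
    fused = np.where(np.isnan(view1), view2, view1)  # fill view1's gaps
    both = ~np.isnan(view1) & ~np.isnan(view2)
    fused[both] = 0.5 * (view1[both] + view2[both])  # agree where both observe
    return fused
```

Rearranging objects between scans plays the same role as moving to a new viewpoint here: regions occluded in one "view" become observed in the other, so the fused result is more complete than either input.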
Paper Information
Categories: cs.CV
arXiv ID: 2508.13153v1
