Sat2RealCity: Geometry-Aware and Appearance-Controllable 3D Urban Generation from Satellite Imagery

Computer Vision & Multimodal AI
arXiv: 2511.11470v1
Authors

Yijie Kang, Xinliang Wang, Zhenyu Wu, Yifeng Shi, Hailong Zhu

Abstract

Recent advances in generative modeling have substantially enhanced 3D urban generation, enabling applications in digital twins, virtual cities, and large-scale simulations. However, existing methods face two key challenges: (1) the need for large-scale 3D city assets for supervised training, which are difficult and costly to obtain, and (2) reliance on semantic or height maps, which are used exclusively for generating buildings in virtual worlds and lack connection to real-world appearance, limiting the realism and generalizability of generated cities. To address these limitations, we propose Sat2RealCity, a geometry-aware and appearance-controllable framework for 3D urban generation from real-world satellite imagery. Unlike previous city-level generation methods, Sat2RealCity builds generation upon individual building entities, enabling the use of rich priors and pretrained knowledge from 3D object generation while substantially reducing dependence on large-scale 3D city assets. Specifically, (1) we introduce the OSM-based spatial priors strategy to achieve interpretable geometric generation from spatial topology to building instances; (2) we design an appearance-guided controllable modeling mechanism for fine-grained appearance realism and style control; and (3) we construct an MLLM-powered semantic-guided generation pipeline, bridging semantic interpretation and geometric reconstruction. Extensive quantitative and qualitative experiments demonstrate that Sat2RealCity significantly surpasses existing baselines in structural consistency and appearance realism, establishing a strong foundation for real-world aligned 3D urban content creation. The code will be released soon.

Paper Summary

Problem
This paper addresses two limitations of existing 3D urban generation methods. First, they rely on large-scale 3D city assets for supervised training, which are difficult and costly to obtain. Second, they often depend on simplified inputs such as semantic maps or height maps, which fail to capture the fine-grained appearance, material, and structural details of real-world cities. As a result, the generated content lacks realism and generalizability when deployed in real-world contexts.
Key Innovation
The key innovation of this work is the Sat2RealCity framework, a geometry-aware and appearance-controllable approach for 3D urban generation directly from real-world satellite imagery. Unlike previous city-level generation methods, Sat2RealCity builds generation upon individual building entities, enabling the use of rich priors and pre-trained knowledge from 3D object generation while substantially reducing dependence on large-scale 3D city assets.
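This entity-level design can be pictured as a loop over individual building footprints rather than a single city-scale generation pass. The sketch below is a minimal, hypothetical illustration of that structure only; the paper's code is not yet released, so every class, function, and file name here is an assumed placeholder, not the authors' actual API. It stands in for the three components named in the abstract: OSM footprints as spatial priors, an MLLM-style appearance description per building, and a pretrained per-object 3D generator.

```python
# Hypothetical sketch of an entity-level generation loop in the spirit of
# Sat2RealCity. All names are illustrative placeholders, not the paper's API.
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class BuildingFootprint:
    """2D polygon from OpenStreetMap, in pixel coordinates of the satellite tile."""
    polygon: List[Tuple[float, float]]


@dataclass
class BuildingAsset:
    """One generated building entity: footprint, style description, mesh location."""
    footprint: BuildingFootprint
    appearance_prompt: str
    mesh_path: str


def describe_appearance(crop_id: str) -> str:
    """Placeholder for the MLLM step that turns a satellite crop of one building
    into a textual appearance/style description (materials, roof type, color)."""
    return f"mid-rise building, flat roof, light facade (crop {crop_id})"


def generate_building(footprint: BuildingFootprint, prompt: str, out_path: str) -> BuildingAsset:
    """Placeholder for a pretrained 3D object generator conditioned on the
    footprint geometry and the appearance prompt."""
    # A real implementation would extrude/refine geometry and texture it here.
    return BuildingAsset(footprint=footprint, appearance_prompt=prompt, mesh_path=out_path)


def build_city(footprints: List[BuildingFootprint]) -> List[BuildingAsset]:
    """Entity-level pipeline: one generation call per building, then assembly."""
    assets = []
    for i, fp in enumerate(footprints):
        prompt = describe_appearance(crop_id=str(i))                  # semantic guidance
        asset = generate_building(fp, prompt, f"building_{i}.obj")    # per-entity 3D generation
        assets.append(asset)
    return assets  # downstream: place assets back onto the OSM-derived layout


if __name__ == "__main__":
    demo = [BuildingFootprint(polygon=[(0, 0), (40, 0), (40, 25), (0, 25)])]
    print(build_city(demo)[0].appearance_prompt)
```

The point of the sketch is the factorization: because each building is generated independently from its own footprint and appearance cue, the method can lean on pretrained 3D object generators instead of requiring supervised training on whole-city 3D assets.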
Practical Impact
The Sat2RealCity framework can be applied in real-world scenarios such as urban planning, autonomous-driving simulation, and geographic visualization. By generating high-fidelity 3D city models with detailed geometry and appearance, it supports more realistic and immersive digital twins, virtual cities, and large-scale simulation environments, which in turn enable more accurate simulations and better-informed decisions in domains such as planning and public safety.
Analogy / Intuitive Explanation
Imagine you're trying to build a Lego city from scratch. Traditional methods would give you a set of pre-made buildings and roads, but it would be hard to get the details right, like the texture of the buildings or the shape of the roads. Sat2RealCity is like having a special tool that can take a satellite image of a real city and use it to build a highly detailed and realistic Lego city, with all the correct textures and shapes. This tool can also be customized to create different styles or themes, making it a powerful tool for urban planning and visualization.
Paper Information
Categories: cs.CV
Published Date:
arXiv ID: 2511.11470v1