Generative Augmented Reality: Paradigms, Technologies, and Future Applications

Explainable & Ethical AI
Published: arXiv: 2511.16783v1
Authors

Chen Liang, Jiawen Zheng, Yufeng Zeng, Yi Tan, Hengye Lyu, Yuhui Zheng, Zisu Li, Yueting Weng, Jiaxin Shi, Hanwang Zhang

Abstract

This paper introduces Generative Augmented Reality (GAR) as a next-generation paradigm that reframes augmentation as a process of world re-synthesis rather than world composition by a conventional AR engine. GAR replaces the conventional AR engine's multi-stage modules with a unified generative backbone, where environmental sensing, virtual content, and interaction signals are jointly encoded as conditioning inputs for continuous video generation. We formalize the computational correspondence between AR and GAR, survey the technical foundations that make real-time generative augmentation feasible, and outline prospective applications that leverage its unified inference model. We envision GAR as a future AR paradigm that delivers high-fidelity experiences in terms of realism, interactivity, and immersion, while eliciting new research challenges on technologies, content ecosystems, and the ethical and societal implications.

Paper Summary

Problem
The paper addresses the limitations of traditional Augmented Reality (AR) technology. Current AR systems rely on explicit 3D modeling, predefined interaction rules, and deterministic graphics pipelines, which make it difficult to create high-fidelity interactions such as the realistic behavior of living creatures or complex mechanical dynamics. These limitations restrict the expressive space of AR and make truly responsive or realistic interactions hard to achieve.
Key Innovation
The key innovation is Generative Augmented Reality (GAR), a new paradigm that reframes augmentation as a process of world re-synthesis rather than world composition. GAR replaces the traditional AR engine's multi-stage modules with a unified generative backbone, where environmental sensing, virtual content, and interaction signals are jointly encoded as conditioning inputs for continuous video generation. The paper presents this approach as a route to high-fidelity interactions and experiences in real time.
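The contrast between composition and re-synthesis can be made concrete with a toy sketch. This is not the paper's implementation: the names `Conditioning`, `compose_ar`, `toy_generator`, and `generate_gar` are hypothetical, frames are reduced to flat lists of pixel intensities, and the "generator" is a hand-written blend standing in for a learned video model. It only illustrates the data flow, assuming each output frame depends on the previous output frame plus the three conditioning signals named above.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Frame = List[float]  # toy stand-in for an image: a flat list of pixel intensities


@dataclass
class Conditioning:
    """Per-step conditioning signals (illustrative names, not from the paper)."""
    sensing: Frame      # e.g. the current camera observation
    content: Frame      # e.g. an encoding of the virtual content
    interaction: float  # e.g. the strength of a user gesture


def compose_ar(cond: Conditioning) -> Frame:
    """Conventional AR as world composition: overlay content on the sensed frame."""
    # Each frame is built independently; there is no dependence on earlier output.
    return [s + cond.interaction * c for s, c in zip(cond.sensing, cond.content)]


def toy_generator(prev: Frame, cond: Conditioning) -> Frame:
    """Placeholder for a learned video model (assumption: a simple 50/50 blend)."""
    # The whole frame is re-synthesized from the previous output and all
    # conditioning signals together, rather than layered on top of the input.
    return [
        0.5 * p + 0.5 * (s + cond.interaction * c)
        for p, s, c in zip(prev, cond.sensing, cond.content)
    ]


def generate_gar(
    first_frame: Frame,
    conds: Sequence[Conditioning],
    generator: Callable[[Frame, Conditioning], Frame] = toy_generator,
) -> List[Frame]:
    """GAR as continuous, conditioned generation: an autoregressive rollout."""
    frames: List[Frame] = []
    prev = first_frame
    for cond in conds:
        prev = generator(prev, cond)  # next frame depends on the generated history
        frames.append(prev)
    return frames


if __name__ == "__main__":
    cond = Conditioning(sensing=[1.0, 1.0], content=[0.0, 2.0], interaction=0.5)
    print(compose_ar(cond))
    print(generate_gar([0.0, 0.0], [cond, cond]))
```

In this sketch the composition path treats every frame in isolation, while the generative path carries state forward through `prev`. That carried state is what would let a real model keep generated content temporally coherent; the blend used here is only a placeholder for that model.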
Practical Impact
GAR could reshape several application areas that the paper outlines:
* Interactive media and entertainment
* Industrial guidance and education
* Navigation and spatial experience
* Embodied creativity and adaptive storytelling
The authors envision GAR delivering high-fidelity experiences in terms of realism, interactivity, and immersion, while raising new research challenges in technologies, content ecosystems, and ethical and societal implications.
Analogy / Intuitive Explanation
Imagine a painting that changes and evolves as you interact with it. The colors, shapes, and textures adapt to your movements and actions, creating a unique and dynamic experience. GAR works similarly, except that the entire visual scene, not just a painting, is re-synthesized in real time in response to your actions. The analogy captures the core idea of GAR: augmentation is achieved not by layering virtual objects onto a scene, but by regenerating the perceptual world itself under the influence of sensing, intention, and interaction.
Paper Information
Categories:
cs.HC cs.AI cs.CV
Published Date:

arXiv ID:

2511.16783v1
