Has GPT-5 Achieved Spatial Intelligence? An Empirical Study

Generative AI & LLMs
Published on arXiv: 2508.13142v1
Authors

Zhongang Cai, Yubo Wang, Qingping Sun, Ruisi Wang, Chenyang Gu, Wanqi Yin, Zhiqian Lin, Zhitao Yang, Chen Wei, Xuanke Shi, Kewang Deng, Xiaoyang Han, Zukai Chen, Jiaqi Li, Xiangyu Fan, Hanming Deng, Lewei Lu, Bo Li, Ziwei Liu, Quan Wang, Dahua Lin, Lei Yang

Abstract

Multi-modal models have achieved remarkable progress in recent years. Nevertheless, they continue to exhibit notable limitations in spatial understanding and reasoning, which are fundamental capabilities to achieving artificial general intelligence. With the recent release of GPT-5, allegedly the most powerful AI model to date, it is timely to examine where the leading models stand on the path toward spatial intelligence. First, we propose a comprehensive taxonomy of spatial tasks that unifies existing benchmarks and discuss the challenges in ensuring fair evaluation. We then evaluate state-of-the-art proprietary and open-source models on eight key benchmarks, at a cost exceeding one billion total tokens. Our empirical study reveals that (1) GPT-5 demonstrates unprecedented strength in spatial intelligence, yet (2) still falls short of human performance across a broad spectrum of tasks. Moreover, we (3) identify the more challenging spatial intelligence problems for multi-modal models, and (4) proprietary models do not exhibit a decisive advantage when facing the most difficult problems. In addition, we conduct a qualitative evaluation across a diverse set of scenarios that are intuitive for humans yet fail even the most advanced multi-modal models.

Paper Summary

Problem
The paper addresses the limited spatial intelligence of advanced AI models, particularly multi-modal large language models (MLLMs). Despite rapid progress, MLLMs still struggle with basic spatial understanding and reasoning tasks that humans find trivially easy.
Key Innovation
What's new about this work is its breadth: it proposes a unified taxonomy of spatial tasks that organizes existing benchmarks, discusses the challenges in ensuring fair evaluation, and evaluates state-of-the-art proprietary and open-source models, including GPT-5, on eight key spatial intelligence benchmarks, at a cost of more than one billion tokens in total.
Practical Impact
The findings bear directly on progress toward artificial general intelligence (AGI), for which spatial understanding and reasoning are fundamental capabilities. By mapping where current models stand, including which spatial tasks remain hardest and where human performance is still out of reach, the study shows researchers which capabilities most need improvement. It also finds that proprietary models hold no decisive advantage on the most difficult problems, and its qualitative evaluation highlights scenarios that are intuitive for humans yet still defeat the most advanced models.
Analogy / Intuitive Explanation
Imagine navigating a new city without a map or compass. You can read street signs and follow roads, but you struggle to grasp the city's layout and find your way. AI models that lack spatial intelligence are in a similar position: they process text and images well, but they struggle to understand and reason about the physical world. In this study, the researchers evaluated GPT-5 and other leading models on a wide range of spatial tasks. GPT-5 showed unprecedented strength in spatial intelligence, yet it still fell short of human performance across many tasks. The study also identified the spatial problems that remain hardest for multi-modal models and found that proprietary models had no decisive advantage on the most difficult ones.
Paper Information
Categories: cs.CV, cs.CL, cs.LG, cs.MM, cs.RO
arXiv ID: 2508.13142v1
