Tree-Guided Diffusion Planner

Agentic AI
Published: arXiv:2508.21800v1
Authors

Hyeonseong Jeon, Cheolhong Min, Jaesik Park

Abstract

Planning with pretrained diffusion models has emerged as a promising approach for solving test-time guided control problems. However, standard gradient guidance typically performs optimally under convex and differentiable reward landscapes, showing substantially reduced effectiveness in real-world scenarios involving non-convex objectives, non-differentiable constraints, and multi-reward structures. Furthermore, recent supervised planning approaches require task-specific training or value estimators, which limits test-time flexibility and zero-shot generalization. We propose a Tree-guided Diffusion Planner (TDP), a zero-shot test-time planning framework that balances exploration and exploitation through structured trajectory generation. We frame test-time planning as a tree search problem using a bi-level sampling process: (1) diverse parent trajectories are produced via training-free particle guidance to encourage broad exploration, and (2) sub-trajectories are refined through fast conditional denoising guided by task objectives. TDP addresses the limitations of gradient guidance by exploring diverse trajectory regions and harnessing gradient information across this expanded solution space using only pretrained models and test-time reward signals. We evaluate TDP on three diverse tasks: maze gold-picking, robot arm block manipulation, and AntMaze multi-goal exploration. TDP consistently outperforms state-of-the-art approaches on all tasks. The project page can be found at: tree-diffusion-planner.github.io.

Paper Summary

Problem
The paper addresses the limitations of current test-time guided planning approaches, which struggle with non-convex objectives, non-differentiable constraints, and multi-reward structures. These approaches typically rely on gradient guidance, which tends to get trapped in local optima and loses effectiveness in such real-world scenarios.
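For intuition, here is a minimal sketch of the kind of gradient guidance the paper critiques, assuming a hypothetical pretrained denoiser(traj, t) and a differentiable reward(traj); all names and hyperparameters are illustrative, not taken from the paper.

```python
# Minimal sketch of standard test-time gradient guidance for a diffusion
# planner. `denoiser` and `reward` are hypothetical stand-ins for a
# pretrained model and a task reward; names are illustrative only.
import torch

def gradient_guided_denoise(denoiser, reward, traj_shape, num_steps=50, scale=0.1):
    traj = torch.randn(traj_shape)                 # start from Gaussian noise
    for t in reversed(range(num_steps)):
        traj = traj.detach().requires_grad_(True)
        pred = denoiser(traj, t)                   # one reverse-diffusion step
        # Guidance nudges the sample uphill on the reward; this breaks down
        # when the reward is non-differentiable or highly non-convex.
        grad = torch.autograd.grad(reward(pred).sum(), traj)[0]
        traj = (pred + scale * grad).detach()
    return traj
```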
Key Innovation
The key innovation proposed in this paper is the Tree-Guided Diffusion Planner (TDP), a zero-shot test-time planning framework that balances exploration and exploitation through structured trajectory generation. TDP frames test-time planning as a tree search problem using a bi-level sampling process, which produces diverse parent trajectories and refines them through fast conditional denoising guided by task objectives.
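A rough sketch of this bi-level idea is shown below. It is not the authors' implementation: diverse parent trajectories are produced with a simple repulsive particle-guidance term, then each parent is expanded into children through a short reward-guided refinement, and the best-scoring leaf is selected. Every function name, perturbation, and constant here is an assumption made for exposition.

```python
# Illustrative sketch of a bi-level, tree-style sampling process:
# (1) sample diverse parents with a repulsive term, (2) refine children
# with reward-guided denoising and keep the best leaf. All names and
# hyperparameters are assumptions, not the paper's code.
import torch

def sample_diverse_parents(denoiser, traj_shape, num_parents=8, num_steps=50, repel=0.05):
    trajs = torch.randn(num_parents, *traj_shape)
    for t in reversed(range(num_steps)):
        trajs = denoiser(trajs, t)                       # batched reverse step
        # Push each particle away from the batch mean to encourage
        # exploration of distinct trajectory modes.
        diffs = trajs.unsqueeze(1) - trajs.unsqueeze(0)  # pairwise differences
        trajs = trajs + repel * diffs.mean(dim=1)
    return trajs

def expand_and_select(denoiser, reward, parents, children_per_parent=4,
                      refine_steps=10, scale=0.1):
    leaves = []
    for parent in parents:
        for _ in range(children_per_parent):
            child = parent + 0.1 * torch.randn_like(parent)   # perturb parent
            for t in reversed(range(refine_steps)):
                child = child.detach().requires_grad_(True)
                pred = denoiser(child.unsqueeze(0), t).squeeze(0)
                grad = torch.autograd.grad(reward(pred).sum(), child)[0]
                child = (pred + scale * grad).detach()        # short guided refinement
            leaves.append(child)
    leaves = torch.stack(leaves)
    scores = torch.stack([reward(leaf) for leaf in leaves])
    return leaves[scores.argmax()]                            # exploit the best leaf
```

In this sketch, the repulsion strength and the number of children per parent control the exploration-exploitation trade-off; they are fixed constants here purely for illustration.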
Practical Impact
TDP enables more effective test-time planning in complex real-world scenarios without task-specific training or learned value estimators. It can be applied in domains such as robotics, autonomous systems, and decision-making under uncertainty, and its balance of exploration and exploitation improves performance on tasks that demand out-of-distribution generalization.
Analogy / Intuitive Explanation
Imagine navigating a maze with multiple goals. A traditional gradient-guided approach would try to find the shortest path to the closest goal, but might get stuck in a local optimum. TDP, on the other hand, would generate a tree of possible paths, exploring different regions of the maze and refining the most promising ones through guided denoising. This approach increases the chances of finding the optimal solution, even in complex and non-convex scenarios.
Paper Information
Categories: cs.AI cs.RO
Published Date:
arXiv ID: 2508.21800v1
