Reward Evolution with Graph-of-Thoughts: A Bi-Level Language Model Framework for Reinforcement Learning

arXiv: 2509.16136v1
Authors

Changwei Yao, Xinzi Liu, Chen Li, Marios Savvides

Abstract

Designing effective reward functions remains a major challenge in reinforcement learning (RL), often requiring considerable human expertise and iterative refinement. Recent advances leverage Large Language Models (LLMs) for automated reward design, but these approaches are limited by hallucinations, reliance on human feedback, and challenges with handling complex, multi-step tasks. In this work, we introduce Reward Evolution with Graph-of-Thoughts (RE-GoT), a novel bi-level framework that enhances LLMs with structured graph-based reasoning and integrates Visual Language Models (VLMs) for automated rollout evaluation. RE-GoT first decomposes tasks into text-attributed graphs, enabling comprehensive analysis and reward function generation, and then iteratively refines rewards using visual feedback from VLMs without human intervention. Extensive experiments on 10 RoboGen and 4 ManiSkill2 tasks demonstrate that RE-GoT consistently outperforms existing LLM-based baselines. On RoboGen, our method improves average task success rates by 32.25%, with notable gains on complex multi-step tasks. On ManiSkill2, RE-GoT achieves an average success rate of 93.73% across four diverse manipulation tasks, significantly surpassing prior LLM-based approaches and even exceeding expert-designed rewards. Our results indicate that combining LLMs and VLMs with graph-of-thoughts reasoning provides a scalable and effective solution for autonomous reward evolution in RL.

Paper Summary

Problem
Designing effective reward functions for reinforcement learning (RL) is a major challenge. It requires human expertise and iterative refinement, making it time-consuming and prone to producing rewards that overfit to specific task configurations. Current approaches using Large Language Models (LLMs) have limitations, such as hallucinations, reliance on human feedback, and difficulty handling complex, multi-step tasks. A minimal sketch of what this manual burden looks like follows below.
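To make the reward-design burden concrete, here is a minimal sketch of a hand-tuned dense reward for a hypothetical pick-and-place task. The observation fields, weights, and thresholds are illustrative assumptions, not taken from the paper; every one of them typically needs rounds of trial-and-error retraining to get right, which is exactly the effort RE-GoT aims to automate.

```python
import numpy as np

def pick_and_place_reward(obs: dict) -> float:
    """Hand-designed dense reward for a hypothetical pick-and-place task.

    All weights and thresholds below are tunable guesses (assumptions for
    illustration); misjudging any of them usually means retraining the
    policy and re-tuning by hand.
    """
    gripper_to_obj = np.linalg.norm(obs["gripper_pos"] - obs["object_pos"])
    obj_to_goal = np.linalg.norm(obs["object_pos"] - obs["goal_pos"])

    reward = 0.0
    reward += -0.5 * gripper_to_obj   # reach term: shrink gripper-object gap
    reward += -1.0 * obj_to_goal      # place term: shrink object-goal gap
    if obs["is_grasped"]:
        reward += 0.25                # grasp bonus
    if obj_to_goal < 0.02:            # success threshold (2 cm), hand-picked
        reward += 10.0
    return reward
```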
Key Innovation
The researchers introduce Reward Evolution with Graph-of-Thoughts (RE-GoT), a novel bi-level framework that enhances LLMs with structured graph-based reasoning and integrates Visual Language Models (VLMs) for automated rollout evaluation. RE-GoT decomposes tasks into text-attributed graphs, enabling comprehensive analysis and reward function generation, and then iteratively refines rewards using visual feedback from VLMs without human intervention.
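The bi-level structure can be sketched as a simple loop: the upper level builds the task graph and writes a reward, the lower level trains a policy and lets a VLM critique the rollout. The sketch below assumes hypothetical interfaces for each component (the paper does not publish code-level APIs); the four callables stand in for LLM graph decomposition, LLM reward synthesis, RL training, and VLM rollout evaluation.

```python
from typing import Any, Callable, Tuple

def re_got_loop(
    task: str,
    decompose_to_graph: Callable[[str], dict],
    generate_reward: Callable[..., str],
    train_policy: Callable[[str], Tuple[Any, bytes]],
    evaluate_rollout: Callable[[bytes, dict], Tuple[str, bool]],
    max_rounds: int = 5,
) -> Tuple[str, Any]:
    """Bi-level reward-evolution loop, sketched from the paper's description.

    The callables are hypothetical stand-ins, not the authors' actual API.
    """
    # Upper level: decompose the task into a text-attributed graph
    # (nodes = sub-goals with text attributes, edges = dependencies),
    # then have the LLM write a reward function from that graph.
    task_graph = decompose_to_graph(task)
    reward_code = generate_reward(task_graph)

    policy = None
    for _ in range(max_rounds):
        # Lower level: train an RL policy under the current reward and
        # record a rollout video of the resulting behavior.
        policy, rollout_video = train_policy(reward_code)

        # A VLM scores the rollout against the task graph, replacing the
        # human-feedback step used by earlier LLM-based reward methods.
        feedback, success = evaluate_rollout(rollout_video, task_graph)
        if success:
            break

        # The LLM revises the reward function using the VLM's critique.
        reward_code = generate_reward(task_graph, feedback=feedback)

    return reward_code, policy
```

The key design point this sketch captures is that the refinement signal comes from a VLM watching rollouts rather than from a human, which is what lets the evolution run without intervention.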
Practical Impact
RE-GoT offers a scalable, automated alternative to hand-designed rewards in RL. It applies to diverse robotic manipulation tasks, such as picking and placing objects, and improves the performance and generalization of RL policies in complex, multi-step environments. Because rollout evaluation is handled by a VLM rather than a human, the approach also reduces reliance on human supervision, making reward design faster and more repeatable.
Analogy / Intuitive Explanation
Imagine teaching a child to tie their shoes. Instead of a parent writing one monolithic set of instructions and grading every attempt by hand, RE-GoT works like a tutor who first breaks the task into a diagram of smaller steps (pinch the laces, form a loop, pull through), plus an observer who watches a video of each attempt and points out what went wrong. The instructions are then rewritten based on that feedback, so the learner improves over repeated tries without anyone grading by hand, and the same process carries over to new tasks.
Paper Information

Categories: cs.RO
arXiv ID: 2509.16136v1
