Maestro: Joint Graph & Config Optimization for Reliable AI Agents

Agentic AI
Published: arXiv: 2509.04642v1
Authors

Wenxiao Wang, Priyatham Kattakinda, Soheil Feizi

Abstract

Building reliable LLM agents requires decisions at two levels: the graph (which modules exist and how information flows) and the configuration of each node (models, prompts, tools, control knobs). Most existing optimizers tune configurations while holding the graph fixed, leaving structural failure modes unaddressed. We introduce Maestro, a framework-agnostic holistic optimizer for LLM agents that jointly searches over graphs and configurations to maximize agent quality, subject to explicit rollout/token budgets. Beyond numeric metrics, Maestro leverages reflective textual feedback from traces to prioritize edits, improving sample efficiency and targeting specific failure modes. On the IFBench and HotpotQA benchmarks, Maestro consistently surpasses leading prompt optimizers--MIPROv2, GEPA, and GEPA+Merge--by an average of 12%, 4.9%, and 4.86%, respectively; even when restricted to prompt-only optimization, it still leads by 9.65%, 2.37%, and 2.41%. Maestro achieves these results with far fewer rollouts than GEPA. We further show large gains on two applications (interviewer & RAG agents), highlighting that joint graph & configuration search addresses structural failure modes that prompt tuning alone cannot fix.

Paper Summary

Problem
Building reliable AI agents requires decisions at two levels: the graph (which modules exist and how information flows) and the configuration of each node (models, prompts, tools, control knobs). Most existing optimizers tune configurations while holding the graph fixed, leaving structural failure modes unaddressed.
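One way to make the two levels concrete is to write the task as a constrained optimization. The formulation below is a hedged sketch, not the paper's own notation: the symbols G (agent graph), c (node configurations), A_{G,c} (the resulting agent), q (a quality metric), D (the task distribution), R_search (rollouts consumed by the search), and B (the rollout budget) are all assumptions introduced here for illustration.

```latex
\max_{G,\; c} \;\; \mathbb{E}_{x \sim D}\!\left[\, q\!\left(A_{G,c}(x)\right) \right]
\quad \text{subject to} \quad R_{\text{search}} \le B
```

In this notation, a configuration-only optimizer fixes G to some initial graph G_0 and searches over c alone, so any failure that stems from G_0 itself stays out of reach.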
Key Innovation
The paper introduces Maestro, a framework-agnostic, holistic optimizer for LLM agents that jointly searches over graphs and configurations to maximize agent quality under explicit rollout/token budgets. Beyond numeric metrics, Maestro uses reflective textual feedback from traces to prioritize edits, which improves sample efficiency and targets specific failure modes.
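The idea of a budgeted joint search guided by textual feedback can be illustrated with a toy loop. This is a minimal sketch under stated assumptions, not Maestro's actual algorithm or API: `Candidate`, `rollout`, `propose`, and `optimize` are hypothetical names, the scoring rule is invented, and the "feedback" is a hard-coded string standing in for an LLM's reflection on a trace.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Candidate:
    graph: tuple[str, ...]                # module names in execution order
    config: tuple[tuple[str, str], ...]   # (module, prompt style) pairs


def rollout(cand):
    """Toy stand-in for one agent run: returns (score, textual feedback)."""
    styles = dict(cand.config)
    score = 40
    if "verify" in cand.graph:
        score += 30  # invented rule: a verification module helps a lot
    terse = [m for m in cand.graph if styles.get(m, "terse") == "terse"]
    score += 10 * (len(cand.graph) - len(terse))
    if "verify" not in cand.graph:
        return score, "structural: answers are never checked"
    if terse:
        return score, f"config: prompt for {terse[0]} is too terse"
    return score, "ok"


def propose(cand, feedback):
    """Turn feedback into a single edit: either a graph edit or a config edit."""
    if feedback.startswith("structural"):
        return replace(cand, graph=cand.graph + ("verify",))
    if feedback.startswith("config"):
        module = feedback.split()[3]  # the module named in the feedback
        styles = dict(cand.config)
        styles[module] = "detailed"
        return replace(cand, config=tuple(sorted(styles.items())))
    return cand


def optimize(start, rollout_budget):
    """Greedy search that stops when the rollout budget is spent."""
    best = start
    best_score, feedback = rollout(start)
    used = 1
    while used < rollout_budget and feedback != "ok":
        cand = propose(best, feedback)
        score, new_feedback = rollout(cand)
        used += 1
        if score <= best_score:
            break  # the proposed edit did not help; stop early
        best, best_score, feedback = cand, score, new_feedback
    return best, best_score, used


start = Candidate(graph=("retrieve", "answer"), config=())
best, score, used = optimize(start, rollout_budget=10)
print(best.graph, dict(best.config), score, used)
```

In this toy setup the first edit is structural (adding a `verify` module) because the feedback names a structural gap, and only afterwards does the loop spend rollouts on per-node prompt edits. A real system would replace the hard-coded feedback and edit rules with model-generated reflections and a much richer edit space.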
Practical Impact
Maestro applies wherever LLM agents are deployed, such as chatbots and virtual assistants, because it optimizes an agent's graph structure and node configurations together rather than tuning prompts alone. The paper reports large gains on two applications (interviewer and RAG agents), suggesting that joint search can address structural failure modes that prompt tuning cannot fix.
Analogy / Intuitive Explanation
Imagine building a Lego tower. You need to decide not only which pieces to use (configuration) but also how they are connected (graph). Maestro is like a smart builder that searches for the best combination of pieces and connections to create a stable and effective tower. A builder who can only swap pieces cannot fix a tower whose layout is unsound; by optimizing both graph and configuration, Maestro aims to make the agent reliable at both levels.
Paper Information
Categories:
cs.AI cs.CL cs.LG cs.SE
Published Date:

arXiv ID:

2509.04642v1
