Co-Sight: Enhancing LLM-Based Agents via Conflict-Aware Meta-Verification and Trustworthy Reasoning with Structured Facts

Explainable & Ethical AI
Published: arXiv:2510.21557v1
Authors

Hongwei Zhang, Ji Lu, Shiqing Jiang, Chenxiang Zhu, Li Xie, Chen Zhong, Haoran Chen, Yurui Zhu, Yongsheng Du, Yanqin Gao, Lingjun Huang, Baoli Wang, Fang Tan, Peng Zou

Abstract

Long-horizon reasoning in LLM-based agents often fails not from generative weakness but from insufficient verification of intermediate reasoning. Co-Sight addresses this challenge by turning reasoning into a falsifiable and auditable process through two complementary mechanisms: Conflict-Aware Meta-Verification (CAMV) and Trustworthy Reasoning with Structured Facts (TRSF). CAMV reformulates verification as conflict identification and targeted falsification, allocating computation only to disagreement hotspots among expert agents rather than to full reasoning chains. This bounds verification cost to the number of inconsistencies and improves efficiency and reliability. TRSF continuously organizes, validates, and synchronizes evidence across agents through a structured facts module. By maintaining verified, traceable, and auditable knowledge, it ensures that all reasoning is grounded in consistent, source-verified information and supports transparent verification throughout the reasoning process. Together, TRSF and CAMV form a closed verification loop, where TRSF supplies structured facts and CAMV selectively falsifies or reinforces them, yielding transparent and trustworthy reasoning. Empirically, Co-Sight achieves state-of-the-art accuracy on GAIA (84.4%) and Humanity's Last Exam (35.5%), and strong results on Chinese-SimpleQA (93.8%). Ablation studies confirm that the synergy between structured factual grounding and conflict-aware verification drives these improvements. Co-Sight thus offers a scalable paradigm for reliable long-horizon reasoning in LLM-based agents. Code is available at https://github.com/ZTE-AICloud/Co-Sight/tree/cosight2.0_benchmarks.
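To make CAMV concrete, here is a minimal sketch, assuming each expert agent produces a reasoning chain as a list of (step_id, claim) pairs and that a `check(step_id, claim)` callback performs a targeted falsification test (for example, re-grounding the claim against verified facts). These names and interfaces are illustrative assumptions, not the paper's actual implementation.

```python
from collections import Counter

def find_hotspots(chains):
    """Group claims by step and keep only the steps where experts disagree."""
    by_step = {}
    for chain in chains:
        for step_id, claim in chain:
            by_step.setdefault(step_id, []).append(claim)
    return {s: claims for s, claims in by_step.items() if len(set(claims)) > 1}

def camv(chains, check):
    """Verify only the disagreement hotspots, so verification cost scales
    with the number of inconsistencies rather than the chain length."""
    resolved = {}
    for step_id, claims in find_hotspots(chains).items():
        survivors = [c for c in sorted(set(claims)) if check(step_id, c)]
        if survivors:
            resolved[step_id] = survivors[0]
        else:
            # Falsification was inconclusive; fall back to a majority vote.
            resolved[step_id] = Counter(claims).most_common(1)[0][0]
    return resolved
```

Steps on which all experts already agree never reach `check`, which is what bounds the verification budget to the disagreement hotspots.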

Paper Summary

Problem
Large Language Model (LLM)-based agents have made significant progress on complex tasks, but they still struggle with reliable long-horizon reasoning. Their failures stem less from weak generation than from insufficient verification of intermediate steps: verification effort scales with the entire reasoning chain instead of concentrating on the decision points where errors actually arise. As a result, these agents often produce unreliable outputs, which can have serious consequences in applications such as healthcare, finance, and education.
Key Innovation
The Co-Sight framework addresses this problem by introducing two complementary mechanisms: Conflict-Aware Meta-Verification (CAMV) and Trustworthy Reasoning with Structured Facts (TRSF). CAMV reformulates verification as conflict identification and targeted falsification, focusing on points of disagreement rather than the entire reasoning chain. TRSF continuously organizes, validates, and synchronizes evidence across agents through a structured facts module, ensuring that all reasoning is grounded in consistent, source-verified information.
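To complement the CAMV sketch above, the following is an illustrative structured facts module in the spirit of TRSF; the `Fact`/`FactStore` names and fields are assumptions, not the paper's API. Each fact carries its claim, its source, and a verification flag, so reasoning can be restricted to source-verified, traceable evidence.

```python
from dataclasses import dataclass, field

@dataclass
class Fact:
    claim: str
    source: str             # provenance: URL, document id, tool output, ...
    verified: bool = False  # flipped to True once checked against the source

@dataclass
class FactStore:
    facts: dict = field(default_factory=dict)

    def add(self, key, claim, source):
        """Register new evidence with its provenance; it starts unverified."""
        self.facts[key] = Fact(claim, source)

    def validate(self, key, checker):
        """Mark a fact verified if `checker` confirms the claim against its source."""
        fact = self.facts[key]
        fact.verified = checker(fact.claim, fact.source)
        return fact.verified

    def grounded(self):
        """The subset of facts that downstream reasoning may rely on."""
        return {k: f for k, f in self.facts.items() if f.verified}
```

In the closed loop described above, CAMV's falsification step would consult `grounded()` to falsify or reinforce contested claims, while newly confirmed claims flow back into the store.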
Practical Impact
Co-Sight has the potential to significantly improve the reliability and trustworthiness of LLM-based agents. By concentrating verification on key decision points and grounding reasoning in verified, traceable evidence, it can catch errors before they propagate through long reasoning chains. This matters most in domains such as healthcare, finance, and education, where unreliable outputs carry real costs.
Analogy / Intuitive Explanation
Think of Co-Sight as quality control for complex reasoning. Imagine a team of experts solving a hard problem, each with different opinions and assumptions. Rather than re-auditing every expert's entire argument, Co-Sight acts like a referee who steps in only at the points of disagreement, checks each contested claim against verified sources, and lets the rest of the reasoning stand. This keeps verification cheap while ensuring the final answer is reliable and trustworthy.
Paper Information
Categories: cs.AI
Published Date: October 2025
arXiv ID: 2510.21557v1
