VISTA Score: Verification In Sequential Turn-based Assessment

Explainable & Ethical AI
arXiv: 2510.27052v1
Authors

Ashley Lewis, Andrew Perrault, Eric Fosler-Lussier, Michael White

Abstract

Hallucination--defined here as generating statements unsupported or contradicted by available evidence or conversational context--remains a major obstacle to deploying conversational AI systems in settings that demand factual reliability. Existing metrics either evaluate isolated responses or treat unverifiable content as errors, limiting their use for multi-turn dialogue. We introduce VISTA (Verification In Sequential Turn-based Assessment), a framework for evaluating conversational factuality through claim-level verification and sequential consistency tracking. VISTA decomposes each assistant turn into atomic factual claims, verifies them against trusted sources and dialogue history, and categorizes unverifiable statements (subjective, contradicted, lacking evidence, or abstaining). Across eight large language models and four dialogue factuality benchmarks (AIS, BEGIN, FAITHDIAL, and FADE), VISTA substantially improves hallucination detection over FACTSCORE and LLM-as-Judge baselines. Human evaluation confirms that VISTA's decomposition improves annotator agreement and reveals inconsistencies in existing benchmarks. By modeling factuality as a dynamic property of conversation, VISTA offers a more transparent, human-aligned measure of truthfulness in dialogue systems.

Paper Summary

Problem
The paper addresses hallucination in conversational AI systems: the generation of statements that are unsupported or contradicted by available evidence or the conversational context. Hallucination remains a major obstacle to deploying these systems in settings that demand factual reliability, and existing metrics either evaluate isolated responses or treat all unverifiable content as errors, which limits their usefulness for multi-turn dialogue.
Key Innovation
The key innovation is VISTA (Verification In Sequential Turn-based Assessment), a framework for evaluating conversational factuality through claim-level verification and sequential consistency tracking. VISTA decomposes each assistant turn into atomic factual claims, verifies them against trusted sources and the dialogue history, and categorizes unverifiable statements as subjective, contradicted, lacking evidence, or abstaining.
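To make that pipeline concrete, below is a minimal Python sketch of a VISTA-style evaluation loop under stated assumptions: the helpers `decompose_claims` and `judge` are hypothetical stand-ins for the paper's actual claim-decomposition and verification components (likely LLM- or NLI-based), the placeholder logic inside them is illustrative only, and the turn score (fraction of checkable claims that are supported) is an assumed aggregation, not necessarily the paper's exact scoring rule.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Label(Enum):
    SUPPORTED = "supported"
    SUBJECTIVE = "subjective"
    CONTRADICTED = "contradicted"
    NO_EVIDENCE = "lacking_evidence"
    ABSTAINED = "abstaining"


@dataclass
class ClaimVerdict:
    claim: str
    label: Label


def decompose_claims(turn_text: str) -> List[str]:
    # Placeholder: naive sentence split. The paper presumably uses an
    # LLM prompted to produce atomic factual claims.
    return [s.strip() for s in turn_text.split(".") if s.strip()]


def judge(claim: str, evidence: List[str], history: List[str]) -> Label:
    # Placeholder heuristic: a claim counts as supported only if it appears
    # verbatim in the evidence or earlier dialogue. A real verifier would
    # use entailment/NLI or an LLM judge over sources and history.
    context = evidence + history
    if any(claim.lower() in passage.lower() for passage in context):
        return Label.SUPPORTED
    return Label.NO_EVIDENCE


def score_turn(turn_text: str, evidence: List[str], history: List[str]) -> float:
    # Fraction of checkable claims that are supported; subjective and
    # abstaining claims are set aside rather than counted as errors.
    verdicts = [ClaimVerdict(c, judge(c, evidence, history))
                for c in decompose_claims(turn_text)]
    checkable = [v for v in verdicts
                 if v.label not in (Label.SUBJECTIVE, Label.ABSTAINED)]
    if not checkable:
        return 1.0  # no verifiable factual content in this turn
    return sum(v.label is Label.SUPPORTED for v in checkable) / len(checkable)
```

The design point worth noting is that subjective and abstaining claims are set aside rather than penalized, which is what separates this style of evaluation from metrics that treat every unverifiable statement as an error.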
Practical Impact
This research has the potential to significantly improve the factual accuracy of conversational AI systems. By providing a more transparent, human-aligned measure of truthfulness in dialogue, VISTA can help build trust in these systems and support their deployment in settings that demand factual reliability. Its claim-level verdicts could also serve as a training signal for making dialogue systems more accurate and reliable.
Analogy / Intuitive Explanation
Imagine you are talking with an AI assistant. You ask a question and it answers, but what if the answer is not supported by any evidence, or contradicts something you know to be true? That is hallucination. VISTA acts like a fact-checker: it breaks each response into smaller claims, verifies them against trusted sources and the conversation so far, and flags any statements it cannot verify. In doing so, it helps ensure the conversation stays accurate and trustworthy.
Paper Information
Categories: cs.CL
arXiv ID: 2510.27052v1