Resources for Automated Evaluation of Assistive RAG Systems that Help Readers with News Trustworthiness Assessment

Information Retrieval & AI
Published: arXiv:2602.24277v1
Authors

Dake Zhang, Mark D. Smucker, Charles L. A. Clarke

Abstract

Many readers today struggle to assess the trustworthiness of online news because reliable reporting coexists with misinformation. The TREC 2025 DRAGUN (Detection, Retrieval, and Augmented Generation for Understanding News) Track provided a venue for researchers to develop and evaluate assistive RAG systems that support readers' news trustworthiness assessment by producing reader-oriented, well-attributed reports. As the organizers of the DRAGUN track, we describe the resources that we have newly developed to allow for the reuse of the track's tasks. The track had two tasks: (Task 1) Question Generation, producing 10 ranked investigative questions; and (Task 2, the main task) Report Generation, producing a 250-word report grounded in the MS MARCO V2.1 Segmented Corpus. As part of the track's evaluation, we had TREC assessors create importance-weighted rubrics of questions with expected short answers for 30 different news articles. These rubrics represent the information that assessors believe is important for readers to assess an article's trustworthiness. The assessors then used their rubrics to manually judge the participating teams' submitted runs. To make these tasks and their rubrics reusable, we have created an automated process to judge runs that were not part of the original assessment. We show that our AutoJudge ranks existing runs well compared to the TREC human-assessed evaluation (Kendall's $\tau = 0.678$ for Task 1 and $\tau = 0.872$ for Task 2). These resources enable both the evaluation of RAG systems for assistive news trustworthiness assessment and, with the human evaluation as a benchmark, research on improving automated RAG evaluation.
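To make the reported rank agreement concrete, the sketch below shows how Kendall's τ between a human-assessed system ranking and an automated one can be computed. Only scipy.stats.kendalltau is a real API here; the run names and per-run scores are hypothetical placeholders, not track data.

```python
# Sketch: comparing a human-assessed system ranking with an automated one
# via Kendall's tau, the agreement statistic the paper reports for AutoJudge.
# The run names and scores below are made-up placeholders, not track results.
from scipy.stats import kendalltau

# Hypothetical per-run scores from the official human assessment
human_scores = {"run_a": 0.62, "run_b": 0.55, "run_c": 0.48, "run_d": 0.31}

# Hypothetical per-run scores from an automated judge over the same runs
auto_scores = {"run_a": 0.70, "run_b": 0.52, "run_c": 0.54, "run_d": 0.28}

runs = sorted(human_scores)  # fixed run order for both score vectors
tau, p_value = kendalltau(
    [human_scores[r] for r in runs],
    [auto_scores[r] for r in runs],
)
print(f"Kendall's tau = {tau:.3f} (p = {p_value:.3f})")
```

A τ near 1.0 means the automated judge orders the runs almost exactly as the human assessors did, which is the property that makes a leaderboard reusable without new manual judging.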

Paper Summary

Problem
Many readers struggle to assess the trustworthiness of online news due to the coexistence of reliable and misleading reporting. This can lead to the spread of misinformation and have serious social implications.
Key Innovation
The track organizers developed reusable resources for evaluating assistive systems that help readers assess news trustworthiness. For the TREC 2025 DRAGUN Track, TREC assessors created importance-weighted rubrics of questions with expected short answers for 30 news articles, and an automated judge (AutoJudge) now applies these rubrics to new runs. This allows Retrieval-Augmented Generation (RAG) systems to be evaluated more efficiently while closely tracking the human assessment.
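As an illustration of how such importance-weighted rubrics might drive scoring, the sketch below defines a toy rubric item and a weighted-coverage score. The data structure, the substring-based coverage check, and the example rubric are assumptions for demonstration only; the track's official rubric format and judging procedure are defined in the paper.

```python
# Illustrative sketch of importance-weighted rubric scoring. The rubric
# structure and the string-matching coverage check are assumptions for
# demonstration; the DRAGUN track's official scoring may differ.
from dataclasses import dataclass

@dataclass
class RubricItem:
    question: str         # investigative question an assessor deemed important
    expected_answer: str  # short answer the report should convey
    weight: float         # assessor-assigned importance weight

def rubric_score(report: str, rubric: list[RubricItem]) -> float:
    """Fraction of total importance weight covered by the report.

    A naive substring check stands in for the answer-matching step,
    which in practice is done by a human assessor or an automated judge."""
    covered = sum(item.weight for item in rubric
                  if item.expected_answer.lower() in report.lower())
    total = sum(item.weight for item in rubric)
    return covered / total if total else 0.0

# Hypothetical example
rubric = [
    RubricItem("Who published the article?", "Example Daily", 2.0),
    RubricItem("Does the claim cite a primary source?", "no primary source", 3.0),
]
report = "The piece appeared in Example Daily and cites no primary source."
print(rubric_score(report, rubric))  # 1.0: both weighted items are covered
```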
Practical Impact
The TREC 2025 DRAGUN Track and its reusable resources have the potential to improve the evaluation of RAG systems that help readers make more informed decisions when consuming online news. With a reusable benchmark, researchers can develop and evaluate new systems for lateral-reading-style assistance that support news trustworthiness assessment, which in turn can reduce the spread of misinformation and contribute to a better-informed public.
Analogy / Intuitive Explanation
Imagine you're trying to fact-check a news article, but you're not sure what to look for. A good assistant would help you identify the most important questions to ask and provide you with relevant information to answer them. The TREC 2025 DRAGUN Track is like a tool that helps researchers develop and evaluate these types of assistants, so they can provide more accurate and trustworthy information to readers.
Paper Information
Categories: cs.IR, cs.AI
arXiv ID: 2602.24277v1