Is this chart lying to me? Automating the detection of misleading visualizations

Explainable & Ethical AI
Published: arXiv:2508.21675v1
Authors

Jonathan Tonglet, Jan Zimny, Tinne Tuytelaars, Iryna Gurevych

Abstract

Misleading visualizations are a potent driver of misinformation on social media and the web. By violating chart design principles, they distort data and lead readers to draw inaccurate conclusions. Prior work has shown that both humans and multimodal large language models (MLLMs) are frequently deceived by such visualizations. Automatically detecting misleading visualizations and identifying the specific design rules they violate could help protect readers and reduce the spread of misinformation. However, the training and evaluation of AI models have been limited by the absence of large, diverse, and openly available datasets. In this work, we introduce Misviz, a benchmark of 2,604 real-world visualizations annotated with 12 types of misleaders. To support model training, we also release Misviz-synth, a synthetic dataset of 81,814 visualizations generated using Matplotlib and based on real-world data tables. We perform a comprehensive evaluation on both datasets using state-of-the-art MLLMs, rule-based systems, and fine-tuned classifiers. Our results reveal that the task remains highly challenging. We release Misviz, Misviz-synth, and the accompanying code.

Paper Summary

Problem
Misleading visualizations are a significant driver of online misinformation. Whether created intentionally or unintentionally, they lead readers to draw inaccurate conclusions, and they can manipulate public understanding at scale, especially during crises such as the COVID-19 pandemic. Both humans and artificial intelligence (AI) models are frequently deceived by these visualizations.
Key Innovation
The researchers introduce two new datasets: Misviz, a benchmark of 2,604 real-world visualizations annotated with 12 types of misleaders, and Misviz-synth, a synthetic dataset of 81,814 visualizations generated using Matplotlib and based on real-world data tables. These datasets are designed to support the training and evaluation of AI models for detecting misleading visualizations.
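To make the idea of synthesizing misleaders concrete, here is a minimal sketch of how one of the 12 misleader types, a truncated y-axis, could be generated with Matplotlib. This is our illustrative assumption, not the paper's actual Misviz-synth generation pipeline: the data values, file names, and the `render` helper are hypothetical.

```python
# Minimal sketch (illustrative assumption, not the paper's pipeline):
# render the same data twice, once with a truncated y-axis (a "misleader")
# and once with a zero-based axis, yielding a labeled positive/negative pair.
import matplotlib
matplotlib.use("Agg")  # headless rendering, no display needed
import matplotlib.pyplot as plt

# Hypothetical data table: small absolute changes in a large quantity
data = {"2019": 50.1, "2020": 50.4, "2021": 50.9}

def render(path, truncated):
    fig, ax = plt.subplots()
    ax.bar(list(data.keys()), list(data.values()))
    if truncated:
        ax.set_ylim(50, 51)  # truncated axis exaggerates a ~1.6% change
    else:
        ax.set_ylim(0, 55)   # zero-based axis keeps bar heights proportional
    fig.savefig(path)
    plt.close(fig)
    return ax.get_ylim()

misleading_ylim = render("misleading.png", truncated=True)
honest_ylim = render("honest.png", truncated=False)
```

A generation pipeline along these lines would pair each rendered image with a label naming the violated design rule, which is the supervision signal a detector is trained on.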
Practical Impact
The ability to automatically detect misleading visualizations and identify the specific design rules they violate can help protect readers and reduce the spread of misinformation, for example by issuing timely warnings to chart designers and readers. The Misviz and Misviz-synth datasets support the training and evaluation of AI models for this task, making them a significant step toward that goal.
Analogy / Intuitive Explanation
Imagine you're trying to find the best restaurant in a city. You look at a chart of top-rated restaurants, but the chart is misleading: it might show only restaurants in one neighborhood, or use an axis scale that exaggerates small differences in the ratings. Either way, you could end up choosing a restaurant that isn't as good as the chart makes it seem. The researchers aim to build AI models that detect these kinds of misleading charts and help people make better-informed decisions.
Paper Information
Categories:
cs.CL cs.CV cs.GR
Published Date:

arXiv ID: 2508.21675v1
