LLM-Based Adversarial Persuasion Attacks on Fact-Checking Systems

Generative AI & LLMs
Published: arXiv: 2601.16890v1
Authors

João A. Leite, Olesya Razuvayevskaya, Kalina Bontcheva, Carolina Scarton

Abstract

Automated fact-checking (AFC) systems are susceptible to adversarial attacks, enabling false claims to evade detection. Existing adversarial frameworks typically rely on injecting noise or altering semantics, yet none exploits the adversarial potential of persuasion techniques, which are widely used in disinformation campaigns to manipulate audiences. In this paper, we introduce a novel class of persuasive adversarial attacks on AFC systems, employing a generative LLM to rephrase claims using persuasion techniques. Considering 15 techniques grouped into 6 categories, we study the effects of persuasion on both claim verification and evidence retrieval using a decoupled evaluation strategy. Experiments on the FEVER and FEVEROUS benchmarks show that persuasion attacks can substantially degrade both verification and evidence-retrieval performance. Our analysis identifies persuasion techniques as a potent class of adversarial attacks, highlighting the need for more robust AFC systems.

Paper Summary

Problem
Disinformation campaigns are becoming increasingly sophisticated, using persuasion techniques to manipulate audiences and evade detection by fact-checking systems. These systems are crucial for countering disinformation, but they are not immune to adversarial attacks. Existing adversarial attacks on fact-checking systems focus on surface-level perturbations such as typos or character noise, leaving the more insidious threat of persuasion techniques unaddressed.
Key Innovation
This research introduces a novel class of persuasive adversarial attacks on fact-checking systems, in which a generative Large Language Model (LLM) rephrases claims using persuasion techniques (sketched below). It is the first approach to systematically weaponise persuasion techniques against fact-checking systems, establishing them as a potent class of adversarial attacks.
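To make the attack concrete, the sketch below shows one plausible way such a rephrasing step could be implemented. The model name, prompt wording, and technique names are illustrative assumptions for this summary, not the authors' actual prompts or taxonomy; the OpenAI client is used only as an example of a generative LLM interface.

# Illustrative sketch of a persuasion-based rephrasing attack (not the authors'
# exact prompts, model, or taxonomy). A generative LLM rewrites a claim using a
# chosen persuasion technique; the rewritten claim is then fed to the AFC system
# in place of the original.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical subset of persuasion techniques, named here only for illustration.
PERSUASION_TECHNIQUES = {
    "loaded_language": "Use emotionally charged, loaded wording.",
    "appeal_to_authority": "Attribute the claim to an unnamed expert or institution.",
    "appeal_to_fear": "Frame the claim so that doubting it sounds risky.",
}

def persuasive_rephrase(claim: str, technique: str, model: str = "gpt-4o-mini") -> str:
    """Rephrase `claim` with the given persuasion technique, keeping its factual content."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system",
             "content": "Rewrite the user's claim without changing its factual meaning. "
                        + PERSUASION_TECHNIQUES[technique]},
            {"role": "user", "content": claim},
        ],
        temperature=0.7,
    )
    return response.choices[0].message.content.strip()

adversarial_claim = persuasive_rephrase(
    "The Eiffel Tower was completed in 1889.", "loaded_language"
)
print(adversarial_claim)  # the rewritten claim replaces the original as input to the AFC pipeline

Repeating this over every claim in a benchmark such as FEVER, once per technique, yields the adversarial variants whose effect on the fact-checking pipeline is then measured.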
Practical Impact
The findings have significant practical implications for the development of fact-checking systems. The results show that current fact-checking pipelines fail to disentangle persuasive rhetoric from factual content, leaving them vulnerable to persuasion attacks, and they motivate future work on making such systems robust to manipulative wording.
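The degradation described above can be quantified with the decoupled evaluation strategy mentioned in the abstract, scoring the verification and retrieval components separately on original versus rephrased claims. The sketch below is one possible setup; the metrics (label-flip rate, recall@k) and the verify/retrieve interfaces are assumptions for illustration, not necessarily the exact protocol used in the paper.

# Sketch of a decoupled evaluation: score the verification and retrieval
# components separately on original vs. persuasively rephrased claims.
# `verify` and `retrieve` stand in for the target AFC system's components.
from typing import Callable, Sequence

def label_flip_rate(
    verify: Callable[[str, Sequence[str]], str],
    triples: Sequence[tuple[str, str, Sequence[str]]],
) -> float:
    """Fraction of (original, rephrased, gold_evidence) triples whose predicted verdict changes."""
    flips = sum(verify(orig, ev) != verify(adv, ev) for orig, adv, ev in triples)
    return flips / len(triples)

def recall_at_k(
    retrieve: Callable[[str, int], Sequence[str]],
    claims: Sequence[str],
    gold_evidence: Sequence[set[str]],
    k: int = 5,
) -> float:
    """Fraction of claims for which at least one gold evidence item appears in the top-k retrieved."""
    hits = sum(
        bool(gold & set(retrieve(claim, k)))
        for claim, gold in zip(claims, gold_evidence)
    )
    return hits / len(claims)

Comparing recall_at_k on original versus rephrased claims isolates the damage to evidence retrieval, while label_flip_rate with gold evidence held fixed isolates the verifier's sensitivity to persuasive wording.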
Analogy / Intuitive Explanation
Imagine a fact-checking system as a referee in a debate. The referee's job is to verify the accuracy of the claims made by the debaters. However, if the debaters use persuasive techniques such as emotional appeals or loaded language to sway the audience, the referee may struggle to distinguish between fact and fiction. The persuasive adversarial attacks introduced in this research are like a sophisticated debating tactic that exploits the weaknesses of the referee, making it more challenging for the fact-checking system to accurately verify the claims.
Paper Information
Categories: cs.CL cs.AI cs.LG
Published Date:
arXiv ID: 2601.16890v1