Semantic-aware Adversarial Fine-tuning for CLIP

arXiv: 2602.12461v1
Authors

Jiacheng Zhang, Jinhao Li, Hanxun Huang, Sarah M. Erfani, Benjamin I. P. Rubinstein, Feng Liu

Abstract

Recent studies have shown that the adversarial robustness of the CLIP model in zero-shot classification tasks can be enhanced by adversarially fine-tuning its image encoder with adversarial examples (AEs), which are generated by minimizing the cosine similarity between images and a hand-crafted template (e.g., "A photo of a {label}"). However, it has been shown that the cosine similarity between a single image and a single hand-crafted template is insufficient to reliably measure the similarity of an image-text pair. Building on this, in this paper we find that AEs generated using cosine similarity may fail to fool CLIP when the similarity metric is replaced with semantically enriched alternatives, making an image encoder fine-tuned with these AEs less robust. To overcome this issue, we first propose a semantic-ensemble attack that generates semantic-aware AEs by minimizing the average similarity between the original image and an ensemble of refined textual descriptions. These descriptions are initially generated by a foundation model to capture core semantic features beyond hand-crafted templates and are then refined to reduce hallucinations. Building on this attack, we propose Semantic-aware Adversarial Fine-Tuning (SAFT), which fine-tunes CLIP's image encoder with semantic-aware AEs. Extensive experiments show that SAFT outperforms current methods, achieving substantial improvements in zero-shot adversarial robustness across 16 datasets. Our code is available at: https://github.com/tmlr-group/SAFT.
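For concreteness, below is a minimal PGD-style sketch of the template-based attack the abstract refers to: crafting AEs by minimizing the cosine similarity between an image and its hand-crafted "A photo of a {label}" prompt. It assumes the OpenAI `clip` package; the ViT-B/32 checkpoint, step size, perturbation budget, and the assumption that `images` are already-preprocessed tensors are illustrative choices, not the paper's exact settings.

```python
# Minimal PGD-style sketch of the hand-crafted-template attack (illustrative,
# not the authors' exact implementation).
import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)
model.eval()
for p in model.parameters():
    p.requires_grad_(False)

def template_attack(images, labels, class_names, eps=1/255, alpha=0.25/255, steps=10):
    """Craft AEs by minimizing cosine similarity between each image and the
    hand-crafted template of its true label, e.g. "A photo of a {label}"."""
    prompts = clip.tokenize([f"A photo of a {name}" for name in class_names]).to(device)
    with torch.no_grad():
        text_feat = model.encode_text(prompts).float()
        text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)

    delta = torch.zeros_like(images, requires_grad=True)
    for _ in range(steps):
        img_feat = model.encode_image(images + delta).float()
        img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
        sim = (img_feat * text_feat[labels]).sum(dim=-1)   # cosine similarity per image
        sim.mean().backward()                              # minimize similarity to the true-label template
        with torch.no_grad():
            delta -= alpha * delta.grad.sign()
            delta.clamp_(-eps, eps)                        # keep the perturbation within the L_inf budget
        delta.grad.zero_()
    return (images + delta).detach()
```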

Paper Summary

Problem
The main problem addressed in this paper is the vulnerability of Contrastive Language-Image Pre-training (CLIP) models to adversarial examples (AEs). Despite their remarkable zero-shot generalization capabilities, CLIP-based models are susceptible to AEs, which compromises their safe deployment in real-world scenarios. Moreover, AEs generated with the standard cosine-similarity objective may fail to fool CLIP once more semantically enriched scores are used in place of cosine similarity, so an image encoder fine-tuned with these AEs ends up less robust.
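To make this failure mode concrete, the sketch below re-scores adversarial images with a semantically enriched score: the average cosine similarity to several descriptions per class rather than a single template. The per-class description lists are assumed inputs (in the paper they come from a foundation model and are filtered for hallucinations); an AE that fooled the single-template score may still be assigned its true class under this ensemble score.

```python
# Hedged sketch: re-score adversarial images with an ensemble of per-class
# descriptions (a semantically enriched similarity) instead of one template.
import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)
model.eval()

def ensemble_scores(adv_images, descriptions_per_class):
    """descriptions_per_class: one list of description strings per class.
    Returns scores[i, c] = mean_k cos(image_i, description_{c,k})."""
    class_feats = []
    for descs in descriptions_per_class:
        tokens = clip.tokenize(descs).to(device)
        with torch.no_grad():
            feats = model.encode_text(tokens).float()
            feats = feats / feats.norm(dim=-1, keepdim=True)
        class_feats.append(feats)

    with torch.no_grad():
        img_feat = model.encode_image(adv_images).float()
        img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
        # per-class mean similarity; argmax over classes gives the prediction
        scores = torch.stack(
            [(img_feat @ feats.T).mean(dim=-1) for feats in class_feats], dim=1
        )
    return scores
```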
Key Innovation
The key innovation of this work is Semantic-aware Adversarial Fine-Tuning (SAFT), a framework that generates semantic-aware AEs by incorporating hallucination-aware textual descriptions during fine-tuning. SAFT uses these semantically enriched AEs to fine-tune CLIP's image encoder, making it more robust to adversarial attacks. At its core is a semantic-ensemble attack that generates AEs by minimizing the average similarity between an image and an ensemble of selected textual descriptions, as sketched below.
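A rough sketch of such a semantic-ensemble attack is given below, under the assumption that L2-normalized text features of each class's retained descriptions have already been computed; the hyperparameters are illustrative rather than the authors' settings.

```python
# Rough sketch of a semantic-ensemble attack: minimize the *average* cosine
# similarity between the perturbed image and every retained description of its
# true class, rather than a single hand-crafted template.
import torch

def semantic_ensemble_attack(model, images, labels, desc_feats_per_class,
                             eps=1/255, alpha=0.25/255, steps=10):
    # desc_feats_per_class: list with one tensor [K_c, D] per class, holding
    # L2-normalized text features of that class's filtered descriptions.
    delta = torch.zeros_like(images, requires_grad=True)
    for _ in range(steps):
        img_feat = model.encode_image(images + delta).float()
        img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
        sims = []
        for i, y in enumerate(labels.tolist()):
            # average similarity of image i to all descriptions of its class
            sims.append((img_feat[i] @ desc_feats_per_class[y].T).mean())
        torch.stack(sims).mean().backward()     # minimize the ensemble similarity
        with torch.no_grad():
            delta -= alpha * delta.grad.sign()
            delta.clamp_(-eps, eps)             # L_inf perturbation budget
        delta.grad.zero_()
    return (images + delta).detach()
```

Fine-tuning then proceeds on these semantic-aware AEs: the image encoder is updated so that the adversarial images are pulled back toward their class text features, with the exact loss and schedule given in the paper and the released code.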
Practical Impact
The practical impact of this research is significant: it can lead to more robust machine learning models and improve the reliability of AI systems across applications. The proposed SAFT algorithm can be applied to a wide range of downstream tasks and extended to large vision-language models, helping to mitigate the risks of deploying CLIP-based models in real-world scenarios. By making CLIP more robust to adversarial attacks, SAFT contributes to more trustworthy AI systems.
Analogy / Intuitive Explanation
Imagine you're trying to fool a security system by creating a fake key. The current methods of generating AEs are like creating a simple, one-dimensional key that may not be effective in fooling the system. SAFT is like creating a more sophisticated, multi-dimensional key that can fool the system more effectively. By incorporating hallucination-aware textual descriptions, SAFT generates AEs that are more semantically enriched and can better mimic the characteristics of real images, making it more challenging for CLIP to distinguish between real and fake images.
Paper Information
Categories: cs.CV
arXiv ID: 2602.12461v1