Vision-Language Models Suppress Female Representations Under Ambiguous Input

Generative AI & LLMs
Published: arXiv: 2605.31556v1
Authors

Arnau Marin-Llobet, Simon Henniger, Mahzarin R. Banaji

Abstract

Alignment teaches vision-language models (VLMs) to avoid expressing demographic biases, and when gender is clearly visible they largely succeed. Far less is known about ambiguous inputs (a worker in full gear, a figure seen from behind), cases that are common in practice yet rarely studied. We find that minimal prompting pressure exposes occupation-gender defaults when models are given ambiguous input images, with models collapsing to male even for strongly female-stereotyped occupations. But do these outputs reflect what models actually encode internally? We introduce LALS (Latent Association Leaning Score), a zero-shot metric that projects visual-token activations into the model's text-embedding space to measure concept associations per token and layer. Across 15 occupations, over 800 gender-ambiguous images, and four VLMs, internal representations and outputs are systematically decoupled: models often encode a female association internally yet output male. Layer-wise analysis reveals an asymmetric filter: male signal amplifies end-to-end, while female signal peaks mid-network and is suppressed before generation. A color ablation shows that culturally loaded visual cues such as clothing color further modulate these internal associations.

Paper Summary

Problem
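Bias in vision-language models (VLMs) is a pressing concern, especially in high-stakes applications like content moderation and image retrieval. Alignment has made these models much better at avoiding stereotypical associations when gender is clearly visible, but ambiguous images (a worker in full gear, a figure seen from behind) are rarely studied. The paper finds that under such inputs the models fall back on occupation-gender defaults, and that their outputs do not match what they encode internally.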
Key Innovation
The paper introduces LALS (Latent Association Leaning Score), a zero-shot metric that projects visual-token activations into the model's text-embedding space and measures concept associations at the level of individual visual tokens and layers. This lets researchers examine the internal representations of VLMs and identify biases that are not apparent in the outputs.
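To make the projection idea concrete, here is a minimal sketch in Python with NumPy. It is not the authors' implementation: this summary does not give the LALS formula, so the scoring rule (mean cosine similarity to female anchor words minus mean cosine similarity to male anchor words), the function name `leaning_scores`, and the toy arrays are all assumptions chosen for illustration.

```python
import numpy as np

def leaning_scores(hidden, text_emb, female_ids, male_ids):
    """Hypothetical per-token, per-layer leaning score (assumed form, not the paper's).

    hidden:     (layers, tokens, d) visual-token activations
    text_emb:   (vocab, d) text-embedding matrix of the same model
    female_ids: vocabulary indices of female anchor words (assumed)
    male_ids:   vocabulary indices of male anchor words (assumed)
    Returns a (layers, tokens) array: > 0 leans female, < 0 leans male.
    """
    # Unit-normalise both sides so the dot products are cosine similarities.
    h = hidden / np.linalg.norm(hidden, axis=-1, keepdims=True)
    e = text_emb / np.linalg.norm(text_emb, axis=-1, keepdims=True)
    # Project every visual token onto each anchor set and average over anchors.
    female_sim = (h @ e[female_ids].T).mean(axis=-1)
    male_sim = (h @ e[male_ids].T).mean(axis=-1)
    return female_sim - male_sim

# Toy data: 4-word vocabulary with one-hot embeddings, 2 layers, 2 visual tokens.
text_emb = np.eye(4)
hidden = np.array(
    [
        [[1, 0, 0, 0], [0, 1, 0, 0]],  # layer 0: token 0 female-like, token 1 male-like
        [[1, 1, 0, 0], [0, 0, 1, 0]],  # layer 1: token 0 balanced, token 1 unrelated
    ],
    dtype=float,
)
scores = leaning_scores(hidden, text_emb, female_ids=[0], male_ids=[1])
layer_profile = scores.mean(axis=1)  # average leaning per layer
print(scores)
print(layer_profile)
```

Averaging the scores over tokens, as `layer_profile` does, gives one number per layer. That is one simple way to look for the kind of layer-wise pattern the abstract describes, though the paper's own aggregation may differ.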
Practical Impact
The findings matter for how VLMs are developed and deployed. Because internal representations and outputs can disagree, auditing outputs alone can miss bias. Measuring associations inside the model gives researchers a way to audit and mitigate bias more effectively, including for ambiguous or unclear visual inputs.
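As an illustration of what such an audit might compute, the sketch below counts how often an internal leaning disagrees with the generated label. The `Record` fields, the sign convention (positive means female-leaning), and the sample values are hypothetical and are not taken from the paper's data.

```python
from dataclasses import dataclass

@dataclass
class Record:
    occupation: str
    internal_leaning: float  # hypothetical score: > 0 female, < 0 male
    output_label: str        # label the model generated: "female" or "male"

def decoupling_rate(records):
    """Fraction of records that lean female internally but were output as male."""
    if not records:
        return 0.0
    flagged = [
        r for r in records
        if r.internal_leaning > 0 and r.output_label == "male"
    ]
    return len(flagged) / len(records)

# Made-up records for illustration only.
records = [
    Record("nurse", 0.25, "male"),       # decoupled
    Record("nurse", 0.5, "female"),      # consistent
    Record("firefighter", -0.5, "male"), # consistent
    Record("teacher", 0.125, "male"),    # decoupled
]
rate = decoupling_rate(records)
print(rate)
```

A per-occupation breakdown would follow the same pattern by grouping the records on `occupation` before calling `decoupling_rate`.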
Analogy / Intuitive Explanation
Imagine a model that can recognize images of people but has a bias towards seeing men. When shown a clear image of a woman, the model accurately identifies her as a woman. However, when shown a person from behind, or with their face obscured, the model defaults to seeing a man. This is similar to what the researchers found: VLMs often encode a female association internally but output male when faced with ambiguous inputs. The LALS metric lets researchers "look under the hood" of the model and identify these biases even when they are not apparent in the outputs.
Paper Information
Categories:
cs.CV cs.AI cs.CL cs.CY cs.HC
arXiv ID:
2605.31556v1
