Culture Cartography: Mapping the Landscape of Cultural Knowledge

Generative AI & LLMs
arXiv: 2510.27672v1
Authors

Caleb Ziems, William Held, Jane Yu, Amir Goldberg, David Grusky, Diyi Yang

Abstract

To serve global users safely and productively, LLMs need culture-specific knowledge that might not be learned during pre-training. How do we find such knowledge that is (1) salient to in-group users, but (2) unknown to LLMs? The most common solutions are single-initiative: either researchers define challenging questions that users passively answer (traditional annotation), or users actively produce data that researchers structure as benchmarks (knowledge extraction). The process would benefit from mixed-initiative collaboration, where users guide the process to meaningfully reflect their cultures, and LLMs steer the process towards more challenging questions that meet the researcher's goals. We propose a mixed-initiative methodology called CultureCartography. Here, an LLM initializes annotation with questions for which it has low-confidence answers, making explicit both its prior knowledge and the gaps therein. This allows a human respondent to fill these gaps and steer the model towards salient topics through direct edits. We implement this methodology as a tool called CultureExplorer. Compared to a baseline where humans answer LLM-proposed questions, we find that CultureExplorer more effectively produces knowledge that leading models like DeepSeek R1 and GPT-4o are missing, even with web search. Fine-tuning on this data boosts the accuracy of Llama-3.1-8B by up to 19.2% on related culture benchmarks.

Paper Summary

Problem
Large Language Models (LLMs) have the potential to empower users, but their utility is often limited for under-represented groups and cultures. Culture-specific knowledge that is salient to in-group members may simply never appear in pre-training data, so LLMs can struggle to provide accurate and relevant information for users from these backgrounds. The challenge is to identify knowledge that is both salient to in-group users and unknown to the model.
Key Innovation
The researchers propose a mixed-initiative methodology called Culture Cartography, in which humans and LLMs collaborate to collect culture-specific knowledge. The LLM initializes annotation with questions for which it has low-confidence answers, making its knowledge gaps explicit, while the human respondent fills those gaps and steers the model towards culturally salient topics through direct edits. The methodology is implemented as a tool called CultureExplorer which, compared to a baseline where humans simply answer LLM-proposed questions, more effectively surfaces knowledge that strong models are missing.
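To make this loop concrete, below is a minimal sketch of one mixed-initiative round in Python. It is an illustration only: the generate and ask_human callables, the agreement-based confidence proxy, and the 0.6 threshold are assumptions for the sketch, not the paper's actual implementation.

```python
from collections import Counter

def answer_confidence(generate, question, n_samples=5):
    """Proxy for the model's confidence: agreement across sampled answers.
    (An assumption; the paper's exact confidence estimate may differ.)"""
    answers = [generate(question) for _ in range(n_samples)]
    top_count = Counter(answers).most_common(1)[0][1]
    return top_count / n_samples

def propose_low_confidence_questions(generate, candidate_questions, threshold=0.6):
    """LLM-initiative step: keep only the questions the model is unsure about,
    making its knowledge gaps explicit."""
    return [q for q in candidate_questions
            if answer_confidence(generate, q) < threshold]

def culture_cartography_round(generate, candidate_questions, ask_human):
    """One mixed-initiative round: the LLM surfaces its low-confidence questions,
    and the human fills the gaps and may edit questions toward salient topics."""
    gaps = propose_low_confidence_questions(generate, candidate_questions)
    annotations = []
    for q in gaps:
        edited_q, answer = ask_human(q)  # human can rewrite q before answering
        annotations.append({"question": edited_q, "answer": answer})
    return annotations
```

In this sketch, generate wraps any chat LLM and ask_human wraps the CultureExplorer-style interface; the collected question-answer pairs become the culture-specific training and evaluation data.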
Practical Impact
The Culture Cartography methodology has several practical implications. First, fine-tuning on the collected data improves LLM accuracy on related culture benchmarks, boosting Llama-3.1-8B by up to 19.2%. Second, the data produced with Culture Cartography is largely "Google-Proof": leading models such as DeepSeek R1 and GPT-4o still miss it even with web search, so it is not trivially retrievable from public web sources and is less likely to leak into training or evaluation sets. Finally, the approach can be applied to a wide range of cultures and languages, making it a valuable tool for culturally aware NLP research.
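As a rough illustration of the fine-tuning step, the sketch below formats collected question-answer pairs as plain text and runs supervised fine-tuning with TRL's SFTTrainer. The prompt template, placeholder data, and training arguments are assumptions; the paper's exact recipe behind the reported up-to-19.2% gain on Llama-3.1-8B is not specified here, and argument names vary across TRL versions.

```python
from datasets import Dataset
from trl import SFTConfig, SFTTrainer

# Hypothetical annotations collected with a CultureExplorer-style tool
# (placeholder content, not real data from the paper).
pairs = [
    {"question": "What dish is traditionally served at this festival?",
     "answer": "A regional rice dish prepared only for this occasion."},
]

# Simple Q/A prompt template (an assumption, not the paper's format).
train_ds = Dataset.from_list(
    [{"text": f"Question: {p['question']}\nAnswer: {p['answer']}"} for p in pairs]
)

trainer = SFTTrainer(
    model="meta-llama/Llama-3.1-8B",       # base model fine-tuned in the paper
    train_dataset=train_ds,
    args=SFTConfig(output_dir="culture-sft"),
)
trainer.train()
```

The point of the sketch is only that the mixed-initiative annotations can be dropped directly into a standard supervised fine-tuning pipeline.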
Analogy / Intuitive Explanation
Imagine you're trying to learn about a new culture, but the only resources you have are a few books and some online articles. You might struggle to find the information you need, and the information you do find might not be accurate or relevant. Culture Cartography is like having a guide who can help you navigate the cultural landscape, pointing out important topics and providing you with accurate information. The guide is the LLM, and the user is the human respondent who fills in the gaps and steers the model towards salient topics. This mixed-initiative collaboration allows for a more effective and efficient collection of culture-specific knowledge.
Paper Information
Categories: cs.CL
arXiv ID: 2510.27672v1