Do LLMs Benefit From Their Own Words?

Generative AI & LLMs
arXiv: 2602.24287v1
Authors

Jenny Y. Huang, Leshem Choshen, Ramon Astudillo, Tamara Broderick, Jacob Andreas

Abstract

Multi-turn interactions with large language models typically retain the assistant's own past responses in the conversation history. In this work, we revisit this design choice by asking whether large language models benefit from conditioning on their own prior responses. Using in-the-wild, multi-turn conversations, we compare standard (full-context) prompting with a user-turn-only prompting approach that omits all previous assistant responses, across three open reasoning models and one state-of-the-art model. To our surprise, we find that removing prior assistant responses does not affect response quality on a large fraction of turns. Omitting assistant-side history can reduce cumulative context lengths by up to 10x. To explain this result, we find that multi-turn conversations consist of a substantial proportion (36.4%) of self-contained prompts, and that many follow-up prompts provide sufficient instruction to be answered using only the current user turn and prior user turns. When analyzing cases where user-turn-only prompting substantially outperforms full context, we identify instances of context pollution, in which models over-condition on their previous responses, introducing errors, hallucinations, or stylistic artifacts that propagate across turns. Motivated by these findings, we design a context-filtering approach that selectively omits assistant-side context. Our findings suggest that selectively omitting assistant history can improve response quality while reducing memory consumption.

Paper Summary

Problem
Large language models (LLMs) are being used in complex multi-turn interactions, but retaining past model outputs in the conversation history can lead to increased computational costs, slow inference speeds, and impaired capacity to attend to relevant information. The researchers aim to investigate whether LLMs benefit from conditioning on their own prior responses in real-world multi-turn conversations.
Key Innovation
The researchers find that removing prior assistant responses does not affect response quality on a large fraction of turns. They also identify instances of "context pollution," where models over-condition on their previous responses, introducing errors, hallucinations, or stylistic artifacts that propagate across turns. To address this issue, they develop an adaptive assistant-response-omission strategy that selectively omits assistant-side context.
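The two prompting strategies described above can be sketched as simple filters over a chat-style message list. The OpenAI-style `{"role": ..., "content": ...}` schema and the function names below are illustrative assumptions, not the authors' code; the paper's actual context-filtering approach decides adaptively which assistant turns to omit.

```python
def user_turns_only(messages):
    """User-turn-only prompting: drop every prior assistant response,
    keeping system and user turns (illustrative sketch)."""
    return [m for m in messages if m["role"] != "assistant"]


def keep_last_assistant(messages):
    """A hypothetical selective variant: omit assistant turns except the
    most recent one, for follow-ups that refer back to the last answer."""
    last = max(
        (i for i, m in enumerate(messages) if m["role"] == "assistant"),
        default=None,
    )
    return [
        m for i, m in enumerate(messages)
        if m["role"] != "assistant" or i == last
    ]


history = [
    {"role": "user", "content": "Summarize the report."},
    {"role": "assistant", "content": "Here is a summary."},
    {"role": "user", "content": "Now shorten it."},
]

pruned = user_turns_only(history)   # keeps only the two user turns
```

Dropping assistant turns this way shrinks the cumulative context sent on each request, which is where the paper's up-to-10x context-length reduction comes from.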
Practical Impact
The findings suggest that indiscriminately storing prior assistant responses may be unnecessary and even counterproductive in real-world multi-turn chats. By selectively omitting assistant history, the researchers show that response quality can be improved while reducing memory consumption. This approach has the potential to improve the efficiency and effectiveness of LLMs in complex conversations.
Analogy / Intuitive Explanation
Imagine having a conversation with a friend who keeps bringing up previous conversations, even when they're not relevant to the current topic. You might find it distracting. Similarly, LLMs can over-condition on their own previous responses, propagating earlier errors and stylistic quirks into new answers. By selectively omitting assistant history, the researchers essentially give the model a "clear mind" to focus on the current request.
Paper Information
Categories: cs.CL, cs.AI
arXiv ID:
