Research · 2026-07-02

Logit-Contribution Scoring Identifies Non-Literal Retrieval Heads

Originally published by arXiv CS.AI

arXiv:2607.01002v1 (cross-list). Abstract: In long-context use, large language models frequently synthesize answers from the meaning of a relevant context span rather than literally copy-pasting them. Identifying which attention heads perform this synthesis matters for interpreting...

This new arXiv paper introduces a method called Logit-Contribution Scoring (LCS) to address a specific but important problem in mechanistic interpretability: finding the attention heads responsible for non-literal retrieval in large language models.

What Happened

Most prior work on "retrieval heads" has focused on heads that copy-paste tokens directly from the prompt to the output. These heads are relatively easy to detect: when a head copies, the prompt token it attends to is the same token the model emits, so its effect can be traced back to a specific source position. In long-context scenarios, however, models often perform semantic synthesis. They read a passage, understand its meaning, and generate a response that is derived from that context but is not a verbatim copy.
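A literal copy head can be scored roughly as follows. This is a simplified sketch in the spirit of prior retrieval-head work, not code from the paper. At each generation step, it checks whether the context position the head attends to most holds exactly the token being generated. The function name `copy_score`, the attention layout, and the toy data are all hypothetical.

```python
from typing import Sequence


def copy_score(attention: Sequence[Sequence[float]],
               context_tokens: Sequence[str],
               generated_tokens: Sequence[str]) -> float:
    """Fraction of generation steps at which the head's most-attended
    context position holds exactly the generated token (a literal copy).

    attention[step][pos] is the head's attention weight from generation
    step `step` to context position `pos` (hypothetical toy layout).
    """
    if not generated_tokens:
        return 0.0
    hits = 0
    for step, token in enumerate(generated_tokens):
        weights = attention[step]
        # Context position this head attends to most at this step.
        top = max(range(len(weights)), key=weights.__getitem__)
        if context_tokens[top] == token:
            hits += 1
    return hits / len(generated_tokens)


# Toy example: the head attends to "March" and then to "3".
context = ["the", "deadline", "is", "March", "3"]
attn = [[0.05, 0.05, 0.10, 0.70, 0.10],
        [0.05, 0.05, 0.10, 0.10, 0.70]]

print(copy_score(attn, context, ["March", "3"]))       # 1.0: verbatim copy
print(copy_score(attn, context, ["early", "spring"]))  # 0.0: paraphrase
```

The second call shows the blind spot. The same head attends to the same span while the model paraphrases the date, and it scores zero.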

The researchers developed Logit-Contribution Scoring to isolate these "non-literal retrieval heads." Instead of tracking token identity, LCS measures how much an attention head’s output contributes to the final logits for a given token, regardless of whether that token appears in the source text. This allows them to distinguish between heads that simply copy information and heads that transform or synthesize information from context.
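This summary does not give the paper's exact scoring formula. One common way to measure how much a head's output contributes to the final logits is direct logit attribution: you project the head's write to the residual stream onto the unembedding vector of the target token. The sketch below assumes LCS works along similar lines. It ignores the final layer norm and subtracts the mean projection over the vocabulary, which is harmless because softmax is unchanged when the same constant is added to every logit. All names and vectors are hypothetical.

```python
from typing import Dict, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def logit_contribution(head_output: Vector,
                       unembed: Dict[str, Vector],
                       target: str) -> float:
    """Direct contribution of one head's output (already projected into the
    residual stream) to the target logit, relative to the vocabulary mean.
    Nothing here checks whether `target` appears in the context."""
    projections = {tok: dot(head_output, vec) for tok, vec in unembed.items()}
    mean = sum(projections.values()) / len(projections)
    return projections[target] - mean


def score_heads(head_outputs: Dict[str, Vector],
                unembed: Dict[str, Vector],
                target: str) -> Dict[str, float]:
    """Score every head's direct contribution to the same target token."""
    return {head: logit_contribution(out, unembed, target)
            for head, out in head_outputs.items()}


# Toy 2-d residual stream and a 3-token vocabulary.
unembed = {"spring": [1.0, 0.0], "March": [0.0, 1.0], "the": [0.0, 0.0]}
head_outputs = {"L5.H2": [2.0, 0.5], "L9.H7": [0.1, 0.1]}

scores = score_heads(head_outputs, unembed, target="spring")
print(scores)
```

In this toy, "spring" need not occur anywhere in the prompt. That is the property the article highlights: a head can score highly for a token it synthesized rather than copied.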

Their findings reveal a functional specialization within the model: certain heads are dedicated to literal retrieval, while a distinct set of heads handle the more complex task of semantic synthesis. This is a meaningful step beyond the "copy head" paradigm that has dominated the literature.
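Combining a copy-style score with a logit-contribution score gives one plausible way to separate the two populations of heads. This is an illustrative assumption, and the paper may draw the line differently. The thresholds, labels, and head names are hypothetical. A head that copies is labeled literal first, because a copy head also raises the logit of the token it copies. A head with a high logit contribution but a low copy score is labeled non-literal.

```python
from typing import Dict


def classify_heads(copy_scores: Dict[str, float],
                   lcs_scores: Dict[str, float],
                   copy_threshold: float = 0.5,
                   lcs_threshold: float = 1.0) -> Dict[str, str]:
    """Label each head as 'literal', 'non-literal', or 'other'.
    Thresholds are hypothetical and would need calibration per model."""
    labels = {}
    for head, lcs in lcs_scores.items():
        if copy_scores.get(head, 0.0) >= copy_threshold:
            labels[head] = "literal"      # copies tokens verbatim
        elif lcs >= lcs_threshold:
            labels[head] = "non-literal"  # drives the answer without copying
        else:
            labels[head] = "other"        # little direct effect on this answer
    return labels


copy_scores = {"L3.H1": 0.9, "L12.H4": 0.1, "L7.H0": 0.05}
lcs_scores = {"L3.H1": 2.1, "L12.H4": 1.8, "L7.H0": 0.2}
print(classify_heads(copy_scores, lcs_scores))
# {'L3.H1': 'literal', 'L12.H4': 'non-literal', 'L7.H0': 'other'}
```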

Why It Matters

This work addresses a blind spot in mechanistic interpretability. Models are increasingly deployed on long documents such as legal briefs, medical records, and codebases. In that setting, the ability to synthesize information (e.g., "summarize the key arguments from this 50-page contract") becomes more important than literal retrieval (e.g., "find the exact date in line 342").

If we cannot identify which components of the model perform synthesis, we cannot:

Debug hallucinations that arise from incorrect synthesis.
Optimize models for tasks requiring abstraction over long contexts.
Build targeted interventions (e.g., patching or editing specific heads) to improve performance on summarization or reasoning tasks.

LCS provides a more targeted tool for this. It shifts the interpretability question from "where does the model copy from?" to "where does the model synthesize its answer from?"
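Zero-ablation is one concrete form of the "patching or editing specific heads" intervention. You remove one head's output from the residual stream and measure how the target token's probability changes. The toy below is a hedged sketch. It treats the final residual as a plain sum of head outputs and skips layer norm and downstream layers, so it captures only a head's direct effect. In a real model, removing a head also changes what later layers compute. All names and numbers are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Set

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def residual_without(head_outputs: Dict[str, Vector],
                     baseline: Vector,
                     ablated: Set[str]) -> List[float]:
    """Sum the baseline and every head output except the ablated heads."""
    residual = list(baseline)
    for name, out in head_outputs.items():
        if name not in ablated:
            residual = [r + o for r, o in zip(residual, out)]
    return residual


def target_prob(residual: Vector, unembed: Dict[str, Vector], target: str) -> float:
    """Softmax probability of `target` under logits residual . unembed[tok]."""
    logits = {tok: dot(residual, vec) for tok, vec in unembed.items()}
    top = max(logits.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(value - top) for tok, value in logits.items()}
    return exps[target] / sum(exps.values())


unembed = {"spring": [1.0, 0.0], "March": [0.0, 1.0]}
baseline = [0.0, 0.0]
head_outputs = {"L5.H2": [2.0, 0.0], "L9.H7": [0.0, 0.5]}

p_full = target_prob(residual_without(head_outputs, baseline, set()), unembed, "spring")
p_ablated = target_prob(residual_without(head_outputs, baseline, {"L5.H2"}), unembed, "spring")
print(f"p(spring): full={p_full:.3f}, without L5.H2={p_ablated:.3f}")
```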

Implications for AI Practitioners

For engineers and researchers working with long-context models, this has practical implications:

Better debugging of long-context failures. If a model misreads a contract clause and generates a wrong summary, LCS can help identify whether the error originated in a synthesis head (bad understanding) or a copy head (bad retrieval). This narrows the search space for root cause analysis.
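Here is one hedged way to do that triage, assuming heads have already been labeled literal or non-literal. Project each head's output onto the direction that separates the wrong token from the correct one (the logit difference), then total the contributions per group. The group pushing hardest toward the wrong answer is the first place to look. The tokens, labels, and vectors are hypothetical. Like any direct-attribution sketch, this ignores layer norm and indirect effects.

```python
from typing import Dict, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def blame_by_group(head_outputs: Dict[str, Vector],
                   labels: Dict[str, str],
                   unembed: Dict[str, Vector],
                   wrong: str,
                   correct: str) -> Dict[str, float]:
    """Total direct contribution of each head group to
    logit(wrong) - logit(correct). Positive totals favor the wrong answer."""
    direction = [w - c for w, c in zip(unembed[wrong], unembed[correct])]
    totals: Dict[str, float] = {}
    for head, out in head_outputs.items():
        group = labels.get(head, "other")
        totals[group] = totals.get(group, 0.0) + dot(out, direction)
    return totals


unembed = {"renews": [1.0, 0.0], "terminates": [0.0, 1.0]}
labels = {"L3.H1": "literal", "L12.H4": "non-literal"}
head_outputs = {"L3.H1": [0.5, 0.2], "L12.H4": [0.1, 1.4]}

totals = blame_by_group(head_outputs, labels, unembed,
                        wrong="terminates", correct="renews")
print(max(totals, key=totals.get))  # non-literal
```

In this toy case, the synthesis heads push toward the wrong clause reading, so the investigation would focus on how the context was interpreted rather than on what was retrieved.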

Targeted fine-tuning and steering. Knowing which heads handle synthesis opens the door to sparse fine-tuning—adjusting only the relevant parameters to improve a model’s ability to reason over long contexts without retraining the entire architecture.
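Sparse fine-tuning of this kind amounts to masking the update so that only parameters belonging to selected heads change. The pure-Python sketch below shows the idea with a single plain SGD step. The parameter layout (one flat vector per hypothetical `(layer, head)` key), the learning rate, and the chosen heads are assumptions. Frameworks usually store all heads of a layer in one weight matrix, so in practice you would zero the gradient slices of the frozen heads instead.

```python
from typing import Dict, List, Sequence, Set, Tuple

HeadKey = Tuple[int, int]  # (layer, head), a hypothetical layout


def masked_sgd_step(params: Dict[HeadKey, Sequence[float]],
                    grads: Dict[HeadKey, Sequence[float]],
                    trainable: Set[HeadKey],
                    lr: float = 0.1) -> Dict[HeadKey, List[float]]:
    """One SGD step that updates only the heads in `trainable`;
    all other heads are copied through unchanged (frozen)."""
    updated: Dict[HeadKey, List[float]] = {}
    for key, values in params.items():
        if key in trainable:
            updated[key] = [v - lr * g for v, g in zip(values, grads[key])]
        else:
            updated[key] = list(values)
    return updated


params = {(12, 4): [1.0, 1.0], (3, 1): [1.0, 1.0]}
grads = {(12, 4): [0.5, -0.5], (3, 1): [0.5, -0.5]}

new_params = masked_sgd_step(params, grads, trainable={(12, 4)})
print(new_params)
```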

Architecture design feedback. If synthesis heads are consistently located in specific layers or attention patterns, this informs the design of next-generation architectures that explicitly separate retrieval and reasoning pathways.
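Once heads are labeled, checking whether synthesis heads cluster in particular layers is a simple count. This sketch assumes hypothetical head names of the form `L<layer>.H<head>` and a dictionary of labels produced by some earlier classification step. The labels shown are made up.

```python
from collections import Counter
from typing import Dict


def heads_per_layer(labels: Dict[str, str], group: str = "non-literal") -> Dict[int, int]:
    """Count heads carrying the `group` label in each layer.
    Head names are assumed to look like 'L12.H4' (layer 12, head 4)."""
    counts: Counter = Counter()
    for head, label in labels.items():
        if label == group:
            layer = int(head.split(".")[0].lstrip("L"))
            counts[layer] += 1
    return dict(sorted(counts.items()))


labels = {"L12.H4": "non-literal", "L12.H9": "non-literal",
          "L14.H2": "non-literal", "L3.H1": "literal"}
print(heads_per_layer(labels))  # {12: 2, 14: 1}
```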

The paper is a technical contribution, but its core insight should influence how practitioners think about model behavior in production: non-literal retrieval is a distinct, identifiable mechanism. It is a strong argument against treating all attention heads as interchangeable.

Key Takeaways

Logit-Contribution Scoring (LCS) is a new method for identifying attention heads that perform semantic synthesis, not just literal copy-paste.
Non-literal retrieval heads are functionally distinct from copy heads, revealing a division of labor in how models process long contexts.
For practitioners, LCS enables more precise debugging of long-context errors and opens the door to targeted model steering and fine-tuning.
The work shifts interpretability from token-level tracing to meaning-level analysis, which is essential for understanding how models reason, not just retrieve.

