Position Bias Correction is Insufficient for One-Pass Attention Sorting
arXiv:2606.27793v1 (Announce Type: cross)
Abstract: Long-context language models suffer from position bias, where information in middle positions is underutilized. Attention Sorting addresses this by iteratively reordering documents based on attention patterns, but its multiple sort-and-generate...
What Happened
A new arXiv preprint (2606.27793) challenges a core assumption in long-context language model optimization: that position bias can be fixed simply by reordering documents according to attention scores. Attention Sorting reorders the input documents according to how much attention the model pays to each one, and, as the abstract notes, it does so through multiple sort-and-generate passes. The paper finds that while this does improve retrieval from middle positions, it does not fully resolve the underlying problem. The authors argue that position bias correction alone is insufficient for one-pass attention sorting: with a single reordering, the model’s inherent tendency to favor early and late positions persists.
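The reordering step can be sketched in Python as follows. This is a minimal sketch, not the paper's or the original authors' code. `attention_fn` is a hypothetical callback standing in for whatever per-document attention score you extract from the model (for example, attention mass on each document's tokens during a decoding step). The sketch also assumes that the most-attended document should be moved to the end of the context, next to the question. `num_rounds=1` is the one-pass variant, and larger values repeat the score-and-sort loop.

```python
from typing import Callable, List, Sequence

# Hypothetical callback: given documents in their current order and the question,
# return one attention score per document (higher = more attended).
AttentionFn = Callable[[Sequence[str], str], Sequence[float]]


def attention_sort(
    docs: Sequence[str],
    question: str,
    attention_fn: AttentionFn,
    num_rounds: int = 1,
) -> List[str]:
    """Reorder docs by attention so the most-attended document ends up last."""
    order = list(docs)
    for _ in range(num_rounds):
        scores = attention_fn(order, question)
        if len(scores) != len(order):
            raise ValueError("attention_fn must return one score per document")
        # Stable ascending sort: least-attended first, most-attended last.
        ranked = sorted(range(len(order)), key=lambda i: scores[i])
        order = [order[i] for i in ranked]
    return order


# Toy stand-in for model attention (hypothetical): count question words in each doc.
def keyword_overlap(order: Sequence[str], question: str) -> List[int]:
    q = set(question.lower().split())
    return [len(q & set(doc.lower().split())) for doc in order]


docs = ["paris is in france", "the sky is blue", "cats sleep a lot"]
print(attention_sort(docs, "where is paris", keyword_overlap))
# ['cats sleep a lot', 'the sky is blue', 'paris is in france']
```

The toy scorer ignores position, so extra rounds change nothing here. With a real model, the scores depend on the current order, so repeated rounds can produce different orderings.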
Why It Matters
Position bias is a well-known limitation of transformer-based LLMs. When processing long contexts (e.g., 128K tokens or more), models consistently underutilize information placed in the middle of the input. This has practical consequences: RAG pipelines, document summarization, and multi-hop reasoning tasks all suffer when critical evidence sits in the “forgotten middle.” Attention Sorting was proposed as a lightweight fix: reorder documents by attention scores before generation, without retraining. This paper shows that the fix is incomplete.
The key insight is that attention sorting itself introduces a new form of bias. It prioritizes documents that initially receive high attention, and those are not necessarily the most relevant ones. Because the initial attention scores are themselves shaped by position, the sorting process can amplify early attention patterns instead of correcting the positional distortion. In other words, the model’s attention becomes a self-fulfilling prophecy: documents that happen to catch the model’s eye get promoted, while genuinely relevant but initially overlooked content remains buried. Practitioners relying on attention-based reordering may therefore be trading one form of bias for another.
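A deliberately simple toy model can illustrate this self-reinforcing effect. Everything in it is an assumption for illustration, not the paper's setup. Observed attention is taken to be a document's true relevance multiplied by a hypothetical U-shaped positional weight, with the ends favored and the middle neglected. Each pass sorts by observed attention and moves the most-attended document last.

```python
from typing import List

# Toy assumptions (NOT from the paper): true relevance per document, and a
# hypothetical U-shaped positional weight over 5 slots (ends favored, middle neglected).
RELEVANCE = {"D0": 3, "D1": 3, "GOLD": 5, "D2": 3, "D3": 3}
POSITION_WEIGHT = [10, 6, 2, 6, 10]


def observed_attention(order: List[str]) -> List[int]:
    # What the sorter sees: relevance distorted by the slot each document occupies.
    return [RELEVANCE[doc] * POSITION_WEIGHT[pos] for pos, doc in enumerate(order)]


def sort_once(order: List[str]) -> List[str]:
    scores = observed_attention(order)
    # Stable ascending sort: most-attended document is placed last.
    ranked = sorted(range(len(order)), key=lambda i: scores[i])
    return [order[i] for i in ranked]


order = ["D0", "D1", "GOLD", "D2", "D3"]
for round_no in range(1, 4):
    order = sort_once(order)
    print(round_no, order)
# 1 ['GOLD', 'D1', 'D2', 'D0', 'D3']
# 2 ['D2', 'D1', 'D0', 'D3', 'GOLD']
# 3 ['D0', 'D1', 'D3', 'D2', 'GOLD']
```

After one pass, the two distractors that started at the favored ends rank highest. The relevant `GOLD` document, whose score was deflated by the middle slot, ranks lowest. Only on the second pass, when it is scored from a favored slot, does it reach the end. Real attention is not this tidy, but the toy shows how a single pass ranks position-distorted attention rather than relevance.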
Implications for AI Practitioners
For engineers building long-context applications, this research carries a cautionary message. If you are using attention sorting as a post-hoc fix for position bias, verify that it actually improves downstream task accuracy, not just attention distribution metrics. The paper suggests that a single pass of sorting is not enough; multiple iterations or hybrid approaches (e.g., combining attention sorting with explicit position-aware weighting) may be necessary.
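One reading of the hybrid suggestion is to divide each document's observed attention by an estimate of how much attention its slot receives regardless of content, and then sort once. The sketch below is a hypothetical illustration of that idea, not a method taken from the paper. `position_prior` and the way you estimate it (for example, by averaging attention per slot over shuffled, irrelevant documents) are assumptions, and the input numbers are made up.

```python
from typing import List, Sequence


def position_corrected_sort(
    docs: Sequence[str],
    attention: Sequence[float],
    position_prior: Sequence[float],
    eps: float = 1e-9,
) -> List[str]:
    """One sorting pass on attention divided by a per-slot positional prior.

    position_prior[i] is an ASSUMED estimate of the attention slot i receives
    regardless of content; how it is estimated is left to the caller.
    """
    if not (len(docs) == len(attention) == len(position_prior)):
        raise ValueError("docs, attention and position_prior must have equal length")
    # Divide out the positional prior so the score approximates content relevance.
    corrected = [a / max(p, eps) for a, p in zip(attention, position_prior)]
    # Stable ascending sort: highest corrected score is placed last.
    ranked = sorted(range(len(docs)), key=lambda i: corrected[i])
    return [docs[i] for i in ranked]


docs = ["D0", "D1", "GOLD", "D2", "D3"]
attention = [30, 18, 10, 18, 30]    # made-up observed attention scores
position_prior = [10, 6, 2, 6, 10]  # made-up U-shaped prior
print(position_corrected_sort(docs, attention, position_prior))
# ['D0', 'D1', 'D2', 'D3', 'GOLD']
```

The correction recovers the right ranking here only because the inputs were constructed so that attention is exactly relevance times a known prior. The paper's title claims that position bias correction is not sufficient on its own for one-pass sorting, so treat this as one component to test, not a fix.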
More broadly, the findings underscore that position bias is not a surface-level artifact that can be patched with input reordering. It is deeply embedded in how attention mechanisms process sequential information. Practitioners should consider training-time interventions (e.g., position-aware training objectives or rotary position encoding adjustments) as more robust solutions. For production systems, the safest approach remains: (1) benchmark your specific task with and without attention sorting, (2) measure recall for mid-position documents explicitly, and (3) do not assume that improved attention metrics translate to better factual retrieval.
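Step (2) can be made concrete by scoring task accuracy, not attention, grouped by where the evidence originally sat. The following is a minimal sketch under two assumptions: each evaluation example records the original index of its gold document, and it records whether the final answer was correct. `accuracy_by_position` is a hypothetical helper, and the records below are placeholders, not data.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def accuracy_by_position(
    records: Iterable[Tuple[int, bool]],
    num_docs: int,
    num_buckets: int = 3,
) -> Dict[int, float]:
    """Task accuracy grouped by where the gold document sat in the original context.

    records: (gold_index, answered_correctly) pairs, one per evaluation example.
    Buckets split [0, num_docs) into equal-width bands; with num_buckets=3 they
    are 0 = beginning, 1 = middle, 2 = end.
    """
    hits: Dict[int, int] = defaultdict(int)
    totals: Dict[int, int] = defaultdict(int)
    for gold_index, correct in records:
        if not 0 <= gold_index < num_docs:
            raise ValueError(f"gold_index {gold_index} outside [0, {num_docs})")
        bucket = gold_index * num_buckets // num_docs
        totals[bucket] += 1
        hits[bucket] += int(correct)
    return {b: hits[b] / totals[b] for b in sorted(totals)}


# Made-up placeholder records (not results from the paper): the same examples,
# scored once without and once with attention sorting, 10 documents per context.
without_sorting = [(0, True), (1, True), (4, False), (5, False), (8, True), (9, True)]
with_sorting = [(0, True), (1, False), (4, True), (5, False), (8, True), (9, True)]
print(accuracy_by_position(without_sorting, num_docs=10))  # {0: 1.0, 1: 0.0, 2: 1.0}
print(accuracy_by_position(with_sorting, num_docs=10))     # {0: 0.5, 1: 0.5, 2: 1.0}
```

Comparing the two dictionaries bucket by bucket covers steps (1) and (2). A gain in the middle bucket paired with a loss elsewhere, as in these placeholder numbers, is the kind of bias trade-off described earlier.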
Key Takeaways
- Attention Sorting does not eliminate position bias; it merely shifts where bias manifests, potentially reinforcing initial attention patterns.
- One-pass reordering is insufficient; multiple sorting iterations or hybrid correction methods may be required for reliable long-context retrieval.
- Practitioners must validate attention-based reordering on actual task performance, not just attention distribution metrics.
- Long-term solutions likely require training-time modifications (e.g., position-aware objectives) rather than post-hoc input manipulation.