BeClaude
Research · 2026-05-01

Mitigating Selection Bias in Large Language Models via Permutation-Aware GRPO

Source: arXiv cs.AI

arXiv:2603.21016v2 Announce Type: replace-cross Abstract: Large language models (LLMs) used for multiple-choice and pairwise evaluation tasks often exhibit selection bias due to non-semantic factors like option positions and label symbols. Existing inference-time debiasing is costly and may harm...
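The abstract describes selection bias driven by option positions and label symbols rather than content. As a minimal sketch (not the paper's method), one common way to probe this is to permute the answer options of the same question and check whether the model's choice tracks the content or a fixed label position; `query_model` below is a hypothetical stand-in for whatever LLM call is used.

```python
# Sketch: probing option-position selection bias by permuting answer options.
# Assumption: `query_model` is a placeholder for your own LLM call; it is not
# an API from the paper or any specific library.
from collections import Counter
from itertools import permutations

LABELS = ["A", "B", "C", "D"]

def query_model(question: str, labeled_options: list[tuple[str, str]]) -> str:
    """Hypothetical LLM call: returns the chosen label, e.g. 'A'."""
    raise NotImplementedError("plug in your own model call here")

def positional_pick_counts(question: str, options: list[str]) -> Counter:
    """Count how often each label is picked across all option permutations.

    For an unbiased model, picks follow the correct option wherever it moves;
    a skew toward one fixed label (e.g. always 'A') indicates selection bias.
    """
    label_counts: Counter = Counter()
    for perm in permutations(options):
        labeled = list(zip(LABELS, perm))
        label_counts[query_model(question, labeled)] += 1
    return label_counts
```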

arxivpapers