BeClaude
Research | 2026-04-22

Has Automated Essay Scoring Reached Sufficient Accuracy? Deriving Achievable QWK Ceilings from Classical Test Theory

Source: arXiv cs.AI

arXiv:2604.19131v1

Abstract: Automated essay scoring (AES) is commonly evaluated on public benchmarks using quadratic weighted kappa (QWK). However, because benchmark labels are assigned by human raters and inevitably contain scoring errors, it remains unclear both what QWK is...
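For readers unfamiliar with the metric named in the abstract, the following is a minimal sketch of the standard QWK definition (one minus the ratio of observed to chance-expected quadratically weighted disagreement). It is not taken from the paper; the function name `quadratic_weighted_kappa` and its parameters are hypothetical, and scores are assumed to be integers on a known scale.

```python
from collections import Counter


def quadratic_weighted_kappa(scores_a, scores_b, min_score, max_score):
    """Standard QWK between two equal-length lists of integer scores.

    Hypothetical helper for illustration; assumes every score lies in
    [min_score, max_score].
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("score lists must be non-empty and of equal length")
    num_levels = max_score - min_score + 1
    if num_levels < 2:
        raise ValueError("the score scale needs at least two levels")

    n = len(scores_a)
    observed = Counter(zip(scores_a, scores_b))  # joint counts of (a, b) pairs
    hist_a = Counter(scores_a)                   # marginal counts, rater A
    hist_b = Counter(scores_b)                   # marginal counts, rater B

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(min_score, max_score + 1):
        for j in range(min_score, max_score + 1):
            # Quadratic penalty: 0 on the diagonal, 1 at maximum distance.
            weight = (i - j) ** 2 / (num_levels - 1) ** 2
            weighted_observed += weight * observed[(i, j)] / n
            weighted_expected += weight * hist_a[i] * hist_b[j] / (n * n)

    if weighted_expected == 0.0:
        # Both raters used one identical score throughout: kappa is undefined.
        return float("nan")
    return 1.0 - weighted_observed / weighted_expected
```

A value of 1 means perfect agreement, 0 means agreement no better than chance given each rater's score distribution, and negative values mean systematic disagreement.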
