Research 2026-04-22
Xpertbench: Expert Level Tasks with Rubrics-Based Evaluation
Source: Arxiv CS.AI
arXiv:2604.02368v4 Announce Type: replace Abstract: As Large Language Models (LLMs) exhibit plateauing performance on conventional benchmarks, a pivotal challenge persists: evaluating their proficiency in the complex, open-ended tasks that characterize genuine expert-level cognition. Existing frameworks...