Research | 2026-04-28
AdaRubric: Task-Adaptive Rubrics for LLM Agent Evaluation
Source: arXiv cs.AI
arXiv:2603.21362v2 (Announce Type: replace)
Abstract: LLM-as-Judge evaluation fails on agent tasks because a fixed rubric cannot capture what matters for each task: code debugging demands Correctness and Error Handling, while web navigation demands Goal Alignment and Action Efficiency. We present AdaRubric,...
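The contrast the abstract draws can be made concrete with a small sketch. It is not AdaRubric's method, whose description is cut off above. It only assumes that a judge returns per-dimension scores on a 1-5 scale and that an overall score is the plain mean over whichever dimensions the rubric selects. The names `TASK_RUBRICS`, `FIXED_RUBRIC`, `rubric_for`, and `aggregate`, and the score values, are all hypothetical. The dimension names come from the abstract.

```python
from statistics import mean

# Task-specific dimensions, taken from the abstract's two examples.
# The lookup table itself is a hypothetical stand-in, not AdaRubric's rubric generator.
TASK_RUBRICS: dict[str, list[str]] = {
    "code_debugging": ["Correctness", "Error Handling"],
    "web_navigation": ["Goal Alignment", "Action Efficiency"],
}

# A hypothetical one-size-fits-all rubric applied to every task.
FIXED_RUBRIC: list[str] = ["Correctness"]


def rubric_for(task_type: str) -> list[str]:
    """Return the dimensions assumed to matter for this task type."""
    if task_type not in TASK_RUBRICS:
        raise ValueError(f"no rubric defined for task type {task_type!r}")
    return TASK_RUBRICS[task_type]


def aggregate(judge_scores: dict[str, float], dimensions: list[str]) -> float:
    """Average the judge's per-dimension scores over the selected dimensions only."""
    missing = [d for d in dimensions if d not in judge_scores]
    if missing:
        raise KeyError(f"judge did not score: {missing}")
    return mean(judge_scores[d] for d in dimensions)


# Hypothetical judge output for one web-navigation trajectory:
# the agent reached its goal but took many wasted actions.
scores = {"Goal Alignment": 5.0, "Action Efficiency": 2.0, "Correctness": 4.0}

print(aggregate(scores, rubric_for("web_navigation")))  # 3.5
print(aggregate(scores, FIXED_RUBRIC))                  # 4.0
```

Under these assumptions, the fixed rubric rates the trajectory 4.0 and hides the poor Action Efficiency score. The task-adaptive rubric exposes it and lowers the overall score to 3.5.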
Tags: arxiv, papers, agents