Research | 2026-04-17
BenGER: A Collaborative Web Platform for End-to-End Benchmarking of German Legal Tasks
Source: arXiv cs.AI
arXiv:2604.13583v1 (announce type: cross)
Abstract: Evaluating large language models (LLMs) for legal reasoning requires workflows that span task design, expert annotation, model execution, and metric-based evaluation. In practice, these steps are split across platforms and scripts, limiting...
arxiv, papers, benchmark