BeClaude
Research | 2026-05-08

MediEval: A Unified Medical Benchmark for Patient-Contextual and Knowledge-Grounded Reasoning in LLMs

Source: arXiv cs.AI

arXiv:2512.20822v2

Abstract: Large Language Models (LLMs) are increasingly applied to medicine, yet their adoption is limited by concerns over reliability and safety. Existing evaluations either test factual medical knowledge in isolation or assess patient-level...

Tags: arxiv, papers, reasoning, benchmark