Research | 2026-05-14
Automated Rubrics for Reliable Evaluation of Medical Dialogue Systems
Source: arXiv cs.AI
arXiv:2601.15161v2 Announce Type: replace-cross Abstract: Large Language Models (LLMs) are increasingly used for clinical decision support, where hallucinations and unsafe suggestions may pose direct risks to patient safety. These risks are hard to assess: subtle clinical errors are often missed by...
arxiv, papers