Research 2026-05-12
MedMeta: A Benchmark for LLMs in Synthesizing Meta-Analysis Conclusion from Medical Studies
Source: arXiv cs.AI
arXiv:2605.09661v1 (Announce Type: cross)
Abstract: Large language models (LLMs) have saturated standard medical benchmarks that test factual recall, yet their ability to perform higher-order reasoning, such as synthesizing evidence from multiple sources, remains critically under-explored. To address...
arxiv, papers, benchmark