BeClaude
Research 2026-05-14

RealICU: Do LLM Agents Understand Long-Context ICU Data? A Benchmark Beyond Behavior Imitation

Source: arXiv cs.AI

arXiv:2605.13542v1. Abstract: Intensive care units (ICUs) generate long, dense, and evolving streams of clinical information, where physicians must repeatedly reassess patient states under time pressure, underscoring a clear need for reliable AI decision support. Existing ICU...

arxiv, papers, agents, benchmark