BeClaude
Research, 2026-05-12

DeepTumorVQA: A Hierarchical 3D CT Benchmark for Stage-Wise Evaluation of Medical VLMs and Tool-Augmented Agents

Source: arXiv cs.AI

arXiv:2605.09679v1 (announce type: cross)

Abstract: Medical vision-language models (VLMs) and AI agents have made significant progress in learning to analyze and reason about clinical images. However, existing medical visual question answering (VQA) benchmarks collapse model capabilities into a...

Tags: arxiv, papers, agents, benchmark