Research, 2026-05-12

DeepTumorVQA: A Hierarchical 3D CT Benchmark for Stage-Wise Evaluation of Medical VLMs and Tool-Augmented Agents

arXiv:2605.09679v1 (announce type: cross)

Abstract: Medical vision-language models (VLMs) and AI agents have made significant progress in learning to analyze and reason about clinical images. However, existing medical visual question answering (VQA) benchmarks collapse model capabilities into a...

Read Original Article on Arxiv CS.AI

arxiv, papers, agents, benchmark