Research2026-04-20
UniDoc-RL: Coarse-to-Fine Visual RAG with Hierarchical Actions and Dense Rewards
Source: Arxiv CS.AI
arXiv:2604.14967v2 Announce Type: replace-cross Abstract: Retrieval-Augmented Generation (RAG) extends Large Vision-Language Models (LVLMs) with external visual knowledge. However, existing visual RAG systems typically rely on generic retrieval signals that overlook the fine-grained visual...
arxivpapersrag