Research 2026-04-28
PDF-WuKong: A Large Multimodal Model for Efficient Long PDF Reading with End-to-End Sparse Sampling
Source: arXiv cs.AI
arXiv:2410.05970v3 (announce type: replace-cross). Abstract: Multimodal document understanding is a challenging task that requires processing and comprehending large amounts of textual and visual information. Recent advances in Large Language Models (LLMs) have significantly improved performance on this task....
Tags: arxiv, papers, multimodal