BeClaude
Research 2026-04-28

PDF-WuKong: A Large Multimodal Model for Efficient Long PDF Reading with End-to-End Sparse Sampling

Source: Arxiv CS.AI

arXiv:2410.05970v3 Announce Type: replace-cross Abstract: Multimodal document understanding is a challenging task that requires processing and comprehending large amounts of textual and visual information. Recent advances in Large Language Models (LLMs) have significantly improved performance on this task....

arxiv, papers, multimodal