Research · 2026-04-28

PDF-WuKong: A Large Multimodal Model for Efficient Long PDF Reading with End-to-End Sparse Sampling

arXiv:2410.05970v3. Abstract: Multimodal document understanding is a challenging task that requires processing and comprehending large amounts of textual and visual information. Recent advances in Large Language Models (LLMs) have significantly improved performance on this task....

Read the original article on arXiv cs.AI

arxiv, papers, multimodal