Research · 2026-05-12
LLaVA-CKD: Bottom-Up Cascaded Knowledge Distillation for Vision-Language Models
Source: Arxiv CS.AI
arXiv:2605.10641v1 | Announce Type: cross

Abstract: Large Vision-Language Models (VLMs) are successful in addressing a multitude of vision-language understanding tasks, such as Visual Question Answering (VQA), but their memory and compute requirements remain a concern for practical deployment. A...
Tags: arxiv, papers, vision