BeClaude
Research 2026-05-11

BalCapRL: A Balanced Framework for RL-Based MLLM Image Captioning

Source: Arxiv CS.AI

arXiv:2605.07394v1 (cross-listed)

Abstract: Image captioning is one of the most fundamental tasks in computer vision. Owing to its open-ended nature, it has received significant attention in the era of multimodal large language models (MLLMs). In pursuit of ever more detailed and accurate...
