Research · 2026-05-11

BalCapRL: A Balanced Framework for RL-Based MLLM Image Captioning

arXiv:2605.07394v1 (cross-listed)

Abstract: Image captioning is one of the most fundamental tasks in computer vision. Owing to its open-ended nature, it has received significant attention in the era of multimodal large language models (MLLMs). In pursuit of ever more detailed and accurate...

