Policy 2026-04-22
BAPO: Boundary-Aware Policy Optimization for Reliable Agentic Search
Source: arXiv cs.AI
arXiv:2601.11037v2
Announce Type: replace
Abstract: RL-based agentic search enables LLMs to solve complex questions via dynamic planning and external search. While this approach significantly enhances accuracy with agent policies optimized via large-scale reinforcement learning, we identify a...
arxiv papers agents