Research 2026-05-11

DRIP-R: A Benchmark for Decision-Making and Reasoning Under Real-World Policy Ambiguity in the Retail Domain

arXiv:2605.07699v1 Announce Type: cross Abstract: LLM-based agents are increasingly deployed for routine but consequential tasks in real-world domains, where their behavior is governed by inherently ambiguous domain policies that admit multiple valid interpretations. Despite the prevalence of such...

Read Original Article on arXiv cs.AI

arxiv, papers, reasoning, benchmark