BeClaude
Research · 2026-04-20

Beyond MCQ: An Open-Ended Arabic Cultural QA Benchmark with Dialect Variants

Source: arXiv cs.AI

arXiv:2510.24328v2 (Announce Type: replace-cross)

Abstract: Large Language Models (LLMs) are increasingly used to answer everyday questions, yet their performance on culturally grounded and dialectal content remains uneven across languages. We propose a comprehensive method that (i) translates Modern...

arxiv · papers · benchmark