Research 2026-04-20
BAGEL: Benchmarking Animal Knowledge Expertise in Language Models
Source: arXiv cs.AI
arXiv:2604.16241v1 Announce Type: cross
Abstract: Large language models have shown strong performance on broad-domain knowledge and reasoning benchmarks, but it remains unclear how well language models handle specialized animal-related knowledge under a unified closed-book evaluation protocol. We...
arxiv, papers, benchmark