Research | 2026-04-20

BAGEL: Benchmarking Animal Knowledge Expertise in Language Models

arXiv:2604.16241v1
Announce Type: cross

Abstract: Large language models have shown strong performance on broad-domain knowledge and reasoning benchmarks, but it remains unclear how well language models handle specialized animal-related knowledge under a unified closed-book evaluation protocol. We...

Read Original Article on Arxiv CS.AI

Tags: arxiv, papers, benchmark