Research · 2026-04-28
LLM4SCREENLIT: Recommendations on Assessing the Performance of Large Language Models for Screening Literature in Systematic Reviews
Source: Arxiv CS.AI
arXiv:2511.12635v2 (announce type: replace-cross)

Abstract: Context: Large language models (LLMs) are increasingly used to screen literature for systematic reviews (SRs), but the standard confusion-matrix metrics used to evaluate them can mislead under the imbalanced, cost-asymmetric conditions of...
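The abstract's point about imbalance can be made concrete with a small numeric sketch (the numbers below are hypothetical, not from the paper): in screening, relevant papers are rare, so a model can score near-perfect accuracy while missing a large share of the relevant evidence.

```python
# Hypothetical illustration: why raw accuracy misleads in systematic-review
# screening, where relevant papers are a small minority of candidates.
# Assumed numbers (not from the paper): 2,000 candidates, 40 relevant (2%).

total = 2000
relevant = 40
irrelevant = total - relevant

# A screener that finds only half the relevant papers but excludes
# almost all irrelevant ones:
tp = 20                  # relevant papers correctly included
fn = relevant - tp       # relevant papers missed (lost evidence)
tn = 1950                # irrelevant papers correctly excluded
fp = irrelevant - tn     # irrelevant papers wrongly included

accuracy = (tp + tn) / total
recall = tp / (tp + fn)            # sensitivity: fraction of relevant found
specificity = tn / (tn + fp)

print(f"accuracy    = {accuracy:.3f}")   # 0.985 -- looks excellent
print(f"recall      = {recall:.3f}")     # 0.500 -- half the evidence is gone
print(f"specificity = {specificity:.3f}")
```

Because misses (false negatives) are far more costly than extra papers to read (false positives), recall-oriented and cost-aware metrics describe screening performance far better than accuracy alone.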