Research 2026-04-22
STaD: Scaffolded Task Design for Identifying Compositional Skill Gaps in LLMs
Source: arXiv cs.AI
arXiv:2604.18177v2 (announce type: replace-cross)
Abstract: Benchmarks are often used as a standard to understand LLM capabilities in different domains. However, aggregate benchmark scores provide limited insight into compositional skill gaps of LLMs and how to improve them. To make these weaknesses...
arxiv, papers