Research 2026-05-06

When Correct Isn't Usable: Improving Structured Output Reliability in Small Language Models

arXiv:2605.02363v1 (announce type: cross)

Abstract: Deployed language models must produce outputs that are both correct and format-compliant. We study this structured-output reliability gap using two mathematical benchmarks -- GSM8K and MATH -- as a controlled testbed: ground truth is unambiguous and...

Read the original article on arXiv cs.AI
