Research2026-05-12
Delulu: A Verified Multi-Lingual Benchmark for Code Hallucination Detection in Fill-in-the-Middle Tasks
Source: Arxiv CS.AI
arXiv:2605.07024v1 Announce Type: cross Abstract: Large Language Models for code generation frequently produce hallucinations in Fill-in-the-Middle (FIM) tasks -- plausible but incorrect completions such as invented API methods, invalid parameters, undefined variables, or non-existent imports....
arxivpapersbenchmark