Research | 2026-05-06

Can LLMs Compress (and Decompress)? Evaluating Code Understanding and Execution via Invertibility

arXiv:2601.13398v2. Abstract: LLMs demonstrate strong performance on code benchmarks, yet consistent reasoning across forward and backward execution remains elusive. We present RoundTripCodeEval (RTCE), a benchmark of four code execution reasoning tasks that evaluates...

Read the original article on arXiv cs.AI

arxivpapers