BeClaude
Research 2026-04-28

ScoringBench: A Benchmark for Evaluating Tabular Foundation Models with Proper Scoring Rules

Source: arXiv cs.AI

arXiv:2603.29928v2. Abstract: Tabular foundation models such as TabPFN and TabICL already produce full predictive distributions, yet prevailing regression benchmarks evaluate them almost exclusively via point-estimate metrics (RMSE, $R^2$). This discards precisely the...

arxiv, papers, benchmark