BeClaude
Research · 2026-05-12

Expert Evaluation and the Limits of Human Feedback in Mental Health AI Safety Testing

Source: Arxiv CS.AI

arXiv:2601.18061v3 Announce Type: replace Abstract: Learning from human feedback (LHF) assumes that expert judgments, appropriately aggregated, yield valid ground truth for training and evaluating AI systems. We tested this assumption in mental health, where high safety stakes make expert consensus...

Tags: arxiv · papers · safety