Research 2026-04-24

Fairness Evaluation and Inference Level Mitigation in LLMs

arXiv:2510.18914v4 Announce Type: replace-cross Abstract: Large language models often display undesirable behaviors embedded in their internal representations, undermining fairness and leading to inconsistency drift, amplification of harmful content, and the propagation of unwanted patterns during extended...

Read Original Article on arXiv cs.AI
