BeClaude
Research 2026-05-12

Exploitation Without Deception: Dark Triad Feature Steering Reveals Separable Antisocial Circuits in Language Models

Source: arXiv cs.AI

arXiv:2605.09773v1 (announce type: cross)

Abstract: We use sparse autoencoder (SAE) feature steering to amplify Dark Triad personality traits (Machiavellianism, narcissism, and psychopathy) in Llama-3.3-70B-Instruct and evaluate the resulting behavioral changes across five psychological instruments....
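
For readers unfamiliar with the term, the following is a minimal sketch of what SAE feature steering commonly means: adding a scaled copy of one feature's decoder direction to a model's hidden activations. It is not the paper's implementation, which the abstract does not describe. The function name `steer`, the toy dimensions, the feature index, and the random weights are all hypothetical and stand in for a trained sparse autoencoder and real Llama activations.

```python
import numpy as np


def steer(hidden, decoder, feature_idx, strength):
    """Shift every token's hidden state along one SAE feature direction.

    hidden:  (n_tokens, d_model) activations at some layer (toy data here).
    decoder: (n_features, d_model) SAE decoder matrix (random stand-in here).
    """
    direction = decoder[feature_idx]
    # Normalize so `strength` is measured in activation-space units.
    direction = direction / np.linalg.norm(direction)
    return hidden + strength * direction


# Toy setup: random values in place of a trained SAE and real activations.
rng = np.random.default_rng(0)
d_model, n_features, n_tokens = 16, 64, 4
decoder = rng.normal(size=(n_features, d_model))
hidden = rng.normal(size=(n_tokens, d_model))

steered = steer(hidden, decoder, feature_idx=7, strength=5.0)

# The change projected onto the unit feature direction equals the strength.
unit = decoder[7] / np.linalg.norm(decoder[7])
shift = (steered - hidden) @ unit
```

In this sketch, a larger `strength` pushes activations further along the chosen feature direction, and components orthogonal to that direction are left unchanged. In a real setup the steered activations would be written back into the model's forward pass at the chosen layer.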
