BeClaude
Research, 2026-05-06

CorrSteer: Generation-Time LLM Steering via Correlated Sparse Autoencoder Features

Source: arXiv cs.AI

arXiv:2508.12535v3 (Announce Type: replace-cross)

Abstract: Sparse Autoencoders (SAEs) can extract interpretable features from large language models (LLMs) without supervision. However, their effectiveness in downstream steering tasks is limited by the requirement for contrastive datasets or large...
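The abstract above is truncated, so the paper's exact procedure is not reproduced here. The sketch below only illustrates the general idea the title names: choosing SAE features whose activations correlate with a per-sample task score, which avoids the need for a contrastive dataset, and then adding those features' decoder directions to a hidden state. Every name (`pearson`, `select_features`, `steering_vector`, `steer`), the toy data, and the coefficient rule (mean activation on positively scored samples) are hypothetical assumptions for illustration. It is not the authors' implementation.

```python
import math
from typing import List, Sequence

# Toy data (hypothetical): 4 generated samples x 3 SAE features.
ACTS = [[2.0, 0.0, 1.0], [0.0, 1.0, 1.0], [3.0, 0.0, 1.0], [0.0, 2.0, 1.0]]
SCORES = [1.0, 0.0, 1.0, 0.0]                        # assumed per-sample outcome (1 = correct)
DECODER = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]       # assumed SAE decoder rows (features x d_model)


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either input has zero variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def select_features(acts: List[List[float]], scores: List[float], k: int) -> List[int]:
    """Return indices of the k features whose activations correlate most positively with scores."""
    ranked = sorted(
        ((pearson([row[j] for row in acts], scores), j) for j in range(len(acts[0]))),
        key=lambda t: t[0],
        reverse=True,
    )
    return [j for _, j in ranked[:k]]


def steering_vector(decoder, acts, scores, features) -> List[float]:
    """Sum selected decoder rows, each scaled by its mean activation on positively scored samples (assumed rule)."""
    vec = [0.0] * len(decoder[0])
    positives = [row for row, s in zip(acts, scores) if s > 0]
    for j in features:
        coeff = sum(row[j] for row in positives) / len(positives) if positives else 0.0
        for d in range(len(vec)):
            vec[d] += coeff * decoder[j][d]
    return vec


def steer(hidden: List[float], vec: List[float]) -> List[float]:
    """Add the steering vector to one hidden state (in a real model: a chosen layer, every decoding step)."""
    return [h + v for h, v in zip(hidden, vec)]


if __name__ == "__main__":
    chosen = select_features(ACTS, SCORES, k=1)
    vec = steering_vector(DECODER, ACTS, SCORES, chosen)
    print(chosen, vec, steer([0.0, 1.0], vec))
```

Here feature 0 fires only on the positively scored samples, so it is selected, while feature 2 (constant activation) gets a correlation of 0. Selecting by correlation with an outcome signal rather than by contrasting paired prompts is the design point the title suggests; how the real method scores samples, picks layers, and sets coefficients should be taken from the full paper.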
