Research · 2026-04-28

AutoRISE: Agent-Driven Strategy Evolution for Red-Teaming Large Language Models

arXiv:2604.22871v1 (announce type: cross)

Abstract: Automated red-teaming methods for large language models typically optimize attack prompts within a fixed, human-designed strategy, leaving the attack strategy itself unchanged. We instead optimize the strategy. We propose AutoRISE, a method that...

Source: arXiv cs.AI

Tags: arxiv, papers, agents