Research 2026-06-29

When Does Personality Composition Matter for Multi-Agent LLM Teams?

Originally published by Arxiv CS.AI

arXiv:2606.27443v1. Abstract: Personality prompting shapes how large language models communicate, yet whether these behavioral shifts affect objective task outcomes remains under-explored. Prior work shows that agents prompted with low agreeableness produce adversarial language,...

When Personality Composition Matters for Multi-Agent LLM Teams

A new preprint on arXiv (2606.27443) investigates a deceptively simple question: does the personality composition of a multi-agent LLM team change objective task outcomes? The researchers focus on one personality trait, agreeableness. According to the abstract, prior work had already shown that agents prompted with low agreeableness produce adversarial language; the paper examines whether that shift carries through to measurable changes in team dynamics and results. This moves the conversation beyond whether personality sounds different to whether it performs differently.

The study’s core contribution is empirical. While prior work has shown that LLMs can mimic personality traits when prompted (e.g., “you are disagreeable”), it has been less clear whether these behavioral nudges translate into different task success rates, negotiation outcomes, or solution quality. The arXiv paper provides evidence that they do: low-agreeableness agents produce more adversarial language, which in turn alters the team’s collaborative trajectory. This suggests that personality is not just a stylistic veneer—it can functionally reshape how multi-agent systems solve problems.

Why this matters. As organizations deploy multi-agent LLM systems for tasks like code review, contract negotiation, or strategic planning, the default assumption has often been that all agents are cooperative by design. This research challenges that assumption. If a single agent’s personality prompt can shift the entire team’s output, then practitioners must treat personality as a configurable parameter, not a cosmetic afterthought. The implications extend to safety: adversarial language in a multi-agent loop could escalate into unproductive conflict, wasted tokens, or even harmful outputs if left unchecked.

Implications for AI practitioners. First, teams building multi-agent architectures should audit their prompt templates for implicit personality cues. A simple “you are critical” or “you are skeptical” may be more than a role description; it may be an operational lever. Second, the study suggests that personality diversity within a team could be strategically tuned: a mix of high- and low-agreeableness agents might improve debate quality without descending into dysfunction. Third, evaluation benchmarks for multi-agent systems should include personality-sensitive metrics, not just final task accuracy. If an agent team “succeeds” but only through adversarial coercion, that success may not generalize to real-world collaboration.
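One way to start the prompt-template audit described above is a simple keyword scan. The sketch below is illustrative only: the cue lexicon, the trait labels, and the function names `audit_prompt` and `audit_team` are assumptions made for this example and do not come from the paper. A real audit would likely need a richer lexicon or a classifier.

```python
import re

# Hypothetical cue lexicon: these words are illustrative assumptions,
# not terms taken from the paper. Replace them for a real audit.
PERSONALITY_CUES = {
    "low_agreeableness": ["critical", "skeptical", "blunt", "disagreeable", "confrontational"],
    "high_agreeableness": ["friendly", "supportive", "cooperative", "polite", "agreeable"],
}


def audit_prompt(template: str) -> dict[str, list[str]]:
    """Return the cue words found in one prompt template, grouped by trait label."""
    # Lowercase and split into alphabetic words so matching is case-insensitive.
    words = set(re.findall(r"[a-z]+", template.lower()))
    found = {}
    for label, cues in PERSONALITY_CUES.items():
        hits = sorted(words.intersection(cues))
        if hits:
            found[label] = hits
    return found


def audit_team(templates: dict[str, str]) -> dict[str, dict[str, list[str]]]:
    """Audit every agent's template; agents with no detected cues are omitted."""
    report = {}
    for agent, template in templates.items():
        hits = audit_prompt(template)
        if hits:
            report[agent] = hits
    return report


if __name__ == "__main__":
    team = {
        "reviewer": "You are a critical and skeptical code reviewer.",
        "planner": "You draft a project plan.",
    }
    print(audit_team(team))
```

A scan like this only surfaces candidate cues for human review. It cannot tell whether a flagged word actually changes agent behavior, which is the empirical question the paper addresses.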

The paper leaves open questions, such as how agreeableness interacts with other traits like openness or conscientiousness and whether these effects hold across different model families, but it establishes a clear foundation: personality composition is not noise; it is signal.

Key Takeaways

Prompting LLM agents with low agreeableness produces measurable adversarial language that alters team task outcomes, not just conversational style.
Personality traits should be treated as functional configuration parameters in multi-agent systems, not cosmetic role descriptions.
Practitioners should audit prompt templates for implicit personality cues and consider strategic diversity in agent personality composition.
Evaluation of multi-agent LLM teams should incorporate personality-sensitive metrics beyond final task accuracy.
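The last takeaway can be made concrete by reporting a conflict measure next to task accuracy. The sketch below is a hypothetical illustration, not the paper's evaluation protocol. The `Episode` fields, the metric names, and the 0.2 threshold are all assumptions, and it presumes some external classifier or annotator has already flagged adversarial turns.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    succeeded: bool         # did the team complete the task?
    total_turns: int        # number of agent messages in the episode
    adversarial_turns: int  # messages flagged adversarial (hypothetical upstream labeler)


def summarize(episodes: list[Episode], max_adversarial_rate: float = 0.2) -> dict[str, float]:
    """Report task accuracy alongside two conflict-aware metrics.

    max_adversarial_rate is an arbitrary illustrative threshold.
    """
    if not episodes:
        raise ValueError("need at least one episode")
    n = len(episodes)
    total_turns = sum(e.total_turns for e in episodes)
    adversarial_turns = sum(e.adversarial_turns for e in episodes)

    # Successes where the share of adversarial turns stayed at or below the threshold.
    low_conflict_successes = sum(
        1
        for e in episodes
        if e.succeeded
        and e.total_turns > 0
        and e.adversarial_turns / e.total_turns <= max_adversarial_rate
    )
    return {
        "task_accuracy": sum(e.succeeded for e in episodes) / n,
        "adversarial_turn_rate": adversarial_turns / total_turns if total_turns else 0.0,
        "low_conflict_success_rate": low_conflict_successes / n,
    }
```

A gap between `task_accuracy` and `low_conflict_success_rate` would indicate that some successes came with heavy adversarial exchange, which is the case the article warns may not generalize.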

