Research 2026-06-30

Value-Action Alignment in Large Language Models under Privacy-Prosocial Conflict

Originally published by Arxiv CS.AI

arXiv:2601.03546v2. Abstract: Large language models (LLMs) are increasingly used to simulate decision-making tasks involving personal data sharing, where privacy concerns and prosocial motivations can push choices in opposite directions. Existing evaluations often...

This new research from arXiv tackles a subtle but critical tension in LLM behavior: the conflict between privacy and prosociality. When an LLM is asked to simulate a decision about sharing personal data (for example, a medical history for a public health study), it must weigh the individual’s desire for privacy against the collective benefit. The paper, “Value-Action Alignment in Large Language Models under Privacy-Prosocial Conflict,” moves beyond simple safety benchmarks to probe how models handle genuine ethical dilemmas where there is no single “correct” answer.

What the Research Reveals

The core finding is that LLMs exhibit inconsistent value-action alignment in these conflict scenarios. A model might verbally endorse a principle like “privacy is paramount” but then, when placed in a specific contextual simulation, choose to share data for a prosocial reason. This isn’t a simple failure of instruction-following; it reflects a deeper brittleness in how models prioritize competing values. The research suggests that current alignment techniques, which often focus on avoiding harmful outputs or maximizing helpfulness, do not adequately prepare models for situations where two positive values (e.g., protecting privacy vs. helping others) are in direct opposition.
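One way to make the gap between declared and enacted values concrete is to count how often a model's choice contradicts the priority it stated for the same scenario. The sketch below is illustrative only and is not the paper's metric: the `Trial` record, the "privacy"/"prosocial" labels, and the mapping from a stated priority to an implied action are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    """One scenario: what the model said it prioritizes and what it then chose."""
    stated_priority: str  # hypothetical labels: "privacy" or "prosocial"
    action: str           # hypothetical labels: "withhold" or "share"


# Assumed mapping from a stated priority to the action it implies.
IMPLIED_ACTION = {"privacy": "withhold", "prosocial": "share"}


def value_action_gap(trials: list[Trial]) -> float:
    """Return the fraction of trials whose action contradicts the stated priority."""
    if not trials:
        raise ValueError("need at least one trial")
    mismatches = sum(
        1 for t in trials if IMPLIED_ACTION[t.stated_priority] != t.action
    )
    return mismatches / len(trials)


# Made-up records for illustration; real ones would come from model transcripts.
example_trials = [
    Trial("privacy", "withhold"),
    Trial("privacy", "share"),      # says privacy first, then shares
    Trial("prosocial", "share"),
    Trial("privacy", "share"),      # same contradiction again
]
gap = value_action_gap(example_trials)
```

A gap of 0 would mean every action matched the stated priority. Anything above 0 flags scenarios worth inspecting by hand.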

Why It Matters for AI Practitioners

This has immediate, practical implications for anyone deploying LLMs in high-stakes decision support. Consider a healthcare chatbot designed to help patients decide whether to enroll in a research registry. If the model’s behavior oscillates between extreme privacy protection and extreme data sharing based on minor prompt variations, it undermines user trust and can expose the operator to legal liability. The research highlights that “alignment” is not a binary state; it is a spectrum of trade-offs that must be explicitly engineered.

For developers, this means that standard red-teaming for toxicity or bias is insufficient. You must also test for value conflicts. A model that passes a “do no harm” test might still fail a “do the right thing when two goods conflict” test. The paper implicitly calls for a new class of evaluation: conflict resolution benchmarks that measure a model’s consistency in applying a defined ethical hierarchy.
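A conflict-resolution test of this kind could start very simply: present the same dilemma in several paraphrases and check whether the decision stays the same. The following is a minimal sketch under stated assumptions, not a benchmark from the paper. The `decide` callable stands in for a real model call, and `keyword_stub` is a deliberately brittle placeholder used only to show what an inconsistent model looks like in the score.

```python
from collections import Counter
from typing import Callable


def consistency(decisions: list[str]) -> float:
    """Share of decisions that match the most common one (1.0 = fully consistent)."""
    if not decisions:
        raise ValueError("need at least one decision")
    _, top_count = Counter(decisions).most_common(1)[0]
    return top_count / len(decisions)


def evaluate(
    scenarios: dict[str, list[str]], decide: Callable[[str], str]
) -> dict[str, float]:
    """Run `decide` on every paraphrase of each scenario and score consistency."""
    return {
        name: consistency([decide(prompt) for prompt in prompts])
        for name, prompts in scenarios.items()
    }


def keyword_stub(prompt: str) -> str:
    """Hypothetical stand-in for a model: reacts to surface wording only."""
    return "share" if "public health" in prompt.lower() else "withhold"


# Four paraphrases of one dilemma (made-up prompts for illustration).
scenarios = {
    "registry": [
        "Would you share your medical history for a public health study?",
        "A public health team asks for your records. Do you agree?",
        "Researchers want your medical history. Do you hand it over?",
        "Will you give your records to a public health registry?",
    ]
}
scores = evaluate(scenarios, keyword_stub)
print(scores)  # {'registry': 0.75}
```

In practice `decide` would wrap a model call plus a parser that maps the free-text answer to a fixed label, and each scenario would need enough paraphrases for the score to be meaningful.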

Implications for System Design

The practical path forward involves more than just better prompts. Practitioners should consider implementing explicit “value arbitration” layers outside the model. For instance, rather than letting the LLM decide the trade-off between privacy and prosociality in real time, a system could use a rules engine to define the hierarchy (e.g., “user consent always overrides aggregate benefit”) and then have the LLM execute only within those guardrails. This reduces the burden on the model to solve a philosophical problem and instead asks it to perform a constrained reasoning task.
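The arbitration idea can be sketched as a deterministic function that makes the share-or-withhold decision before the model is ever called, leaving the LLM to explain the outcome rather than choose it. Everything here is an assumption for illustration: the `Request` fields, the single consent rule, and the `llm` callable (any function that takes a prompt string and returns text) are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Request:
    """Hypothetical data-sharing request passed to the arbitration layer."""
    user_consented: bool
    data_category: str  # e.g. "medical_history"
    purpose: str        # e.g. "public_health_study"


def arbitrate(req: Request) -> str:
    """Deterministic hierarchy: user consent always overrides aggregate benefit."""
    if not req.user_consented:
        return "deny"
    return "allow"


def handle(req: Request, llm: Callable[[str], str]) -> tuple[str, str]:
    """Decide with rules first, then ask the model only to explain the outcome."""
    decision = arbitrate(req)
    if decision == "deny":
        instruction = (
            "Explain that the data will not be shared because consent was not given."
        )
    else:
        instruction = (
            "Explain what will be shared and for what purpose. Do not revisit the decision."
        )
    prompt = f"{instruction}\nCategory: {req.data_category}\nPurpose: {req.purpose}"
    return decision, llm(prompt)


def echo_llm(prompt: str) -> str:
    """Stand-in for a real model call; it simply returns the prompt it received."""
    return prompt


decision, reply = handle(
    Request(user_consented=False, data_category="medical_history",
            purpose="public_health_study"),
    echo_llm,
)
```

Because the decision is computed outside the model, prompt wording can change the explanation but not the outcome. A real hierarchy would likely need more rules than the single consent check shown here.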

Key Takeaways

LLMs struggle with ethical trade-offs. Standard alignment techniques do not prepare models for scenarios where two positive values (privacy vs. prosociality) conflict, leading to inconsistent behavior.
Context matters more than stated principles. A model may verbally endorse a value but act against it when placed in a specific simulation, revealing a gap between declared and enacted values.
New evaluation benchmarks are needed. Practitioners should develop conflict-resolution tests alongside standard safety and bias evaluations to ensure consistent decision-making.
External value arbitration reduces risk. For production systems, offloading value hierarchy decisions to a deterministic rules layer is more reliable than relying on the LLM to resolve ethical dilemmas autonomously.

