LLMs Prompted for Legal Context Object More: Overrefusal from Small On-Premises LLMs in Criminal Legal Context
arXiv:2606.24585v1 Announce Type: new Abstract: While the validity of LLMs' use in the legal context remains subject to ethical and legal debate, legal professionals are already experimenting with personal LLMs, if only for translation and reformulation. However, even such a seemingly innocuous use...
The Overrefusal Problem in On-Premises Legal LLMs
Recent research from arXiv (2606.24585v1) highlights a critical and counterintuitive finding: when legal professionals use small, on-premises large language models for seemingly low-risk tasks like translation and reformulation in criminal legal contexts, these models exhibit a pronounced tendency toward "overrefusal." That is, they decline to process or respond to prompts that contain legal terminology or context, even when the request is benign and purely linguistic in nature.
What Happened
The study systematically tested small, locally deployed LLMs on tasks that involved rephrasing or translating sentences containing criminal legal references. The models frequently refused to comply, citing ethical or safety concerns, even though the prompts posed no actual risk of generating harmful content. This overrefusal was not observed in larger, cloud-based models or in general-domain prompts. The researchers attribute this to a combination of aggressive safety fine-tuning, limited contextual understanding, and the models' inability to distinguish between using legal language and providing legal advice or generating dangerous content.
Why It Matters
This finding is significant for several reasons. First, it exposes a blind spot in current alignment strategies. Safety training is often applied broadly, but on smaller models with less capacity for nuanced reasoning, it can lead to a brittle "better safe than sorry" behavior that cripples utility. For legal professionals, who are already experimenting with local LLMs to avoid sending sensitive client data to cloud APIs, this overrefusal is not just an annoyance—it is a functional barrier. A translation tool that refuses to translate a police report or a reformulation tool that declines to simplify a warrant affidavit is essentially broken for its intended use case.
Second, this highlights a tension between privacy and performance. On-premises models are chosen for data security, but their reduced capacity makes them more susceptible to misaligned safety guardrails. The very feature that makes them safe for confidential data (local deployment) is undermined by a safety mechanism that makes them unusable for that data.
Implications for AI Practitioners
For developers and engineers deploying LLMs in regulated or sensitive domains, this research offers a clear warning: safety fine-tuning must be context-aware. A one-size-fits-all refusal policy does not transfer well from general-purpose chatbots to specialized legal tools. Practitioners should consider:
- Domain-specific safety tuning: Instead of using off-the-shelf safety classifiers, train or fine-tune refusal boundaries on legal corpora that include benign linguistic tasks.
- Prompt engineering for context: Explicitly instruct the model that the task is reformulation, not legal analysis. This may reduce false positives.
- Evaluation beyond accuracy: Include refusal rate as a key performance metric. A model that refuses 30% of valid prompts is not safe—it is non-functional.
Key Takeaways
- Small on-premises LLMs overrefuse prompts with criminal legal context, even for harmless tasks like translation, due to aggressive and poorly calibrated safety alignment.
- This undermines the core value proposition of local deployment for legal professionals: private, functional AI assistance.
- AI practitioners must develop domain-specific safety tuning and evaluate refusal rates as a critical performance metric, not just output accuracy.
- The finding underscores that safety and utility are not inherently opposed, but require careful, context-driven balancing rather than blanket restrictions.