BeClaude
Research · 2026-06-30

Post-training for Efficient Communication via Convention Formation

Originally published by arXiv cs.AI

arXiv:2508.06482v2. Abstract: Humans communicate with increasing efficiency in multi-turn interactions, by adapting their language and forming ad-hoc conventions. In contrast, prior work shows that LLMs do not naturally show this behavior. We develop a post-training...

What Happened

A new preprint (arXiv:2508.06482) tackles a fundamental gap between human and machine communication: the ability to form ad-hoc conventions that make multi-turn interactions more efficient. Humans naturally compress and specialize their language over repeated exchanges, developing shorthand, jargon, and shared references. According to the prior work the abstract cites, large language models do not naturally show this behavior.

The researchers propose a post-training method designed to instill convention formation in LLMs. Rather than relying on prompting or in-context learning, their approach modifies the model's behavior during a dedicated training phase, encouraging it to adapt its linguistic output across successive interactions with the same interlocutor. The result is that models learn to communicate more concisely and accurately over time, mirroring the human tendency to "get on the same page" through repeated contact.

Why It Matters

This work addresses a subtle but significant limitation of current LLMs. Although models condition on the conversation history, they tend not to use it to shorten or specialize what they say in later turns, so they fail to build the kind of cumulative understanding that makes human dialogue efficient. In practice, this means users must repeatedly clarify context, re-explain preferences, or spell out assumptions that should be obvious after the first few exchanges.

The implications extend beyond mere convenience. Convention formation is a cornerstone of collaborative intelligence: it enables teams to coordinate with minimal overhead, reduces cognitive load, and allows for increasingly sophisticated joint problem-solving. Teaching models to form conventions moves us closer to AI systems that can function as true collaborators rather than static query responders.

For AI practitioners, the post-training approach is particularly noteworthy. It suggests that convention formation does not emerge from scale or architecture alone, but is a behavior that can be explicitly trained. This opens the door to targeted interventions that improve interaction quality without requiring massive model retraining or architectural changes.

Implications for AI Practitioners

  • Deployment efficiency: Models that form conventions can reduce token usage and latency in multi-turn applications like customer support, coding assistants, and collaborative writing tools. Fewer tokens per turn mean lower costs and faster responses.
  • User experience design: Applications should be designed to support repeated interactions with the same model instance. Session persistence and context management become more valuable when the model can "learn" user preferences over time.
  • Evaluation metrics: Standard benchmarks that measure single-turn accuracy miss this dimension of performance. Practitioners may need to develop multi-turn efficiency metrics that track compression ratios, convention emergence, and interaction quality over time.
  • Fine-tuning strategy: The post-training approach suggests that convention formation can be treated as a separate training objective, potentially combined with instruction tuning or RLHF without interference.
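
The compression-ratio idea in the evaluation bullet can be sketched in a few lines of Python. This is a minimal illustration and not the preprint's metric or benchmark: the function names, the whitespace tokenizer, and the example messages are all assumptions made here. It measures how long each repeated reference is relative to the first one, plus the tokens saved compared with repeating the first description every turn.

```python
from typing import List


def count_tokens(text: str) -> int:
    """Crude token count via whitespace splitting.

    Assumption: a real deployment would use the model's own tokenizer.
    """
    return len(text.split())


def compression_ratios(messages: List[str]) -> List[float]:
    """Length of each message relative to the first message.

    Values below 1.0 mean the message is shorter than the first reference.
    """
    if not messages:
        return []
    base = count_tokens(messages[0])
    if base == 0:
        raise ValueError("first message must contain at least one token")
    return [count_tokens(m) / base for m in messages]


def tokens_saved(messages: List[str]) -> int:
    """Tokens saved versus repeating the first message's length every turn."""
    if not messages:
        return 0
    base = count_tokens(messages[0])
    return base * len(messages) - sum(count_tokens(m) for m in messages)


# Hypothetical references to the same item across four turns.
turns = [
    "the figure that looks like a person skating on ice",
    "the person skating on ice",
    "the ice skater",
    "the skater",
]

print(compression_ratios(turns))
print(tokens_saved(turns))
```

A falling ratio across turns is only a proxy for convention formation. Shorter messages that the listener can no longer resolve are not more efficient, so a length measure like this would need to be paired with a task-success check.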

Key Takeaways

  • LLMs do not naturally form communication conventions across multiple turns, unlike humans; this gap can be addressed through targeted post-training.
  • Teaching models to adapt their language over repeated interactions reduces verbosity and improves collaborative efficiency.
  • The approach has practical benefits for deployment cost, latency, and user experience in multi-turn applications.
  • Convention formation appears to be a trainable behavior rather than something that emerges from scale alone, which may put it within reach of practitioners who cannot retrain a model from scratch.