The Verbose Context Problem in Medical Records
arXiv:2606.29503v1 (announce type: cross). Abstract: The verbose context problem occurs when structured concepts have token-inefficient textual representations. This bottleneck is acute in population health: cohort-level analysis of longitudinal patient records requires reasoning over thousands of...
This paper identifies a practical bottleneck that may be quietly degrading the performance of clinical LLMs: the verbose context problem. The core issue is that structured medical concepts, such as an ICD-10 diagnosis code or a lab result, are often represented as long, human-readable text strings. When a model processes a patient record spanning years, these verbose strings consume a large share of the token budget, leaving less room for the model to reason over the temporal sequence of events.
What Happened
The researchers demonstrate that in population health analytics, where a model must analyze thousands of patient encounters, the token inefficiency of textual representations becomes a critical constraint. A single patient note might contain a concept like "Type 2 diabetes mellitus without complications" (six words, so typically at least six subword tokens) when the structured code "E11.9" (a few tokens at most, depending on the tokenizer) would suffice. The gap grows linearly with the number of concept mentions, and at cohort scale the totals are large: 10,000 patients with 50 encounters each is 500,000 encounters, so every token saved on a concept mentioned once per encounter is saved 500,000 times. The paper argues that this is not just a cost issue; it actively harms model performance by forcing the model to "read" verbose descriptions instead of reasoning about the underlying clinical relationships.
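To make the scaling concrete, here is a minimal sketch that counts tokens with a crude proxy tokenizer and multiplies the per-mention difference across a cohort. The proxy is an assumption: it treats each run of letters or digits and each punctuation mark as one token, which real BPE tokenizers will not match exactly. The helper names `approx_tokens` and `cohort_token_savings` and the one-mention-per-encounter figure are hypothetical.

```python
import re

# Proxy tokenizer (assumption, not a real LLM tokenizer): each run of word
# characters or each single punctuation mark counts as one token.
_TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")


def approx_tokens(text: str) -> int:
    """Count proxy tokens in a string."""
    return len(_TOKEN_PATTERN.findall(text))


def cohort_token_savings(patients: int, encounters_per_patient: int,
                         mentions_per_encounter: int,
                         verbose: str, compact: str) -> int:
    """Tokens saved across a cohort by writing `compact` instead of `verbose`.

    The result is a plain product of the inputs, so it grows linearly with
    cohort size, encounters per patient, and mentions per encounter.
    """
    per_mention = approx_tokens(verbose) - approx_tokens(compact)
    mentions = patients * encounters_per_patient * mentions_per_encounter
    return mentions * per_mention


verbose = "Type 2 diabetes mellitus without complications"
compact = "E11.9"
print(approx_tokens(verbose), approx_tokens(compact))
print(cohort_token_savings(10_000, 50, 1, verbose, compact))
```

Under this proxy the description costs 6 tokens and "E11.9" costs 3 ("E11", ".", "9"), so the hypothetical cohort saves 1,500,000 tokens; doubling the number of patients exactly doubles that figure.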
Why It Matters
This is a subtle but high-impact problem for three reasons. First, it offers a plausible explanation for why clinical LLMs that perform well on single-visit tasks can underperform on longitudinal ones (e.g., predicting readmission risk from five years of encounter history): the model is drowning in tokens before it can see the full picture. Second, it highlights a mismatch between how data is stored (human-readable, for clinicians) and how it should be consumed (token-efficient, for models). Third, it suggests that prompt engineering or fine-tuning alone may not solve the issue, because the bottleneck lies in the input representation itself.
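The "drowning in tokens" point can be illustrated with simple budget arithmetic. This sketch assumes a hypothetical 128,000-token window, 8,000 tokens reserved for instructions and the answer, 50 encounters per patient, and per-encounter costs of 300 tokens (verbose) versus 60 tokens (compressed); none of these figures come from the paper, and `patients_that_fit` is a hypothetical helper.

```python
def patients_that_fit(context_window: int, reserved: int,
                      tokens_per_encounter: int,
                      encounters_per_patient: int) -> int:
    """Number of complete patient histories that fit in one prompt.

    `reserved` covers instructions and the model's answer; the rest of the
    window is spent on encounter text. Integer division drops partial
    encounters and partial patients.
    """
    available = max(context_window - reserved, 0)
    encounters = available // tokens_per_encounter
    return encounters // encounters_per_patient


# Hypothetical budget, identical except for the cost of each encounter.
verbose_fit = patients_that_fit(128_000, 8_000, 300, 50)
compact_fit = patients_that_fit(128_000, 8_000, 60, 50)
print(verbose_fit, compact_fit)
```

With these made-up numbers, 8 verbose patient histories fit versus 40 compressed ones: a five-fold cut in tokens per encounter fits five times as many patients into the same prompt.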
Implications for AI Practitioners
For AI teams building healthcare models, this paper offers a clear operational insight: compress structured concepts before feeding them to the LLM. This could mean mapping ICD codes to short integer IDs, using a learned embedding lookup table, or building a specialized tokenizer for clinical ontologies. Practitioners should also audit their token budget per patient record. As a rough heuristic, if a single patient's history spends more than 50% of the context window on static descriptions (as opposed to temporal events), the verbose context problem is likely degrading performance. The paper also implies that retrieval-augmented generation (RAG) systems for clinical data should retrieve compressed, not raw, concept representations.
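A minimal sketch of the compression and audit steps described above, under several assumptions: a hand-built codebook that reuses ICD-10 codes as the short IDs (a short integer ID would work the same way), a legend that spells out each used code once per prompt, and the same crude proxy tokenizer standing in for a real one. The names `Event`, `compress`, and `static_description_share` are hypothetical, and the audit reads the 50% heuristic as the share of the record's own tokens spent on concept descriptions rather than on dates.

```python
import re
from dataclasses import dataclass
from datetime import date, timedelta

# Proxy tokenizer (assumption): word runs and single punctuation marks.
_TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")


def approx_tokens(text: str) -> int:
    return len(_TOKEN_PATTERN.findall(text))


@dataclass
class Event:
    date: str         # when the concept was recorded, ISO format
    description: str  # verbose concept text as stored in the record


def compress(events: list[Event],
             codebook: dict[str, str]) -> tuple[list[str], list[str]]:
    """Swap known descriptions for short codes.

    Returns (legend, lines). The legend lists each used code once, so the
    verbose text is paid for once per prompt rather than once per mention.
    Descriptions missing from the codebook pass through unchanged.
    """
    used: dict[str, str] = {}
    lines = []
    for event in events:
        code = codebook.get(event.description)
        if code is None:
            lines.append(f"{event.date} {event.description}")
        else:
            used[code] = event.description
            lines.append(f"{event.date} {code}")
    legend = [f"{code} = {desc}" for code, desc in sorted(used.items())]
    return legend, lines


def static_description_share(events: list[Event]) -> float:
    """Fraction of the verbose record's tokens spent on concept descriptions."""
    desc_tokens = sum(approx_tokens(e.description) for e in events)
    total = sum(approx_tokens(f"{e.date} {e.description}") for e in events)
    return desc_tokens / total


# Hypothetical record: 50 visits, each restating the same two conditions.
codebook = {
    "Type 2 diabetes mellitus without complications": "E11.9",
    "Essential (primary) hypertension": "I10",
}
events = []
for i in range(50):
    visit = (date(2020, 1, 1) + timedelta(days=30 * i)).isoformat()
    events += [Event(visit, desc) for desc in codebook]

legend, lines = compress(events, codebook)
verbose_tokens = sum(approx_tokens(f"{e.date} {e.description}") for e in events)
compact_tokens = sum(approx_tokens(s) for s in legend + lines)
share = static_description_share(events)
flagged = share > 0.5  # the article's rough 50% heuristic
print(verbose_tokens, compact_tokens, f"{share:.0%}", flagged)
```

In this toy record the verbose rendering costs 1,050 proxy tokens against 717 for the legend plus coded lines, and descriptions make up about 52% of the verbose record, so the audit flags it. The savings grow with every additional repeat of a coded concept, since the legend cost is fixed.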
Key Takeaways
- Token budget is a reasoning constraint: Verbose medical concepts silently consume context windows, limiting a model's ability to reason over longitudinal data.
- Performance on single visits does not predict performance on cohorts: The problem only becomes acute at scale, making it easy to miss during initial model evaluation.
- Input representation is a design choice: Practitioners should pre-process structured clinical data into compressed, token-efficient formats (e.g., concept IDs) before LLM ingestion.
- Cost and latency are secondary concerns: The primary harm is degraded reasoning quality, not just higher API bills.