Towards Cellular-Scale Interpretability in Pathology Foundation Models for Biomarker Assessment
arXiv:2511.05150v2. Abstract: Molecular biomarker testing in pathology is often costly and tissue-consuming, limiting scalable clinical deployment. Artificial intelligence applied to hematoxylin and eosin (HE)-stained histology could enable rapid biomarker screening, but...
What Happened
This research preprint addresses a bottleneck in computational pathology: molecular biomarker testing is costly and consumes tissue. The authors propose a path toward "cellular-scale interpretability" in pathology foundation models, which are large AI models trained on histology slides stained with hematoxylin and eosin (H&E). Their core contribution is demonstrating that these models can predict molecular biomarkers directly from standard H&E slides, potentially replacing or triaging expensive molecular assays. The work builds on the observation that foundation models often capture tissue-level features but struggle with the fine-grained cellular detail needed for accurate biomarker assessment. The researchers aim to close this gap by refining model architectures or training strategies to improve cellular-level resolution.
Why It Matters
The practical significance is substantial. Currently, biomarker testing (for example, detecting mutations, gene fusions, or protein overexpression) requires specialized assays such as immunohistochemistry or next-generation sequencing. These methods are costly and consume scarce tissue, which limits their use on small biopsies and early-stage samples. If AI can reliably infer biomarkers from routine H&E slides, which are already standard in pathology workflows, it could broaden access to precision oncology. This would be especially impactful in resource-limited settings where molecular testing infrastructure is sparse.
For the AI community, this research highlights a fundamental tension in foundation models: scale versus granularity. Large vision models trained on whole-slide images often excel at global tissue architecture but may lose local cellular detail due to downsampling or patch-based training. The push toward "cellular-scale interpretability" is a technical challenge that requires novel attention mechanisms, multi-resolution architectures, or hybrid approaches combining patch-level and slide-level features. Success here could unlock similar gains in other domains requiring fine-grained analysis from large-scale imagery, such as remote sensing or materials science.
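The loss of cellular detail under downsampling can be made concrete with a little arithmetic. The sketch below is illustrative only: the scan resolution (0.25 microns per pixel), the 224-pixel patch size, and the 8-micron nominal nucleus diameter are assumptions chosen for the example, not values reported in the preprint, and PatchConfig is a hypothetical helper.

```python
from dataclasses import dataclass

# Illustrative values, not taken from the preprint.
BASE_MPP = 0.25    # assumed microns per pixel at full scan resolution
NUCLEUS_UM = 8.0   # assumed nominal nucleus diameter in microns


@dataclass(frozen=True)
class PatchConfig:
    """Hypothetical description of how patches are cut from a slide."""

    base_mpp: float   # microns per pixel before any downsampling
    downsample: int   # integer downsampling factor applied before patching
    patch_px: int     # patch side length in pixels

    @property
    def effective_mpp(self) -> float:
        # Each pixel covers more tissue after downsampling.
        return self.base_mpp * self.downsample

    @property
    def field_of_view_um(self) -> float:
        # Physical width of tissue covered by one patch.
        return self.patch_px * self.effective_mpp

    def pixels_across(self, object_um: float) -> float:
        # How many pixels span an object of the given physical size.
        return object_um / self.effective_mpp


if __name__ == "__main__":
    for downsample in (1, 2, 4, 8):
        cfg = PatchConfig(base_mpp=BASE_MPP, downsample=downsample, patch_px=224)
        print(
            f"downsample x{downsample}: "
            f"{cfg.effective_mpp:.2f} um/px, "
            f"patch covers {cfg.field_of_view_um:.0f} um, "
            f"nucleus spans {cfg.pixels_across(NUCLEUS_UM):.1f} px"
        )
```

Under these assumptions, each doubling of the downsampling factor doubles the tissue width a patch covers and halves the number of pixels across a nucleus. That is the scale-versus-granularity trade-off in numeric form: more architectural context per patch, less detail per cell.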
Implications for AI Practitioners
- Domain-specific fine-tuning remains critical. Off-the-shelf pathology foundation models may not capture cellular-level features without targeted architectural modifications or specialized training data. Practitioners should evaluate whether their model’s receptive field and patch size align with the resolution of the biological question.
- Validation against ground-truth biomarkers is non-trivial. The paper underscores the need for rigorous, paired datasets where H&E slides are matched with molecular assay results. AI teams must invest in high-quality, clinically annotated datasets to avoid spurious correlations.
- Interpretability is not just a nice-to-have; it is likely to be a prerequisite for clinical deployment. For AI to be used in clinical biomarker screening, pathologists must trust and understand the model’s reasoning. Cellular-scale interpretability methods (e.g., attention maps at the cell level) are likely to be essential for regulatory approval and clinical adoption.
- Cost-benefit analysis matters. Even if AI can predict biomarkers from H&E, the clinical utility depends on accuracy thresholds. A model with high sensitivity but low specificity could lead to unnecessary confirmatory testing, negating cost savings. Practitioners should define clinically meaningful performance benchmarks early.
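The cost-benefit point in the last bullet can be checked with a short calculation. The sketch below assumes a simple triage workflow in which every case is screened by the model and only model-positive cases go on to a confirmatory molecular assay. The cohort size, prevalence, sensitivity, and specificity are made-up illustrative numbers, not results from the preprint, and screening_outcomes is a hypothetical helper.

```python
def screening_outcomes(n, prevalence, sensitivity, specificity):
    """Expected outcomes of AI triage followed by confirmatory testing.

    Assumes every one of the n cases is screened and only cases the model
    flags as positive receive the confirmatory assay.
    """
    for name, value in (
        ("prevalence", prevalence),
        ("sensitivity", sensitivity),
        ("specificity", specificity),
    ):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")

    positives = n * prevalence           # cases that truly carry the biomarker
    negatives = n - positives            # cases that do not

    true_pos = sensitivity * positives   # correctly flagged
    false_neg = positives - true_pos     # missed by the screen
    true_neg = specificity * negatives   # correctly cleared
    false_pos = negatives - true_neg     # flagged unnecessarily

    flagged = true_pos + false_pos       # cases sent for confirmatory testing
    return {
        "confirmatory_tests": flagged,
        "missed_cases": false_neg,
        "ppv": true_pos / flagged if flagged else float("nan"),
        "assays_avoided_fraction": 1 - flagged / n,
    }


if __name__ == "__main__":
    # Same sensitivity, two different specificities (illustrative values).
    for spec in (0.50, 0.90):
        result = screening_outcomes(
            n=1000, prevalence=0.10, sensitivity=0.95, specificity=spec
        )
        print(f"specificity={spec:.2f}: {result}")
```

With these illustrative inputs, dropping specificity from 0.90 to 0.50 raises the expected confirmatory workload from about 185 to about 545 assays per 1000 cases, while the expected number of missed cases stays at about 5. Sensitivity alone therefore says little about the savings, which is why performance benchmarks should be defined against the intended workflow.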
Key Takeaways
- This research advances the goal of predicting molecular biomarkers from routine H&E-stained slides, reducing reliance on expensive, tissue-consuming molecular assays.
- Achieving cellular-scale interpretability in pathology foundation models requires architectural innovations beyond current large-scale vision models.
- AI practitioners must prioritize domain-specific fine-tuning, rigorous validation, and interpretability to meet clinical and regulatory standards.
- The work underscores a broader trend: foundation models are powerful, but their real-world impact depends on aligning model scale with task-specific resolution requirements.