How GPT-5 helped immunologist Derya Unutmaz solve a 3-year-old mystery
GPT-5 Pro helped solve a 3-year-old immunology mystery, offering insights into T cell behavior. The breakthrough could support cancer and autoimmune research.
A Three-Year Wall, Demolished in Days
The news that GPT-5 Pro helped immunologist Derya Unutmaz resolve a puzzle that had stymied his lab for three years is more than a feel-good story about AI in science. It is a concrete, high-signal data point about the shifting boundary between human expertise and machine reasoning. Unutmaz, a researcher at The Jackson Laboratory, used the model to analyze complex T cell behavior—specifically, why certain immune cells fail to respond effectively in chronic infections and cancer. The breakthrough, which could accelerate research into autoimmune diseases and immunotherapy, was not a matter of brute-force data processing. It was a matter of inference.
What Actually Happened
Unutmaz’s lab had been wrestling with a specific immunological mystery: the role of a particular signaling pathway in T cell exhaustion. Traditional bioinformatics tools and standard machine learning models had failed to connect the dots. GPT-5 Pro, however, was able to synthesize disparate findings from the literature, integrate the lab’s own experimental data, and propose a mechanistic hypothesis that the researchers had not considered. The model did not just retrieve information—it reasoned across domains, linking molecular biology, immunology, and clinical observations into a coherent framework. Unutmaz reportedly validated the hypothesis experimentally within days.
Why This Matters Beyond the Lab
This is not a case of AI generating a plausible-sounding but untestable guess. By this account, the model produced a falsifiable hypothesis that held up in experiment, one that a domain expert with years of training had missed. For the AI industry, this is a sign of maturation: large language models are no longer just tools for summarization or code generation. In cases like this one, they can act as genuine partners in scientific discovery, proposing testable connections between existing findings that no one had yet put together.
The implications for cancer and autoimmune research follow directly. T cell behavior is central to both fields, and a faster route to understanding T cell exhaustion could inform the design of CAR-T therapies, checkpoint inhibitors, and vaccines. But the broader lesson is about the nature of expertise. Unutmaz did not need to become a prompt engineer. He needed to ask the right question and trust the model’s reasoning enough to test it.
Implications for AI Practitioners
For those building or deploying AI systems, this case offers three actionable insights:
- Reasoning over retrieval is the new frontier. GPT-5 Pro succeeded not because it had access to more data, but because it could reason across data. Practitioners should prioritize models with strong chain-of-thought and multi-step inference capabilities over those with larger context windows alone.
- Domain experts remain essential—but their role is shifting. The model did not replace Unutmaz; it augmented him. The value of the expert now lies in framing the problem, validating the output, and designing experiments. AI practitioners should build workflows that assume human-in-the-loop validation, not human-out-of-the-loop automation.
- Scientific validation is the ultimate benchmark. Static benchmarks like MMLU or GPQA are useful, but they are proxies. Real-world scientific results that are validated by experiment are the gold standard. Practitioners should track cases like this as leading indicators of model capability.
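The human-in-the-loop workflow described in the second point can be sketched in a few lines of Python. This is a minimal illustration, not a description of how Unutmaz's lab works: the names `Hypothesis`, `Status`, `expert_review`, and `record_experiment` are hypothetical, and the only idea it encodes is that a model-proposed hypothesis cannot be marked validated until an expert has approved it for testing and an experiment has been recorded.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    PROPOSED = "proposed"    # model output, not yet reviewed
    APPROVED = "approved"    # expert judged it worth an experiment
    REJECTED = "rejected"    # expert judged it not worth testing
    VALIDATED = "validated"  # experiment supported the hypothesis
    REFUTED = "refuted"      # experiment contradicted the hypothesis


@dataclass
class Hypothesis:
    """A model-proposed hypothesis plus an audit trail (hypothetical type)."""
    text: str
    status: Status = Status.PROPOSED
    history: list = field(default_factory=list)


def expert_review(h: Hypothesis, approve: bool, reviewer: str) -> Hypothesis:
    """Record the domain expert's decision on a proposed hypothesis."""
    if h.status is not Status.PROPOSED:
        raise ValueError("only proposed hypotheses can be reviewed")
    h.status = Status.APPROVED if approve else Status.REJECTED
    h.history.append((reviewer, h.status))
    return h


def record_experiment(h: Hypothesis, supported: bool) -> Hypothesis:
    """Record an experimental outcome; refuses to skip the expert gate."""
    if h.status is not Status.APPROVED:
        raise ValueError("experiment requires prior expert approval")
    h.status = Status.VALIDATED if supported else Status.REFUTED
    h.history.append(("experiment", h.status))
    return h
```

The design choice is that the two gates are enforced by the code rather than by convention: calling `record_experiment` on a hypothesis that no expert has approved raises an error, so "validated" always implies both human review and an experimental result.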
Key Takeaways
- GPT-5 Pro helped solve a three-year-old immunology mystery by reasoning across domains, not just retrieving information.
- The breakthrough was experimentally validated, marking a shift from AI as a text generator to AI as a scientific collaborator.
- For AI practitioners, the focus should move from data scale to reasoning depth, and from automation to expert-guided augmentation.
- Real-world scientific validation is a more credible measure of model capability than any benchmark score.