BeClaude
Industry 2026-06-18

Ask HN: How much do you trust LLMs with your health questions?

Source: Hacker News

As software engineers, we've definitely seen the best use case for LLMs so far. Coding agents have shown us what these models can do in an area we can actually verify. But lately I've found myself comparing my doctor's notes against LLM answers more and more. Curious how others are...

The Hacker News thread “Ask HN: How much do you trust LLMs with your health questions?” points to a quiet but significant shift in how technically literate users interact with large language models. The author, a software engineer, describes a growing habit of comparing their doctor’s notes against LLM answers. The habit grows out of experience with coding agents, where the correctness of model output can be tested directly.

What Happened

The post is not a formal study but a pulse-check of the developer community. The author reports comparing their doctor’s notes against LLM answers more and more often. The excerpt does not spell out the workflow, but plausible uses include checking interpretations, spotting omissions, and exploring alternative explanations. The underlying sentiment is cautious pragmatism: these engineers trust LLMs in coding because they can test the output immediately. Health questions, by contrast, involve high stakes, opaque reasoning, and no instant verification loop. Responses to the thread likely range from enthusiastic endorsement (for example, help clarifying a diagnosis or a medication) to sharp warnings about hallucination risk and liability.

Why It Matters

This trend matters for three reasons. First, it reveals a growing “verification gap” in healthcare. Software engineers are well placed to spot when an LLM’s confident-sounding answer is wrong in code, because they can run it; they lack the same feedback mechanism for medical advice. This asymmetry can create an illusion of reliability, as trust earned in a verifiable domain carries over to one where errors go unnoticed. Second, it signals that LLMs are becoming de facto second opinions, even among those who understand their limitations. The act of “comparing notes” implies the user already distrusts or supplements their doctor, a dynamic that could strain the patient-provider relationship or cause unnecessary anxiety. Third, it highlights a market gap: general-purpose chatbots are not built as medically validated health Q&A tools with the rigor of a clinician’s differential diagnosis.

Implications for AI Practitioners

For engineers and product builders, this thread is a direct call to action. The core challenge is not just accuracy but verifiability. In coding, a broken function usually fails visibly when it is run or tested. In medicine, a plausible-sounding hallucination can be indistinguishable from the truth to a non-expert. Practitioners should consider:

  • Building domain-specific guardrails: Fine-tuning models on curated medical datasets (e.g., PubMed, clinical guidelines) and implementing citation requirements for every claim.
  • Creating verification interfaces: Tools that let users flag LLM outputs for review by human experts, or that automatically compare answers against authoritative databases.
  • Designing for uncertainty: Models that explicitly state confidence levels and differential diagnoses, rather than offering a single “best” answer.

The HN thread is a canary in the coal mine. As more technically savvy users adopt LLMs for health, the industry will face pressure to move beyond general-purpose chatbots toward specialized, auditable, and accountable medical AI tools.
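As a rough illustration of how the three suggestions above might fit together, the sketch below treats a model's answer as a list of individual claims, each carrying a self-reported confidence and a list of citations. Claims without a citation, or below a confidence threshold, are routed to human review instead of being shown as fact. Everything here is an assumption made for illustration: the names Claim, ReviewResult, triage_claims, and render, the 0.8 threshold, and the placeholder source IDs are hypothetical and do not come from the thread or from any real product. Model-reported confidence is also not calibrated probability, so a real system would need more than this check.

```python
from dataclasses import dataclass, field


@dataclass
class Claim:
    """One statement in a model's answer (hypothetical structure)."""
    text: str
    confidence: float  # model-reported, 0.0 to 1.0; not a calibrated probability
    citations: list[str] = field(default_factory=list)


@dataclass
class ReviewResult:
    accepted: list[Claim]
    needs_human_review: list[Claim]


def triage_claims(claims: list[Claim], min_confidence: float = 0.8) -> ReviewResult:
    """Accept only claims that are both cited and confident; flag the rest."""
    if not 0.0 <= min_confidence <= 1.0:
        raise ValueError("min_confidence must be between 0 and 1")
    accepted: list[Claim] = []
    flagged: list[Claim] = []
    for claim in claims:
        if claim.citations and claim.confidence >= min_confidence:
            accepted.append(claim)
        else:
            flagged.append(claim)
    return ReviewResult(accepted, flagged)


def render(result: ReviewResult) -> str:
    """Show confidence and sources for accepted claims; label the rest clearly."""
    lines = []
    for c in result.accepted:
        sources = ", ".join(c.citations)
        lines.append(f"{c.text} (confidence {c.confidence:.0%}; sources: {sources})")
    for c in result.needs_human_review:
        lines.append(f"[unverified, flagged for expert review] {c.text}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder claims and source IDs; no real medical content is implied.
    answer = [
        Claim("Claim A", 0.9, ["source-1"]),
        Claim("Claim B", 0.95),               # confident but uncited
        Claim("Claim C", 0.5, ["source-2"]),  # cited but low confidence
    ]
    print(render(triage_claims(answer)))
```

The design choice worth noting is that a missing citation fails the check even when confidence is high, which reflects the article's point that fluency and confidence are not evidence of correctness.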

Key Takeaways

  • Software engineers are increasingly using LLMs as a second opinion on health questions. Their trust comes from coding, where outputs can be verified; medicine offers no equivalent check.
  • This behavior creates a dangerous verification gap: plausible-sounding medical hallucinations are harder to detect than coding errors.
  • The trend signals a market need for domain-specific, citation-backed health AI tools that prioritize verifiability over conversational fluency.
  • AI practitioners should focus on building guardrails, verification interfaces, and uncertainty-aware outputs to safely serve health-related queries.