Industry · 2026-06-24

Ask HN: How to avoid LLMs struggling with Lisp parens?

LLMs seem to love certain languages (Python, Bash, etc.), but they all seem to struggle with Lisp (e.g. Racket or Emacs Lisp). I've tried various iterations of Claude, as well as cheaper models like DeepSeekV4, etc. and the pattern is the same: they'll make a few successful edits, but...

The Lisp Paren Problem: A Structural Blind Spot in LLMs

The Hacker News query about LLMs struggling with Lisp’s parentheses highlights a persistent limitation in current large language models. The user reports the same pattern across models, from Claude to DeepSeek: a few successful edits to Lisp code, followed by degradation. That consistency suggests the failure is not random but reflects a mismatch between how LLMs process text and the syntactic demands of Lisp-family languages.

What Happened

The user attempted to edit Racket and Emacs Lisp code using multiple LLMs. Early edits worked, but later ones introduced errors, typically mismatched parentheses, broken nesting, or incorrect s-expression boundaries. By the user’s account, the same models handle languages such as Python and Bash well. Those languages are heavily represented in training data, and their indentation, keywords, and varied delimiters give redundant cues about structure. Lisp’s uniform prefix notation relies almost entirely on balanced parentheses to convey structure, which poses a different challenge.

Why It Matters

This isn’t just a niche complaint from Lisp enthusiasts. The issue points to three likely weaknesses in how current LLMs are built and trained:

Tokenization Blindness to Nesting Depth: LLMs process code as a flat sequence of tokens and have no explicit mechanism for tracking recursive tree structure. Python’s indentation makes block structure visible on every line, whereas the depth of a Lisp expression can only be recovered by counting parentheses. The model must learn that counting and balancing implicitly, and the task gets harder as nesting deepens.

Context Window Fragmentation: Long, deeply nested Lisp files consume context quickly. Over a series of edits, earlier structural context may be truncated or attended to less reliably, leading to “drift” in which later outputs no longer respect the parenthesis boundaries established earlier.

Training Data Imbalance: Lisp code is far less common in training corpora than Python or JavaScript, and much of what is present is likely to be short snippets rather than real-world, deeply nested codebases. The model learns shallow patterns, not robust structural reasoning.
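
To make the counting task from the first point concrete, here is a minimal Python sketch of the bookkeeping a model has to perform implicitly whenever it finishes a Lisp form. The function name closers_owed is hypothetical, and the sketch deliberately ignores strings, comments, and character literals.

```python
# Sketch: the stack a model must track implicitly to close a Lisp form.
# Simplification (assumption): strings, comments, and character literals
# are not handled here.

PAIRS = {"(": ")", "[": "]", "{": "}"}

def closers_owed(prefix: str) -> str:
    """Return the closing delimiters needed to balance `prefix`."""
    stack = []  # closers we still owe, innermost last
    for ch in prefix:
        if ch in PAIRS:
            stack.append(PAIRS[ch])
        elif ch in PAIRS.values():
            if not stack or stack.pop() != ch:
                raise ValueError(f"unexpected closer {ch!r}")
    # Innermost delimiter must be closed first.
    return "".join(reversed(stack))

prefix = "(define (fact n) (if (= n 0) 1 (* n (fact (- n 1"
print(closers_owed(prefix))  # prints )))))
```

A program does this with an explicit stack in a few lines. A language model has no such stack and has to infer the same count from the token sequence alone.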

Implications for AI Practitioners

For developers using LLMs for code generation, this is a practical warning: not all languages are equally supported. If your workflow involves Lisp, Clojure, or Scheme, you should not rely on LLMs alone for complex refactoring or multi-step edits. The models may appear to work at first, then fail in ways that are hard to debug, silently breaking parenthesis balance.

For model builders, this suggests a need for structural awareness mechanisms. Possible mitigations include:

Pre-processing code to reduce nesting or annotate nesting depth
Fine-tuning on balanced-parenthesis tasks
Using specialized tokenizers that preserve tree structure
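
As an illustration of the first mitigation, depth annotation could be as simple as appending a trailing comment to each line before the code is shown to the model. This is a rough sketch, not a tested technique: the function name annotate_depth and the "; depth=N" comment format are assumptions, and the naive counting ignores parentheses inside strings and comments.

```python
def annotate_depth(source: str) -> str:
    """Append a Lisp comment giving the paren depth at the end of each line.

    Assumption: counts only '(' and ')' and does not skip strings or
    comments, so it is an approximation for illustration.
    """
    depth = 0
    annotated = []
    for line in source.splitlines():
        depth += line.count("(") - line.count(")")
        annotated.append(f"{line}  ; depth={depth}")
    return "\n".join(annotated)

code = '(defun greet (name)\n  (message "Hello, %s"\n           name))'
print(annotate_depth(code))
```

Because the annotation is a trailing comment, the annotated text is still valid Lisp for ordinary code lines, and the comments can be stripped again before the result is written back.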

Until then, practitioners should treat LLMs as autocomplete-style assistance for Lisp, not as reliable editors. Always validate parenthesis balance programmatically after each LLM output; a simple check (Emacs ships a check-parens command, for example) can catch errors the model cannot.
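
For workflows outside an editor, a balance check is easy to script. The following Python sketch is one possible shape for such a validator, not a port of any editor's implementation. The name find_paren_error is hypothetical. It skips double-quoted strings and ; line comments, but does not handle character literals such as Racket's #\( or Racket block comments, so treat it as a starting point.

```python
from typing import Optional

CLOSER_FOR = {"(": ")", "[": "]", "{": "}"}
CLOSERS = set(CLOSER_FOR.values())

def find_paren_error(source: str) -> Optional[str]:
    """Return a description of the first delimiter error, or None if balanced."""
    stack = []  # (expected closer, line, column) for each open delimiter
    in_string = False
    for lineno, line in enumerate(source.splitlines(), start=1):
        i = 0
        while i < len(line):
            ch = line[i]
            if in_string:
                if ch == "\\":
                    i += 1  # skip the escaped character
                elif ch == '"':
                    in_string = False
            elif ch == '"':
                in_string = True
            elif ch == ";":
                break  # rest of the line is a comment
            elif ch in CLOSER_FOR:
                stack.append((CLOSER_FOR[ch], lineno, i + 1))
            elif ch in CLOSERS:
                if not stack:
                    return f"line {lineno}, col {i + 1}: unmatched {ch!r}"
                expected, _, _ = stack.pop()
                if ch != expected:
                    return (f"line {lineno}, col {i + 1}: "
                            f"expected {expected!r}, found {ch!r}")
            i += 1
    if in_string:
        return "unterminated string"
    if stack:
        _, lineno, col = stack[-1]
        return f"line {lineno}, col {col}: unclosed delimiter"
    return None
```

Running a check like this after every model edit, and rejecting or retrying the edit when it returns an error, keeps a single unbalanced output from contaminating later steps.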

Key Takeaways

LLMs consistently struggle with Lisp’s nested parentheses, likely because they process code as flat token sequences with no explicit tree structure and because Lisp is underrepresented in training data.
This failure mode is not model-specific; it is reported across Claude, DeepSeek, and other models.
Practitioners should not trust LLMs for multi-step Lisp edits without automated parenthesis validation.
Model developers need to incorporate structural awareness (e.g., depth tracking) to handle recursive syntax reliably.
