LLMs amplify code quality, they don't improve it
It has no taste of its own, only the momentum of whatever it reads. Point it at clean code, it stays clean. Point it at rot, it compounds the rot. Good input is still on you.
https://malmo.sh/notes/llms-amplify-they-dont-improve
The central thesis of a recent Hacker News discussion, that LLMs amplify existing code quality rather than genuinely improving it, challenges a key assumption behind much of the current AI-assisted development boom. The observation, distilled from a post on malmo.sh, is deceptively simple: an LLM has no intrinsic standard of "good code." It mirrors its training data and, more importantly, the immediate context it is given. Feed it a well-structured, idiomatic codebase, and its suggestions will remain coherent. Feed it a tangled, poorly typed mess, and it will dutifully generate more of the same, compounding the technical debt at machine speed.
What Happened
The post argues that LLMs function as "amplifiers" rather than "improvers." The key mechanism is the prompt context. When a developer asks an LLM to "fix this function" or "add a feature," the model does not compare the code against an abstract ideal of quality. It predicts a likely continuation conditioned on the surrounding code: its patterns, naming conventions, error handling style, and even its indentation quirks. If the surrounding code is riddled with mutable global state and inconsistent error handling, the LLM will generate new code that matches those same patterns. The result is not a net improvement in quality but a consistent, accelerated propagation of the existing quality level, whether that level is high or low.
Why It Matters
This insight changes the economics and risk profile of AI-assisted development. The current narrative often positions LLMs as a force that "raises the floor" for junior developers, helping them write better code. The amplification thesis suggests that this does not hold for teams with existing quality problems. A team struggling with a legacy codebase that already has poor test coverage, tight coupling, and unclear abstractions will not see an LLM magically clean it up. Instead, the LLM will accelerate the production of more code that fits those same anti-patterns, making the eventual refactoring effort considerably harder.
For organizations, this means the ROI of AI coding tools depends heavily on the existing quality of their codebase and the rigor of their engineering practices. A team with strong linting, comprehensive tests, and a culture of code review will find LLMs to be powerful accelerators. A team without those foundations is effectively paying for a faster path to a worse outcome. The tool does not teach quality; it merely amplifies the signal the developer sends.
Implications for AI Practitioners
The practical takeaway is a shift in responsibility. The "garbage in, garbage out" principle applies not just to the data the model was trained on, but also to the code you put in the prompt. Practitioners must adopt a new discipline: curating the context they feed the LLM as carefully as they write the code itself. This means:
* Context is king: Before asking an LLM to generate a new module, ensure the surrounding files in the prompt are clean, well-documented examples of the desired pattern.
* Test-first amplification: Use LLMs to generate code that passes existing tests, not to generate tests for untested code. The model will amplify the test's logic, not invent missing coverage.
* Treat the LLM as a copy editor, not an author: Its strength is maintaining consistency within a given style, not inventing a better style. The architectural decisions and quality standards must still come from the human.
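The first point, curating context, can be sketched as a small filter that decides which files are fit to serve as examples in a prompt. This is a hedged illustration rather than a recommended metric: the three heuristics (module docstring, no `global` statements, annotated return types) and the names `score_source` and `pick_context` are assumptions chosen to echo the quality problems mentioned above, and a real team would substitute its own linting and review signals.

```python
import ast
from dataclasses import dataclass


@dataclass
class FileScore:
    path: str
    score: int


def score_source(source: str) -> int:
    """Return a rough 0-3 quality score for Python source (hypothetical heuristic)."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return 0  # unparsable code is never a good exemplar

    score = 0
    # 1 point: the module documents itself.
    if ast.get_docstring(tree) is not None:
        score += 1
    # 1 point: no `global` statements (a proxy for mutable global state).
    if not any(isinstance(node, ast.Global) for node in ast.walk(tree)):
        score += 1
    # 1 point: at least one function exists and all annotate their return type.
    funcs = [
        node
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    ]
    if funcs and all(f.returns is not None for f in funcs):
        score += 1
    return score


def pick_context(
    files: dict[str, str], min_score: int = 2, limit: int = 3
) -> list[str]:
    """Return paths of the best-scoring files, highest score first."""
    scored = [FileScore(path, score_source(src)) for path, src in files.items()]
    keep = [f for f in scored if f.score >= min_score]
    keep.sort(key=lambda f: (-f.score, f.path))
    return [f.path for f in keep[:limit]]


clean = '"""Utilities."""\n\ndef add(a: int, b: int) -> int:\n    return a + b\n'
messy = "counter = 0\n\ndef bump():\n    global counter\n    counter += 1\n"

chosen = pick_context({"clean.py": clean, "messy.py": messy})
```

Here only `clean.py` clears the threshold, so it alone would be placed in the prompt as the pattern to imitate. The design choice is to exclude weak files outright rather than rank them lower, since anything in the context is a candidate for imitation.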
Key Takeaways
* LLMs are mirrors, not teachers: They reflect and amplify the quality of the code provided in the prompt, lacking any independent standard of "good" or "bad" design.
* Quality debt compounds faster: Teams with poor code hygiene will see their problems accelerate, not be solved, when using LLMs extensively.
* Context curation is a new core skill: Developers must deliberately clean and structure the code they feed into an LLM to get high-quality output.
* ROI depends on engineering maturity: The value of AI coding tools is highest for teams that already have strong testing, linting, and review practices in place.