Building a Robust Text-to-SQL System with Claude: A Practical Guide
This guide walks you through building a Text-to-SQL system using Claude. You'll learn basic prompting, few-shot learning, chain-of-thought reasoning, RAG for large schemas, and a self-improvement loop that lets Claude fix its own SQL errors.
Introduction
Text-to-SQL is a powerful natural language processing task that converts plain English queries into structured SQL statements. This capability democratizes data access, allowing non-technical team members to query databases without learning SQL syntax, while also accelerating the workflow of experienced data analysts.
Claude excels at this task because it can understand context, interpret complex queries, and generate accurate SQL statements—even those involving multiple joins, subqueries, and aggregations. In this guide, you'll learn how to build a production-ready Text-to-SQL system using Claude, from basic prompting to advanced techniques like Retrieval Augmented Generation (RAG) and self-improvement loops.
Why Text-to-SQL Matters
- Accessibility: Non-technical users can interact with databases using natural language, reducing bottlenecks.
- Efficiency: Data professionals can prototype queries faster by describing what they need in plain English.
- Integration: Enables intuitive chatbot and application interfaces for database interactions.
- Complexity: LLMs can generate sophisticated queries (joins, subqueries, aggregations) that are time-consuming to write manually.
Setting Up Your Environment
First, let's create a test SQLite database with two tables: employees and departments. This will serve as our sandbox throughout the guide.
```python
import sqlite3

# Create an in-memory SQLite database
conn = sqlite3.connect(':memory:')
cursor = conn.cursor()

# Create departments table
cursor.execute('''
CREATE TABLE departments (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    location TEXT
)
''')

# Create employees table
cursor.execute('''
CREATE TABLE employees (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    department_id INTEGER,
    salary REAL,
    hire_date TEXT,
    FOREIGN KEY (department_id) REFERENCES departments(id)
)
''')

# Insert sample data
cursor.executemany('INSERT INTO departments VALUES (?, ?, ?)', [
    (1, 'Engineering', 'New York'),
    (2, 'Marketing', 'San Francisco'),
    (3, 'Sales', 'Chicago')
])

cursor.executemany('INSERT INTO employees VALUES (?, ?, ?, ?, ?)', [
    (1, 'Alice Johnson', 1, 95000, '2020-03-15'),
    (2, 'Bob Smith', 2, 72000, '2019-07-22'),
    (3, 'Charlie Brown', 1, 88000, '2021-01-10'),
    (4, 'Diana Prince', 3, 65000, '2022-06-01')
])

conn.commit()
```
Creating a Basic Text-to-SQL Prompt
A good prompt must include three things: clear instructions, the user's query, and the database schema. Here's a minimal implementation:
```python
def get_schema(conn):
    cursor = conn.cursor()
    cursor.execute("SELECT sql FROM sqlite_master WHERE type='table'")
    return '\n'.join(row[0] for row in cursor.fetchall())

def generate_prompt_basic(query, schema):
    return f"""You are a SQL expert. Given the following database schema, convert the user's natural language query into a valid SQL query.

Database Schema:
{schema}

User Query: {query}

SQL Query:"""

schema = get_schema(conn)
prompt = generate_prompt_basic("Show me all employees in the Engineering department", schema)
print(prompt)
```
Now call Claude via the API:
```python
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=200,
    messages=[{"role": "user", "content": prompt}]
)
sql_query = response.content[0].text
print("Generated SQL:", sql_query)

# Execute and verify
cursor.execute(sql_query)
results = cursor.fetchall()
print("Results:", results)
```
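In practice, Claude may wrap the query in markdown code fences or add a line of commentary. A small sanitizer (a hypothetical helper, not part of the Anthropic SDK) can strip these before execution:

```python
import re

def extract_sql(text):
    """Strip markdown fences from a model response, keeping only the SQL."""
    # Prefer the contents of a fenced code block, if one is present
    match = re.search(r"```(?:sql)?\s*(.*?)```", text, re.DOTALL)
    if match:
        return match.group(1).strip()
    return text.strip()

print(extract_sql("```sql\nSELECT * FROM employees;\n```"))
# → SELECT * FROM employees;
```

Calling `extract_sql(response.content[0].text)` before `cursor.execute` makes the pipeline tolerant of formatting drift in the model's output.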
Improving the Prompt with Examples (Few-Shot Learning)
Including examples in your prompt helps Claude understand the expected output format and reasoning pattern. This is called few-shot learning.
```python
def generate_prompt_fewshot(query, schema):
    examples = """
Example 1:
User Query: List all departments located in New York
SQL Query: SELECT * FROM departments WHERE location = 'New York';

Example 2:
User Query: What is the average salary of employees in each department?
SQL Query: SELECT d.name, AVG(e.salary) FROM employees e JOIN departments d ON e.department_id = d.id GROUP BY d.name;
"""
    return f"""You are a SQL expert. Convert natural language to SQL.

Database Schema:
{schema}
{examples}
User Query: {query}

SQL Query:"""
```
Pro tip: You can also include a few rows of actual data in the prompt to give Claude more context about the data structure and content. This often improves accuracy for queries involving specific values or patterns.
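One way to sketch this (assuming the `conn` from the setup section): fetch a handful of rows per table and append them to the prompt. The helper below is an illustrative assumption, not a fixed API.

```python
import sqlite3

def get_sample_rows(conn, table, limit=3):
    """Fetch a few example rows to give the model context about real values."""
    cursor = conn.cursor()
    cursor.execute(f"SELECT * FROM {table} LIMIT {limit}")
    columns = [desc[0] for desc in cursor.description]
    lines = [", ".join(columns)]
    lines += [", ".join(str(v) for v in row) for row in cursor.fetchall()]
    return "\n".join(lines)

# Demo with a throwaway in-memory database
conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE employees (id INTEGER, name TEXT)")
conn.execute("INSERT INTO employees VALUES (1, 'Alice')")
print(get_sample_rows(conn, 'employees'))
```

The returned text can be appended to the prompt under a heading like "Sample Data:". Only pass trusted table names here, since they are interpolated directly into the SQL.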
Using Chain-of-Thought Prompting
For complex queries, chain-of-thought (CoT) prompting encourages Claude to reason step-by-step before generating the final SQL. Using XML tags helps structure the reasoning process.
```python
def generate_prompt_cot(query, schema):
    return f"""You are a SQL expert. Convert the user's query to SQL by reasoning step-by-step.

Database Schema:
{schema}

User Query: {query}

<reasoning>
Think through the query step by step:
- Identify the tables needed
- Determine the columns to select
- Identify any joins, filters, or aggregations
- Write the SQL query
</reasoning>

<sql>
Write the final SQL query here.
</sql>"""
```
This approach is especially useful for queries that require multiple joins, subqueries, or complex WHERE clauses. Claude's reasoning will be captured in the <reasoning> tags, while the final SQL goes in <sql> tags.
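The final query can then be pulled out of the response with a small parser (a sketch that assumes the response follows the tag format requested above):

```python
import re

def extract_tag(text, tag):
    """Return the contents of an XML-style tag from a model response, or None."""
    match = re.search(rf"<{tag}>(.*?)</{tag}>", text, re.DOTALL)
    return match.group(1).strip() if match else None

response_text = "<reasoning>Join employees to departments.</reasoning>\n<sql>SELECT 1;</sql>"
print(extract_tag(response_text, "sql"))  # → SELECT 1;
```

Keeping the reasoning and SQL in separate tags means you can log the reasoning for debugging while executing only the SQL.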
Implementing RAG for Complex Database Schemas
When your database has dozens or hundreds of tables, including the entire schema in every prompt is impractical. Retrieval Augmented Generation (RAG) solves this by dynamically retrieving only the relevant schema information based on the user's query.
First, build a simple vector store to index table and column descriptions:
```python
import voyageai

class SimpleVectorDB:
    def __init__(self, api_key):
        self.client = voyageai.Client(api_key=api_key)
        self.embeddings = []
        self.metadata = []

    def add_document(self, text, metadata):
        embedding = self.client.embed([text], model="voyage-2").embeddings[0]
        self.embeddings.append(embedding)
        self.metadata.append(metadata)

    def search(self, query, top_k=3):
        query_embedding = self.client.embed([query], model="voyage-2").embeddings[0]
        # Dot-product similarity (equivalent to cosine similarity when the
        # embeddings are unit-normalized)
        scores = [
            sum(a * b for a, b in zip(query_embedding, emb))
            for emb in self.embeddings
        ]
        top_indices = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_k]
        return [self.metadata[i] for i in top_indices]

# Index your schema, keeping the description in the metadata so it can be
# reinserted into prompts later
vdb = SimpleVectorDB(api_key="your-voyage-api-key")
vdb.add_document(
    "employees table: id, name, department_id, salary, hire_date",
    {"table": "employees", "schema": "employees table: id, name, department_id, salary, hire_date"},
)
vdb.add_document(
    "departments table: id, name, location",
    {"table": "departments", "schema": "departments table: id, name, location"},
)

# Retrieve relevant schema for a query
query = "Show me all employees in Engineering"
relevant = vdb.search(query, top_k=2)
relevant_schema = "\n".join(r["schema"] for r in relevant)
```
Then modify your prompt to use only the retrieved schema:
```python
def generate_prompt_rag(query, relevant_schema):
    return f"""You are a SQL expert. Use only the following relevant schema to answer the query.

Relevant Schema:
{relevant_schema}

User Query: {query}

SQL Query:"""
```
Implementing Query Self-Improvement
A powerful pattern is a self-correction loop: your application executes the SQL Claude generates, and if the database raises an error, the error message is fed back to Claude so it can fix the query. This loop dramatically increases reliability.
```python
def text_to_sql_with_self_improvement(conn, query, max_iterations=3):
    schema = get_schema(conn)
    prompt = generate_prompt_basic(query, schema)

    for i in range(max_iterations):
        # Generate SQL
        response = client.messages.create(
            model="claude-3-5-sonnet-20241022",
            max_tokens=500,
            messages=[{"role": "user", "content": prompt}]
        )
        sql = response.content[0].text

        # Try to execute
        try:
            cursor = conn.cursor()
            cursor.execute(sql)
            results = cursor.fetchall()
            return sql, results  # Success!
        except Exception as e:
            error_msg = str(e)
            # Feed the error back to Claude for improvement
            prompt = f"""The previous SQL query generated an error. Please fix it.

Original Query: {query}

Database Schema:
{schema}

Previous SQL: {sql}

Error: {error_msg}

Please provide a corrected SQL query:"""

    return None, f"Failed after {max_iterations} iterations"
```
This technique handles:
- Syntax errors (missing commas, wrong keywords)
- Schema mismatches (wrong table or column names)
- Logic errors (incorrect joins or aggregations)
Evaluations
To ensure your Text-to-SQL system is reliable, build an evaluation pipeline:
- Create a test set of 20-50 natural language queries with expected SQL outputs.
- Measure accuracy by comparing generated SQL to expected SQL (exact match or execution match).
- Track error rates across different query types (simple, joins, aggregations, subqueries).
- Iterate on your prompts based on failure patterns.
```python
def evaluate(conn, test_cases):
    correct = 0
    for query, expected_sql in test_cases:
        generated_sql, _ = text_to_sql_with_self_improvement(conn, query)
        # Exact string match against the expected SQL
        if generated_sql and generated_sql.strip() == expected_sql.strip():
            correct += 1
    print(f"Accuracy: {correct}/{len(test_cases)} ({100*correct/len(test_cases):.1f}%)")
```
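Exact string comparison penalizes harmless differences such as aliases or equivalent formulations. An execution-match check, which compares result sets instead of query text, can be sketched as follows (assuming a SQLite connection):

```python
import sqlite3

def execution_match(conn, generated_sql, expected_sql):
    """Two queries match if they return the same multiset of rows."""
    try:
        got = sorted(conn.execute(generated_sql).fetchall())
        want = sorted(conn.execute(expected_sql).fetchall())
        return got == want
    except sqlite3.Error:
        return False

# Demo: equivalent queries that differ as strings
conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (2,)])
print(execution_match(conn, "SELECT x FROM t", "SELECT t.x FROM t"))  # → True
```

Sorting the rows makes the comparison order-insensitive; drop the `sorted` calls if your test cases specify an ORDER BY that must be respected.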
Key Takeaways
- Start simple: Begin with a clear prompt that includes the schema, instructions, and user query. Add examples (few-shot learning) for better consistency.
- Use chain-of-thought: For complex queries, have Claude reason step-by-step using XML tags before generating SQL. This improves accuracy on multi-step queries.
- Implement RAG for large schemas: When your database has many tables, dynamically retrieve only the relevant schema information to keep prompts concise and focused.
- Build a self-improvement loop: Let Claude execute its own SQL, catch errors, and fix them. This dramatically increases reliability in production.
- Evaluate continuously: Create a test set of query-SQL pairs and measure accuracy. Use failure patterns to refine your prompts and techniques.