Guide2026-04-22

Building a Robust Text-to-SQL System with Claude: A Practical Guide

Learn how to build a production-ready Text-to-SQL system using Claude. Covers prompting, RAG for complex schemas, chain-of-thought reasoning, and self-improvement loops.

Quick Answer

This guide walks through building a Text-to-SQL system with Claude, covering basic prompting, few-shot learning, chain-of-thought reasoning, RAG for large schemas, and a self-improvement loop that lets Claude fix its own SQL errors.

Text-to-SQLClaude APIRAGPrompt EngineeringDatabase

Building a Robust Text-to-SQL System with Claude: A Practical Guide

Text-to-SQL is one of the most impactful applications of large language models in enterprise settings. It bridges the gap between non-technical users and complex databases, allowing anyone to ask questions in plain English and get structured data back.

Claude excels at this task because it understands context, interprets complex queries, and generates accurate SQL statements. This guide walks through building a production-ready Text-to-SQL system, from basic prompting to advanced techniques like RAG and self-improvement loops.

Why Text-to-SQL Matters

Before diving into implementation, let's understand why this capability is valuable:

Accessibility: Non-technical team members can query databases without learning SQL syntax
Efficiency: Data analysts can prototype queries faster using natural language
Integration: Enables intuitive database interfaces in chatbots and applications
Complexity: LLMs can generate multi-join queries, subqueries, and aggregations that take humans significant time to write

Setting Up Your Environment

First, let's create a test SQLite database with two tables: employees and departments. This will serve as our playground throughout the guide.

import sqlite3
Create an in-memory SQLite database
conn = sqlite3.connect(':memory:')
cursor = conn.cursor()
Create departments table
cursor.execute('''
CREATE TABLE departments (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    location TEXT
)
''')
Create employees table
cursor.execute('''
CREATE TABLE employees (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    department_id INTEGER,
    salary REAL,
    hire_date DATE,
    FOREIGN KEY (department_id) REFERENCES departments (id)
)
''')
Insert sample data
cursor.executemany('INSERT INTO departments VALUES (?, ?, ?)', [
    (1, 'Engineering', 'New York'),
    (2, 'Sales', 'San Francisco'),
    (3, 'Marketing', 'Chicago')
])
cursor.executemany('INSERT INTO employees VALUES (?, ?, ?, ?, ?)', [
    (1, 'Alice Johnson', 1, 95000, '2020-03-15'),
    (2, 'Bob Smith', 2, 75000, '2019-07-01'),
    (3, 'Charlie Brown', 1, 88000, '2021-01-10'),
    (4, 'Diana Prince', 3, 82000, '2022-06-20')
])
conn.commit()

Creating a Basic Text-to-SQL Prompt

A good Text-to-SQL prompt needs three elements: clear instructions, the user's query, and the database schema. Here's a minimal implementation:

import anthropic
client = anthropic.Anthropic()
def get_schema(cursor):
    """Extract schema information from the database."""
    schema_parts = []
    cursor.execute("SELECT name FROM sqlite_master WHERE type='table'")
    tables = cursor.fetchall()
    
    for table in tables:
        table_name = table[0]
        cursor.execute(f"PRAGMA table_info({table_name})")
        columns = cursor.fetchall()
        col_defs = [f"{col[1]} ({col[2]})" for col in columns]
        schema_parts.append(f"Table: {table_name}\nColumns: {', '.join(col_defs)}")
    
    return '\n\n'.join(schema_parts)
def generate_sql(user_query, schema):
    prompt = f"""You are a SQL expert. Convert the following natural language query into a SQL query.
Database Schema:
{schema}
User Query: {user_query}
Generate only the SQL query, no explanation."""
    
    response = client.messages.create(
        model="claude-3-sonnet-20241022",
        max_tokens=200,
        messages=[{"role": "user", "content": prompt}]
    )
    return response.content[0].text
Test it
schema = get_schema(cursor)
query = "Show me all employees in the Engineering department"
sql = generate_sql(query, schema)
print(f"Generated SQL: {sql}")
Execute and verify
cursor.execute(sql)
results = cursor.fetchall()
print(f"Results: {results}")

Improving with Few-Shot Examples

Basic prompts work, but including examples (few-shot learning) dramatically improves accuracy, especially for complex queries:

def generate_prompt_with_examples(user_query, schema):
    examples = """
Examples:
Query: "List all departments"
SQL: SELECT * FROM departments;
Query: "Find employees hired after 2020"
SQL: SELECT * FROM employees WHERE hire_date > '2020-01-01';
Query: "Show average salary by department"
SQL: SELECT d.name, AVG(e.salary) as avg_salary 
     FROM employees e 
     JOIN departments d ON e.department_id = d.id 
     GROUP BY d.name;
"""
    
    prompt = f"""You are a SQL expert. Convert natural language queries to SQL.
{schema}
{examples}
User Query: {user_query}
SQL:"""
    return prompt

Pro tip: Including a few rows of actual data alongside the schema can give Claude more context about data structure and content, leading to better results.

Chain-of-Thought Prompting for Complex Queries

For queries requiring multiple operations, chain-of-thought (CoT) prompting helps Claude reason step-by-step:

def generate_cot_prompt(user_query, schema):
    prompt = f"""You are a SQL expert. Convert the user's query to SQL using step-by-step reasoning.
Database Schema:
{schema}
User Query: {user_query}
<thinking>
Let me break down what this query needs:
Identify the tables involved
Determine the columns needed
Identify any joins, filters, or aggregations
Construct the final SQL
</thinking>
SQL:"""
    return prompt

This approach is particularly effective for queries like "Which department has the highest average salary and what is it?" where Claude needs to reason about joins, aggregations, and ordering.

Implementing RAG for Complex Database Schemas

Real-world databases can have hundreds of tables. Including the entire schema in every prompt is impractical and expensive. Retrieval Augmented Generation (RAG) solves this by dynamically retrieving only relevant schema information.

First, set up a vector database for schema embeddings:

import voyageai
vo = voyageai.Client()
def build_schema_embeddings(cursor):
    """Create embeddings for each table in the database."""
    cursor.execute("SELECT name FROM sqlite_master WHERE type='table'")
    tables = cursor.fetchall()
    
    table_descriptions = []
    for table in tables:
        table_name = table[0]
        cursor.execute(f"PRAGMA table_info({table_name})")
        columns = cursor.fetchall()
        col_desc = ", ".join([f"{col[1]} ({col[2]})" for col in columns])
        table_descriptions.append(f"Table {table_name}: {col_desc}")
    
    # Generate embeddings
    embeddings = vo.embed(table_descriptions, model="voyage-2").embeddings
    return list(zip(table_descriptions, embeddings))
def retrieve_relevant_schema(query, schema_embeddings, top_k=2):
    """Retrieve the most relevant tables for a given query."""
    query_embedding = vo.embed([query], model="voyage-2").embeddings[0]
    
    # Simple cosine similarity (use proper vector DB in production)
    similarities = []
    for desc, emb in schema_embeddings:
        similarity = sum(a*b for a,b in zip(query_embedding, emb))
        similarities.append((similarity, desc))
    
    similarities.sort(reverse=True)
    return "\n\n".join([desc for _, desc in similarities[:top_k]])

Now integrate RAG into your prompt generation:

def generate_sql_with_rag(user_query, schema_embeddings):
    relevant_schema = retrieve_relevant_schema(user_query, schema_embeddings)
    
    prompt = f"""You are a SQL expert. Convert the user's query to SQL.
Relevant Database Schema:
{relevant_schema}
User Query: {user_query}
SQL:"""
    
    response = client.messages.create(
        model="claude-3-sonnet-20241022",
        max_tokens=200,
        messages=[{"role": "user", "content": prompt}]
    )
    return response.content[0].text

Building a Self-Improvement Loop

One of Claude's most powerful capabilities is self-correction. You can implement a loop where Claude executes its generated SQL, analyzes errors or results, and improves the query:

def text_to_sql_with_self_improvement(user_query, schema, cursor, max_iterations=3):
    """Generate SQL and iteratively improve it based on execution results."""
    
    # Initial generation
    sql = generate_sql(user_query, schema)
    
    for i in range(max_iterations):
        try:
            cursor.execute(sql)
            results = cursor.fetchall()
            
            # Check if results make sense
            validation_prompt = f"""
Original Query: {user_query}
Generated SQL: {sql}
Results: {results}
Are these results correct for the original query? If not, provide corrected SQL.
If correct, respond with "CORRECT".
"""
            
            validation = client.messages.create(
                model="claude-3-sonnet-20241022",
                max_tokens=200,
                messages=[{"role": "user", "content": validation_prompt}]
            )
            
            response_text = validation.content[0].text
            
            if "CORRECT" in response_text:
                return sql, results
            else:
                # Extract improved SQL
                sql = response_text
                
        except Exception as e:
            # SQL execution error - ask Claude to fix it
            error_prompt = f"""
Original Query: {user_query}
Generated SQL: {sql}
Error: {str(e)}
Fix the SQL query to resolve this error.
"""
            
            fix_response = client.messages.create(
                model="claude-3-sonnet-20241022",
                max_tokens=200,
                messages=[{"role": "user", "content": error_prompt}]
            )
            sql = fix_response.content[0].text
    
    return sql, None

This self-improvement loop handles:

Syntax errors: Claude can fix malformed SQL
Logic errors: Claude can adjust queries that return unexpected results
Schema mismatches: Claude can adapt when it misremembers column names

Evaluation Framework

To ensure your Text-to-SQL system is reliable, implement a systematic evaluation:

def evaluate_text_to_sql(test_cases, cursor):
    """Evaluate system accuracy against a set of test cases."""
    results = []
    
    for test in test_cases:
        natural_query = test["query"]
        expected_sql = test["expected_sql"]
        
        generated_sql = generate_sql(natural_query, get_schema(cursor))
        
        # Compare results
        cursor.execute(generated_sql)
        generated_results = set(cursor.fetchall())
        
        cursor.execute(expected_sql)
        expected_results = set(cursor.fetchall())
        
        is_correct = generated_results == expected_results
        results.append({
            "query": natural_query,
            "correct": is_correct,
            "generated_sql": generated_sql,
            "expected_sql": expected_sql
        })
    
    accuracy = sum(r["correct"] for r in results) / len(results)
    return accuracy, results

Key Takeaways

Start simple, iterate fast: Begin with basic prompting and schema inclusion, then layer in few-shot examples, chain-of-thought, and RAG as complexity grows
RAG is essential for production: For databases with more than 5-10 tables, dynamic schema retrieval is necessary for both accuracy and cost efficiency
Self-improvement loops catch errors: Claude's ability to analyze and fix its own SQL output dramatically improves reliability in production systems
Include data samples: Adding a few rows of actual data alongside schema information helps Claude generate more accurate queries
Evaluate systematically: Build a test suite of natural language queries with expected SQL outputs to track performance regressions as you iterate