GuideBeginnerBest Practices2026-05-15

Building a Robust Text-to-SQL System with Claude: A Practical Guide

Learn to convert natural language to SQL using Claude. Covers prompting, RAG for complex schemas, chain-of-thought reasoning, and self-improvement loops.

Quick Answer

This guide walks you through building a production-ready Text-to-SQL system using Claude. You'll learn effective prompting techniques, chain-of-thought reasoning, RAG for large schemas, and a self-improvement loop that lets Claude fix its own SQL errors.

Text-to-SQLPrompt EngineeringRAGDatabaseClaude API

Building a Robust Text-to-SQL System with Claude: A Practical Guide

Imagine giving your non-technical team members the ability to query complex databases just by asking questions in plain English. That's the promise of Text-to-SQL, and with Claude's advanced language understanding, it's more achievable than ever.

This guide walks you through building a production-ready Text-to-SQL system using the Claude API. We'll start with basic prompting and progressively add sophistication: few-shot examples, chain-of-thought reasoning, Retrieval Augmented Generation (RAG) for large schemas, and a self-improvement loop that lets Claude fix its own mistakes.

Why Text-to-SQL Matters

Text-to-SQL bridges the gap between human intent and database queries. Here's why it's transformative:

Democratizes data access: Business analysts, product managers, and executives can query databases without learning SQL syntax.
Accelerates prototyping: Data scientists can quickly explore data without writing complex joins from scratch.
Enables conversational interfaces: Chatbots and virtual assistants can answer data-driven questions in real time.
Handles complexity: LLMs like Claude can generate multi-table joins, subqueries, and aggregations that would take humans significant time to write.

Setting Up Your Environment

First, let's create a test SQLite database with two tables: employees and departments. This will be our sandbox throughout the guide.

import sqlite3
conn = sqlite3.connect('company.db')
cursor = conn.cursor()
Create departments table
cursor.execute('''
CREATE TABLE IF NOT EXISTS departments (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    location TEXT
)
''')
Create employees table
cursor.execute('''
CREATE TABLE IF NOT EXISTS employees (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    department_id INTEGER,
    salary REAL,
    hire_date DATE,
    FOREIGN KEY (department_id) REFERENCES departments(id)
)
''')
Insert sample data
cursor.executemany('INSERT INTO departments VALUES (?, ?, ?)', [
    (1, 'Engineering', 'New York'),
    (2, 'Marketing', 'San Francisco'),
    (3, 'Sales', 'Chicago')
])
cursor.executemany('INSERT INTO employees VALUES (?, ?, ?, ?, ?)', [
    (1, 'Alice', 1, 95000, '2020-03-15'),
    (2, 'Bob', 1, 85000, '2021-06-01'),
    (3, 'Charlie', 2, 72000, '2019-11-20'),
    (4, 'Diana', 3, 68000, '2022-01-10'),
    (5, 'Eve', 2, 78000, '2020-08-05')
])
conn.commit()

Creating a Basic Text-to-SQL Prompt

A good prompt needs three things: clear instructions, the user's query, and the database schema. Let's build a basic version.

def get_schema():
    cursor.execute("SELECT sql FROM sqlite_master WHERE type='table'")
    schemas = cursor.fetchall()
    return '\n'.join([s[0] for s in schemas])
def basic_prompt(user_query):
    schema = get_schema()
    return f"""You are a SQL expert. Given the following database schema, convert the user's natural language query into a valid SQLite SQL query.
Database Schema:
{schema}
User Query: {user_query}
SQL Query:"""

Now let's use the Claude API:

import anthropic
client = anthropic.Anthropic()
response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=200,
    messages=[{
        "role": "user",
        "content": basic_prompt("Show me all employees in Engineering")
    }]
)
print(response.content[0].text)
Output: SELECT * FROM employees WHERE department_id = 1;

Improving with Few-Shot Examples

Basic prompts work, but adding examples (few-shot learning) dramatically improves accuracy, especially for complex queries.

def few_shot_prompt(user_query):
    schema = get_schema()
    examples = """
Examples:
User: List all departments
SQL: SELECT * FROM departments;
User: Show employees hired after 2020
SQL: SELECT * FROM employees WHERE hire_date > '2020-01-01';
User: What is the average salary by department?
SQL: SELECT d.name, AVG(e.salary) FROM employees e JOIN departments d ON e.department_id = d.id GROUP BY d.name;
"""
    return f"""You are a SQL expert. Given the schema and examples, convert the user's query to SQL.
Schema:
{schema}
{examples}
User: {user_query}
SQL:"""

Pro tip: Including a few rows of actual data in your prompt can give Claude valuable context about data types and values, leading to more accurate queries.

Chain-of-Thought Prompting for Complex Queries

For multi-step queries, chain-of-thought (CoT) prompting helps Claude reason through the problem. Use XML tags to structure the reasoning process.

def cot_prompt(user_query):
    schema = get_schema()
    return f"""You are a SQL expert. Convert the user's query to SQL by reasoning step by step.
Schema:
{schema}
User Query: {user_query}
<reasoning>
Identify the tables needed
Determine the columns to select
Identify any joins, filters, or aggregations
Construct the SQL query
</reasoning>
SQL:"""

This approach is particularly effective for queries like "Which department has the highest average salary?" where Claude needs to think about grouping, aggregation, and ordering.

Implementing RAG for Large Schemas

Real-world databases can have hundreds of tables. Including the entire schema in every prompt wastes tokens and confuses the model. Retrieval Augmented Generation (RAG) solves this by dynamically fetching only the relevant schema.

First, create embeddings for your schema:

import voyageai
vo = voyageai.Client()
def embed_schema():
    tables = ['departments', 'employees']
    embeddings = {}
    for table in tables:
        cursor.execute(f"PRAGMA table_info({table})")
        columns = cursor.fetchall()
        schema_text = f"Table: {table}\n" + "\n".join([f"- {col[1]} ({col[2]})" for col in columns])
        response = vo.embed([schema_text], model="voyage-2")
        embeddings[table] = response.embeddings[0]
    return embeddings

Then, retrieve the most relevant tables for each query:

def retrieve_relevant_schema(user_query, embeddings):
    query_embedding = vo.embed([user_query], model="voyage-2").embeddings[0]
    
    # Simple cosine similarity
    from sklearn.metrics.pairwise import cosine_similarity
    import numpy as np
    
    scores = {}
    for table, emb in embeddings.items():
        scores[table] = cosine_similarity([query_embedding], [emb])[0][0]
    
    # Return top 2 most relevant tables
    relevant = sorted(scores, key=scores.get, reverse=True)[:2]
    return relevant

This approach scales to hundreds of tables and keeps your prompts focused and efficient.

Building a Self-Improvement Loop

One of Claude's most powerful capabilities is self-correction. By executing the generated SQL and feeding errors back to Claude, you can create a self-improvement loop.

def execute_with_retry(sql_query, max_attempts=3):
    for attempt in range(max_attempts):
        try:
            cursor.execute(sql_query)
            results = cursor.fetchall()
            return {"success": True, "results": results}
        except Exception as e:
            if attempt == max_attempts - 1:
                return {"success": False, "error": str(e)}
            
            # Ask Claude to fix the query
            fix_prompt = f"""The following SQL query generated an error:
Query: {sql_query}
Error: {str(e)}
Please fix the query:"""
            
            response = client.messages.create(
                model="claude-3-5-sonnet-20241022",
                max_tokens=200,
                messages=[{"role": "user", "content": fix_prompt}]
            )
            sql_query = response.content[0].text

This loop handles:

Syntax errors (missing commas, wrong keywords)
Schema mismatches (wrong table or column names)
Logic errors (incorrect joins or aggregations)

Evaluating Your System

To ensure quality, implement automated evaluations:

Exact match: Compare generated SQL against ground truth queries
Execution match: Run both queries and compare result sets
Semantic similarity: Use embeddings to compare query intent

def evaluate_query(generated_sql, expected_sql):
    cursor.execute(generated_sql)
    gen_results = set(cursor.fetchall())
    
    cursor.execute(expected_sql)
    exp_results = set(cursor.fetchall())
    
    return gen_results == exp_results

Key Takeaways

Start simple, iterate fast: Begin with basic prompts and add few-shot examples, chain-of-thought, and RAG as needed.
RAG is essential for scale: For databases with more than 10-20 tables, dynamic schema retrieval dramatically improves accuracy and reduces token usage.
Self-improvement loops catch errors: Let Claude execute and fix its own SQL. This catches syntax errors and logical mistakes before they reach users.
Include data samples in prompts: A few rows of real data help Claude understand value formats and relationships.
Evaluate rigorously: Use execution match (comparing result sets) rather than just string matching to validate query correctness.