GuideBeginnerBest Practices2026-05-15

Building a Robust Text-to-SQL System with Claude: A Practical Guide

Learn how to build a production-ready Text-to-SQL system using Claude. Covers prompting, RAG for complex schemas, chain-of-thought reasoning, and self-improvement loops.

Quick Answer

This guide walks you through building a Text-to-SQL system with Claude, covering basic prompting, few-shot learning, chain-of-thought reasoning, RAG for complex schemas, and self-improvement loops to generate accurate SQL from natural language queries.

Text-to-SQLPrompt EngineeringRAGDatabaseAPI

Building a Robust Text-to-SQL System with Claude: A Practical Guide

Text-to-SQL is one of the most transformative applications of large language models. It bridges the gap between natural language and structured data, allowing anyone in an organization to query databases without knowing SQL syntax. Claude excels at this task thanks to its strong reasoning capabilities, context understanding, and ability to generate accurate, complex SQL statements.

In this guide, you'll learn how to build a production-ready Text-to-SQL system using Claude. We'll start with basic prompting and progressively add sophistication: few-shot examples, chain-of-thought reasoning, Retrieval Augmented Generation (RAG) for large schemas, and a self-improvement loop that lets Claude fix its own mistakes.

Why Text-to-SQL Matters

Before diving into the implementation, let's understand why this capability is so valuable:

Accessibility: Non-technical stakeholders can query databases directly, reducing bottlenecks with data teams.
Efficiency: Data analysts can prototype queries in seconds rather than minutes.
Integration: Chatbots and internal tools can offer natural language database interfaces.
Complexity: LLMs can generate multi-join queries, subqueries, and aggregations that would take humans significant time to write.

Setting Up Your Environment

First, let's create a test SQLite database with two tables: employees and departments. This will serve as our sandbox throughout the guide.

import sqlite3
Create an in-memory database
conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
Create departments table
cursor.execute('''
    CREATE TABLE departments (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        location TEXT
    )
''')
Create employees table
cursor.execute('''
    CREATE TABLE employees (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        department_id INTEGER,
        salary REAL,
        hire_date DATE,
        FOREIGN KEY (department_id) REFERENCES departments(id)
    )
''')
Insert sample data
cursor.executemany('INSERT INTO departments VALUES (?, ?, ?)', [
    (1, 'Engineering', 'New York'),
    (2, 'Sales', 'San Francisco'),
    (3, 'Marketing', 'Chicago')
])
cursor.executemany('INSERT INTO employees VALUES (?, ?, ?, ?, ?)', [
    (1, 'Alice Johnson', 1, 95000, '2020-03-15'),
    (2, 'Bob Smith', 1, 85000, '2021-06-01'),
    (3, 'Charlie Brown', 2, 72000, '2019-11-20'),
    (4, 'Diana Prince', 3, 88000, '2022-01-10'),
    (5, 'Eve Davis', 2, 65000, '2023-07-05')
])
conn.commit()

Step 1: Creating a Basic Text-to-SQL Prompt

The foundation of any Text-to-SQL system is a well-structured prompt. A good prompt must include:

Clear instructions for the task
The user's natural language query
The database schema so Claude understands the table structures

def get_schema_string(cursor):
    """Extract schema information from the database."""
    cursor.execute("SELECT name FROM sqlite_master WHERE type='table'")
    tables = cursor.fetchall()
    
    schema_parts = []
    for (table_name,) in tables:
        cursor.execute(f"PRAGMA table_info({table_name})")
        columns = cursor.fetchall()
        col_defs = [f"  - {col[1]} ({col[2]})" for col in columns]
        schema_parts.append(f"Table: {table_name}\n" + "\n".join(col_defs))
    
    return "\n\n".join(schema_parts)
schema = get_schema_string(cursor)
print(schema)

Now let's create our basic prompt function and test it with Claude:

import anthropic
client = anthropic.Anthropic()
def generate_basic_prompt(user_query, schema):
    return f"""You are a SQL expert. Given the following database schema, convert the user's natural language query into a valid SQL query.
Database Schema:
{schema}
User Query: {user_query}
Generate only the SQL query, nothing else."""
Test it
user_query = "Show me all employees in the Engineering department"
prompt = generate_basic_prompt(user_query, schema)
response = client.messages.create(
    model="claude-3-sonnet-20241022",
    max_tokens=200,
    messages=[{"role": "user", "content": prompt}]
)
print(response.content[0].text)

Step 2: Improving with Few-Shot Examples

A basic prompt works, but few-shot learning dramatically improves accuracy. By providing examples of input-output pairs, Claude better understands the expected format and reasoning patterns.

def generate_few_shot_prompt(user_query, schema):
    examples = """
Example 1:
User Query: "List all departments"
SQL: SELECT * FROM departments;
Example 2:
User Query: "Find the average salary by department"
SQL: SELECT d.name, AVG(e.salary) as avg_salary 
FROM employees e 
JOIN departments d ON e.department_id = d.id 
GROUP BY d.name;
Example 3:
User Query: "Who was hired most recently?"
SQL: SELECT name, hire_date FROM employees ORDER BY hire_date DESC LIMIT 1;
"""
    
    return f"""You are a SQL expert. Given the following database schema, convert the user's natural language query into a valid SQL query.
Database Schema:
{schema}
Here are some examples:
{examples}
User Query: {user_query}
Generate only the SQL query, nothing else."""
Test with a more complex query
user_query = "Which department has the highest paid employee?"
prompt = generate_few_shot_prompt(user_query, schema)
response = client.messages.create(
    model="claude-3-sonnet-20241022",
    max_tokens=200,
    messages=[{"role": "user", "content": prompt}]
)
print(response.content[0].text)

Pro Tip: Including a few sample rows of actual data in your prompt can further improve accuracy by giving Claude concrete context about values and formats.

Step 3: Chain-of-Thought Prompting for Complex Queries

For complex queries involving multiple joins, aggregations, or subqueries, chain-of-thought (CoT) prompting helps Claude reason step-by-step. Using XML tags to structure the reasoning process is particularly effective.

def generate_cot_prompt(user_query, schema):
    return f"""You are a SQL expert. Given the following database schema, convert the user's natural language query into a valid SQL query.
Database Schema:
{schema}
First, think through the problem step by step inside <reasoning> tags:
Identify which tables are needed
Determine the join conditions
Identify filtering criteria
Determine aggregation if needed
Plan the column selection and ordering

Then, output the final SQL query inside <sql> tags.
User Query: {user_query}
<reasoning>
"""
user_query = "Find the department with the most employees hired in 2020 or later"
prompt = generate_cot_prompt(user_query, schema)
response = client.messages.create(
    model="claude-3-sonnet-20241022",
    max_tokens=500,
    messages=[{"role": "user", "content": prompt}]
)
print(response.content[0].text)

Step 4: Implementing RAG for Large Schemas

Real-world databases often have hundreds of tables. Including the entire schema in every prompt wastes tokens and can confuse the model. Retrieval Augmented Generation (RAG) solves this by dynamically retrieving only the relevant schema information.

First, build a simple vector store using VoyageAI embeddings:

import voyageai
import numpy as np
from typing import List, Dict
class SimpleVectorDB:
    def __init__(self, api_key: str):
        self.client = voyageai.Client(api_key=api_key)
        self.embeddings = []
        self.metadata = []
    
    def add_documents(self, documents: List[str], metadata: List[Dict]):
        response = self.client.embed(documents, model="voyage-2")
        self.embeddings = response.embeddings
        self.metadata = metadata
    
    def query(self, query: str, top_k: int = 5) -> List[Dict]:
        query_embedding = self.client.embed([query], model="voyage-2").embeddings[0]
        similarities = [
            np.dot(query_embedding, emb) / (np.linalg.norm(query_embedding) * np.linalg.norm(emb))
            for emb in self.embeddings
        ]
        top_indices = np.argsort(similarities)[-top_k:][::-1]
        return [self.metadata[i] for i in top_indices]

Now, create schema chunks for each table and use RAG to retrieve only relevant ones:

def create_schema_chunks(cursor):
    chunks = []
    metadata = []
    
    cursor.execute("SELECT name FROM sqlite_master WHERE type='table'")
    tables = cursor.fetchall()
    
    for (table_name,) in tables:
        cursor.execute(f"PRAGMA table_info({table_name})")
        columns = cursor.fetchall()
        
        # Create a descriptive chunk for each table
        col_descriptions = []
        for col in columns:
            col_descriptions.append(f"{col[1]} ({col[2]})")
        
        chunk = f"Table {table_name} has columns: {', '.join(col_descriptions)}"
        chunks.append(chunk)
        metadata.append({"table": table_name, "schema": chunk})
    
    return chunks, metadata
Initialize vector DB and add schema chunks
chunks, metadata = create_schema_chunks(cursor)
vector_db = SimpleVectorDB(api_key="your-voyage-api-key")
vector_db.add_documents(chunks, metadata)
Retrieve relevant schema for a query
user_query = "Show me employee salaries by department"
relevant_tables = vector_db.query(user_query, top_k=2)
relevant_schema = "\n".join([t["schema"] for t in relevant_tables])
print(relevant_schema)

Step 5: Building a Self-Improvement Loop

One of Claude's most powerful capabilities is self-correction. By executing the generated SQL, catching errors, and feeding them back to Claude, you can build a system that iteratively improves its output.

def execute_sql(conn, sql_query):
    """Execute SQL and return results or error message."""
    try:
        cursor = conn.cursor()
        cursor.execute(sql_query)
        results = cursor.fetchall()
        columns = [description[0] for description in cursor.description]
        return {"success": True, "columns": columns, "rows": results}
    except Exception as e:
        return {"success": False, "error": str(e)}
def self_improving_query(conn, user_query, schema, max_iterations=3):
    """Generate, execute, and improve SQL queries iteratively."""
    
    prompt = f"""Generate a SQL query for: {user_query}
Schema:
{schema}
Output only the SQL query."""
    
    for iteration in range(max_iterations):
        response = client.messages.create(
            model="claude-3-sonnet-20241022",
            max_tokens=300,
            messages=[{"role": "user", "content": prompt}]
        )
        
        sql_query = response.content[0].text.strip()
        result = execute_sql(conn, sql_query)
        
        if result["success"]:
            return {
                "query": sql_query,
                "results": result,
                "iterations": iteration + 1
            }
        
        # Feed the error back to Claude for improvement
        prompt = f"""The previous SQL query failed with this error:
{result['error']}
Original query: {sql_query}
User request: {user_query}
Schema:
{schema}
Please fix the SQL query."""
    
    return {"error": "Failed after max iterations"}
Test the self-improvement loop
user_query = "Show departments with average salary above 80000"
result = self_improving_query(conn, user_query, schema)
if "results" in result:
    print(f"Generated Query: {result['query']}")
    print(f"Iterations: {result['iterations']}")
    print(f"Results: {result['results']['rows']}")
else:
    print(f"Error: {result['error']}")

Evaluation: Measuring Success

To build a production system, you need robust evaluation. Here's a simple framework:

Exact Match: Compare generated SQL to expected SQL (string equality)
Execution Match: Execute both queries and compare result sets
Semantic Match: Use Claude to evaluate whether two queries are semantically equivalent

def evaluate_query(generated_sql, expected_sql, conn):
    """Evaluate generated SQL against expected SQL."""
    
    # Execution match
    gen_result = execute_sql(conn, generated_sql)
    exp_result = execute_sql(conn, expected_sql)
    
    if not gen_result["success"]:
        return {"pass": False, "reason": "Generated query failed"}
    
    # Compare result sets (order-independent)
    gen_rows = set(tuple(row) for row in gen_result["rows"])
    exp_rows = set(tuple(row) for row in exp_result["rows"])
    
    return {
        "pass": gen_rows == exp_rows,
        "generated": generated_sql,
        "expected": expected_sql,
        "gen_results": gen_result["rows"],
        "exp_results": exp_result["rows"]
    }

Key Takeaways

Start simple, iterate fast: Begin with a basic prompt and schema, then layer in few-shot examples, chain-of-thought reasoning, and RAG as complexity grows.
RAG is essential for production: For databases with dozens or hundreds of tables, dynamic schema retrieval saves tokens and improves accuracy.
Self-improvement loops catch errors: Claude can analyze its own SQL output, identify mistakes, and correct them—use this to build robust systems.
Include sample data when possible: Adding a few rows of real data to your prompt gives Claude concrete context that improves query accuracy.
Evaluate systematically: Use execution match (comparing result sets) as your primary evaluation metric, as it's more robust than string matching.