Building a Robust Text-to-SQL System with Claude: A Practical Guide
This guide teaches you to build a Text-to-SQL system with Claude, covering basic prompting, few-shot learning, chain-of-thought reasoning, RAG for large schemas, and self-improvement loops that let Claude fix its own SQL errors.
Imagine giving your non-technical team members the ability to query complex databases using plain English. No SQL training required. That's the promise of Text-to-SQL systems, and with Claude's advanced language understanding, building one is more accessible than ever.
In this guide, you'll learn how to build a production-ready Text-to-SQL system using Claude. We'll start with basic prompting and progressively add sophistication: few-shot examples, chain-of-thought reasoning, Retrieval Augmented Generation (RAG) for large schemas, and a self-improvement loop that lets Claude fix its own mistakes.
Why Text-to-SQL Matters
Before diving into the code, let's understand why this capability is transformative:
- Democratize data access: Business analysts, product managers, and executives can query databases without waiting for engineering support
- Accelerate prototyping: Data scientists can quickly explore data using natural language before refining SQL
- Build intuitive interfaces: Integrate natural language querying into chatbots, dashboards, and internal tools
- Handle complexity: Claude can generate SQL with multiple joins, subqueries, and aggregations that would take humans significant time to write
Prerequisites
To follow along, you'll need:
- An Anthropic API key
- Python 3.8+
- Basic familiarity with SQL and Python
Setting Up Your Environment
First, let's create a test SQLite database with two tables: employees and departments. This will be our sandbox throughout the guide.
```python
import sqlite3

# Create an in-memory SQLite database
conn = sqlite3.connect(':memory:')
cursor = conn.cursor()

# Create departments table
cursor.execute('''
CREATE TABLE departments (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    location TEXT
)
''')

# Create employees table
cursor.execute('''
CREATE TABLE employees (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    department_id INTEGER,
    salary REAL,
    hire_date DATE,
    FOREIGN KEY (department_id) REFERENCES departments (id)
)
''')

# Insert sample data
cursor.executemany('INSERT INTO departments VALUES (?, ?, ?)', [
    (1, 'Engineering', 'New York'),
    (2, 'Marketing', 'San Francisco'),
    (3, 'Sales', 'Chicago')
])

cursor.executemany('INSERT INTO employees VALUES (?, ?, ?, ?, ?)', [
    (1, 'Alice Johnson', 1, 95000, '2020-03-15'),
    (2, 'Bob Smith', 1, 85000, '2021-06-01'),
    (3, 'Charlie Brown', 2, 72000, '2019-11-20'),
    (4, 'Diana Prince', 3, 88000, '2022-01-10'),
    (5, 'Eve Davis', 2, 65000, '2023-04-05')
])

conn.commit()
```
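With the tables populated, it's worth running one hand-written query to confirm the data is in place and to establish a ground-truth answer we can compare Claude's generated SQL against later. This snippet recreates the same two tables so it runs on its own:

```python
import sqlite3

# Recreate the sample database from above so this snippet is self-contained
conn = sqlite3.connect(':memory:')
cursor = conn.cursor()
cursor.execute("CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT NOT NULL, location TEXT)")
cursor.execute("""CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT NOT NULL,
    department_id INTEGER, salary REAL, hire_date DATE,
    FOREIGN KEY (department_id) REFERENCES departments (id))""")
cursor.executemany("INSERT INTO departments VALUES (?, ?, ?)", [
    (1, 'Engineering', 'New York'), (2, 'Marketing', 'San Francisco'), (3, 'Sales', 'Chicago')])
cursor.executemany("INSERT INTO employees VALUES (?, ?, ?, ?, ?)", [
    (1, 'Alice Johnson', 1, 95000, '2020-03-15'), (2, 'Bob Smith', 1, 85000, '2021-06-01'),
    (3, 'Charlie Brown', 2, 72000, '2019-11-20'), (4, 'Diana Prince', 3, 88000, '2022-01-10'),
    (5, 'Eve Davis', 2, 65000, '2023-04-05')])
conn.commit()

# Hand-written baseline: average salary per department
cursor.execute("""
    SELECT d.name, AVG(e.salary)
    FROM employees e JOIN departments d ON e.department_id = d.id
    GROUP BY d.name ORDER BY d.name
""")
rows = cursor.fetchall()
print(rows)
# [('Engineering', 90000.0), ('Marketing', 68500.0), ('Sales', 88000.0)]
```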
Step 1: Building a Basic Text-to-SQL Prompt
The foundation of any Text-to-SQL system is a well-structured prompt. Your prompt must include three elements:
- Clear instructions
- The database schema
- The user's natural language query
```python
import anthropic

client = anthropic.Anthropic(api_key="your-api-key")

def get_schema():
    """Extract schema information from the database."""
    cursor.execute("SELECT sql FROM sqlite_master WHERE type='table'")
    schemas = cursor.fetchall()
    return "\n".join([s[0] for s in schemas])

def basic_prompt(user_query, schema):
    return f"""You are a SQL expert. Given the following database schema, convert the user's natural language query into a valid SQL query.

Database Schema:
{schema}

User Query: {user_query}

Generate only the SQL query, no explanation."""

# Test it
schema = get_schema()
response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=200,
    messages=[{"role": "user", "content": basic_prompt("Show me all employees in Engineering", schema)}]
)
print(response.content[0].text)
```
Output: SELECT e.* FROM employees e JOIN departments d ON e.department_id = d.id WHERE d.name = 'Engineering';
Step 2: Improving with Few-Shot Examples
Basic prompts work, but they can struggle with ambiguity. Adding examples—a technique called few-shot learning—dramatically improves accuracy.
```python
def few_shot_prompt(user_query, schema):
    examples = """
Example 1:
User: List all departments in New York
SQL: SELECT * FROM departments WHERE location = 'New York';

Example 2:
User: What is the average salary by department?
SQL: SELECT d.name, AVG(e.salary) as avg_salary FROM employees e JOIN departments d ON e.department_id = d.id GROUP BY d.name;

Example 3:
User: Who was hired most recently?
SQL: SELECT name, hire_date FROM employees ORDER BY hire_date DESC LIMIT 1;
"""
    return f"""You are a SQL expert. Convert natural language queries to SQL.

Database Schema:
{schema}

Here are examples of how to convert queries:
{examples}

User Query: {user_query}
SQL:"""
```
Pro tip: Include a few rows of actual data in your prompt. This gives Claude context about data types and formats, reducing errors with date parsing or string matching.
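One way to do this is with a small helper that pulls a few rows per table and formats them for inclusion in the prompt. This is a sketch (the `sample_rows` name and output format are invented here, not part of any library), demonstrated against a tiny in-memory table:

```python
import sqlite3

def sample_rows(conn, table_name, limit=3):
    """Fetch a few example rows to ground the prompt in real data formats.
    Note: table_name must come from a trusted list (e.g. sqlite_master),
    since identifiers can't be bound as SQL parameters."""
    cursor = conn.cursor()
    cursor.execute(f"SELECT * FROM {table_name} LIMIT {int(limit)}")
    col_names = [d[0] for d in cursor.description]
    lines = [" | ".join(col_names)]
    for row in cursor.fetchall():
        lines.append(" | ".join(str(v) for v in row))
    return f"Sample rows from {table_name}:\n" + "\n".join(lines)

# Demo on a tiny self-contained database
conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE employees (id INTEGER, name TEXT, hire_date DATE)")
conn.execute("INSERT INTO employees VALUES (1, 'Alice Johnson', '2020-03-15')")
print(sample_rows(conn, "employees"))
```

Appending this text beneath the schema in any of the prompts above gives Claude concrete date and string formats to match against.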
Step 3: Chain-of-Thought Prompting for Complex Queries
For multi-step queries, chain-of-thought prompting helps Claude reason through the problem before generating SQL. Use XML tags to structure the reasoning process.
```python
def chain_of_thought_prompt(user_query, schema):
    return f"""You are a SQL expert. Convert the user's query to SQL.

Database Schema:
{schema}

User Query: {user_query}

Let's think through this step by step:

<reasoning>
- What tables do I need?
- What columns should I select?
- What conditions (WHERE clauses) apply?
- Do I need aggregations (GROUP BY, HAVING)?
- What order should results be in?
</reasoning>

<sql>
-- Write your SQL here
</sql>"""
```
This approach is particularly effective for queries like "Find the department with the highest average salary" or "Show me employees hired in the last year along with their department names."
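Because this template asks for reasoning plus a tagged answer, you need to pull just the SQL out of the response before executing it. A minimal regex-based extraction helper (a sketch, with the name and fallback behavior chosen here, not prescribed by the API):

```python
import re

def extract_sql(response_text):
    """Pull the query out of <sql>...</sql> tags, dropping SQL line comments.
    Falls back to the whole text if no tags are found (sketch, not
    production-grade parsing)."""
    match = re.search(r"<sql>(.*?)</sql>", response_text, re.DOTALL)
    sql = match.group(1) if match else response_text
    # Drop line comments like "-- Write your SQL here"
    lines = [line for line in sql.splitlines() if not line.strip().startswith("--")]
    return "\n".join(lines).strip()

# Demo on a hand-written response in the expected format
sample = """<reasoning>
- Need employees joined to departments, grouped by department name
</reasoning>
<sql>
-- join and aggregate
SELECT d.name, AVG(e.salary) FROM employees e
JOIN departments d ON e.department_id = d.id GROUP BY d.name;
</sql>"""
print(extract_sql(sample))
```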
Step 4: Implementing RAG for Large Schemas
Real-world databases often have hundreds of tables. Including the entire schema in every prompt wastes tokens and confuses the model. Retrieval Augmented Generation (RAG) solves this by dynamically fetching only relevant schema information.
```python
import voyageai
from typing import List, Dict
import numpy as np

class SchemaRetriever:
    def __init__(self, api_key: str):
        self.client = voyageai.Client(api_key=api_key)
        self.embeddings = []
        self.table_info = []

    def index_schema(self, conn):
        """Index all tables and their schemas."""
        cursor = conn.cursor()
        cursor.execute("SELECT name FROM sqlite_master WHERE type='table'")
        tables = cursor.fetchall()

        for (table_name,) in tables:
            cursor.execute(f"PRAGMA table_info({table_name})")
            columns = cursor.fetchall()

            # Create a description of this table
            col_desc = ", ".join([f"{col[1]} ({col[2]})" for col in columns])
            description = f"Table '{table_name}' with columns: {col_desc}"

            self.table_info.append({
                "name": table_name,
                "description": description,
                "columns": columns
            })

            # Generate embedding
            result = self.client.embed([description], model="voyage-2")
            self.embeddings.append(result.embeddings[0])

    def retrieve_relevant_tables(self, query: str, top_k: int = 3) -> List[Dict]:
        """Find the most relevant tables for a given query."""
        query_embedding = self.client.embed([query], model="voyage-2").embeddings[0]

        # Compute cosine similarity
        similarities = [
            np.dot(query_embedding, emb) / (np.linalg.norm(query_embedding) * np.linalg.norm(emb))
            for emb in self.embeddings
        ]

        # Get top-k indices
        top_indices = np.argsort(similarities)[-top_k:][::-1]
        return [self.table_info[i] for i in top_indices]
```
```python
# Usage
retriever = SchemaRetriever(api_key="your-voyage-api-key")
retriever.index_schema(conn)

relevant_tables = retriever.retrieve_relevant_tables("Show me employee salaries")
for table in relevant_tables:
    print(table["description"])
```
Now, instead of passing the entire schema, you dynamically build your prompt with only the relevant tables:
```python
def rag_prompt(user_query, retriever):
    relevant = retriever.retrieve_relevant_tables(user_query)
    schema_context = "\n".join([t["description"] for t in relevant])

    return f"""Given these database tables:

{schema_context}

Convert this query to SQL: {user_query}

SQL:"""
```
Step 5: Building a Self-Improvement Loop
Even the best prompts can produce incorrect SQL. A self-improvement loop lets Claude execute its generated SQL, catch errors, and fix them automatically.
```python
def execute_and_improve(user_query, conn, max_iterations=3):
    """Generate SQL, execute it, and improve if errors occur."""
    schema = get_schema()
    sql = None

    for iteration in range(max_iterations):
        if sql is None:
            # Generate an initial SQL query
            response = client.messages.create(
                model="claude-3-5-sonnet-20241022",
                max_tokens=300,
                messages=[{"role": "user", "content": basic_prompt(user_query, schema)}]
            )
            sql = response.content[0].text.strip()
        print(f"Iteration {iteration + 1}: SQL: {sql}")

        try:
            cursor = conn.cursor()
            cursor.execute(sql)
            results = cursor.fetchall()

            # If we get here, the SQL executed successfully
            if len(results) == 0:
                # Results are empty -- ask Claude whether that's expected
                verification_prompt = f"""The SQL query '{sql}' returned no results for the user query '{user_query}'.

Schema: {schema}

Is this expected? If not, suggest a corrected SQL query after the marker 'SQL:'."""
                response = client.messages.create(
                    model="claude-3-5-sonnet-20241022",
                    max_tokens=300,
                    messages=[{"role": "user", "content": verification_prompt}]
                )
                reply = response.content[0].text
                if "corrected" in reply.lower() and "SQL:" in reply:
                    # Retry with Claude's suggested correction
                    sql = reply.split("SQL:")[-1].strip()
                    continue

            return {"success": True, "sql": sql, "results": results}

        except Exception as e:
            error_msg = str(e)
            print(f"Error: {error_msg}")

            # Feed the error back to Claude for fixing
            fix_prompt = f"""The SQL query you generated had an error:

User Query: {user_query}
Your SQL: {sql}
Error: {error_msg}

Database Schema:
{schema}

Please provide a corrected SQL query, and only the query."""
            response = client.messages.create(
                model="claude-3-5-sonnet-20241022",
                max_tokens=300,
                messages=[{"role": "user", "content": fix_prompt}]
            )
            # Retry with the corrected query on the next iteration
            sql = response.content[0].text.strip()

    return {"success": False, "error": "Max iterations reached without successful execution"}
```
```python
# Test with a query that might cause errors
result = execute_and_improve("Show me employees with salary above average", conn)
print(result)
```
This self-improvement loop handles:
- Syntax errors: Misspelled table names, wrong column references
- Logical errors: Queries that run but return unexpected results
- Empty results: Queries that should return data but don't
Evaluation: Measuring Your System's Performance
To ensure your Text-to-SQL system is reliable, you need to evaluate it systematically:
- Execution Accuracy: Does the generated SQL run without errors?
- Result Correctness: Does the output match expected results?
- Semantic Equivalence: Is the SQL logically equivalent to the user's intent?
```python
test_cases = [
    {"query": "List all employees", "expected_sql": "SELECT * FROM employees"},
    {"query": "Show me departments in San Francisco", "expected_sql": "SELECT * FROM departments WHERE location = 'San Francisco'"},
    # Add more test cases
]

def evaluate_system(test_cases, generate_sql_fn):
    correct = 0
    for case in test_cases:
        generated = generate_sql_fn(case["query"])
        # Exact string match is a strict baseline: equivalent SQL
        # phrased differently will be counted as wrong
        if generated.strip().lower() == case["expected_sql"].strip().lower():
            correct += 1
    return correct / len(test_cases)
```
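Exact string matching undercounts correct answers, because many differently-written queries are semantically equivalent. A more forgiving metric executes both queries and compares their result sets. A sketch of that idea (assuming the generated SQL is trusted enough to run against a sandbox copy of the data; the `execution_match` name is invented here):

```python
import sqlite3

def execution_match(conn, generated_sql, expected_sql):
    """Run both queries and compare their result sets, ignoring row order.
    Returns False if the generated SQL fails to execute."""
    def run(sql):
        cur = conn.cursor()
        cur.execute(sql)
        return sorted(cur.fetchall())
    try:
        return run(generated_sql) == run(expected_sql)
    except sqlite3.Error:
        return False

# Demo: two differently-written but equivalent queries
conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE departments (id INTEGER, name TEXT, location TEXT)")
conn.executemany("INSERT INTO departments VALUES (?, ?, ?)",
                 [(1, 'Engineering', 'New York'), (2, 'Marketing', 'San Francisco')])
print(execution_match(
    conn,
    "SELECT name FROM departments WHERE location = 'San Francisco'",
    "SELECT d.name FROM departments d WHERE d.location LIKE 'San Fran%'",
))  # True
print(execution_match(conn, "SELECT nmae FROM departments",
                      "SELECT name FROM departments"))  # False
```

Swapping this in for the string comparison in `evaluate_system` gives an execution-accuracy score rather than a surface-form one.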
Key Takeaways
- Start simple, iterate fast: Begin with a basic prompt and schema, then layer in few-shot examples, chain-of-thought reasoning, and RAG as needed
- RAG is essential for production: For databases with more than 10-15 tables, dynamic schema retrieval dramatically improves accuracy and reduces token costs
- Self-improvement loops catch errors: Let Claude execute its own SQL and fix mistakes automatically—this is crucial for production reliability
- Include sample data in prompts: A few rows of real data help Claude understand data formats and avoid common pitfalls with dates, strings, and NULL values
- Evaluate systematically: Build a test suite of query-SQL pairs and track accuracy metrics to guide your prompt engineering efforts