Building a Robust Text-to-SQL System with Claude: A Practical Guide
Learn how to build a production-ready Text-to-SQL system using Claude. Covers prompting, RAG for complex schemas, chain-of-thought reasoning, and self-improvement loops for accurate SQL generation.
Imagine giving your non-technical team members the ability to query complex databases just by asking questions in plain English. That's the promise of Text-to-SQL, and Claude is exceptionally good at it. This guide walks you through building a production-ready Text-to-SQL system, from basic prompts to advanced self-improvement loops.
Why Text-to-SQL Matters
Text-to-SQL converts natural language questions into structured SQL queries. It's a game-changer for organizations because:
- Democratizes data access: Business analysts, product managers, and executives can query databases without knowing SQL syntax
- Boosts productivity: Data analysts can prototype complex queries in seconds instead of minutes
- Enables intuitive interfaces: Power chatbots and internal tools with natural language database interaction
- Handles complexity: Claude can generate queries with multiple JOINs, subqueries, and aggregations that would take humans significant time to write correctly
Setting Up Your Environment
First, let's create a test SQLite database with two tables: employees and departments. This will be our sandbox throughout the guide.
```python
import sqlite3

# Create an in-memory SQLite database
conn = sqlite3.connect(':memory:')
cursor = conn.cursor()

# Create departments table
cursor.execute('''
    CREATE TABLE departments (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        location TEXT
    )
''')

# Create employees table
cursor.execute('''
    CREATE TABLE employees (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        department_id INTEGER,
        salary REAL,
        hire_date DATE,
        FOREIGN KEY (department_id) REFERENCES departments(id)
    )
''')

# Insert sample data
cursor.executemany('INSERT INTO departments VALUES (?, ?, ?)', [
    (1, 'Engineering', 'New York'),
    (2, 'Marketing', 'San Francisco'),
    (3, 'Sales', 'Chicago')
])

cursor.executemany('INSERT INTO employees VALUES (?, ?, ?, ?, ?)', [
    (1, 'Alice Johnson', 1, 95000, '2020-03-15'),
    (2, 'Bob Smith', 1, 85000, '2021-06-01'),
    (3, 'Charlie Brown', 2, 72000, '2019-11-20'),
    (4, 'Diana Prince', 3, 88000, '2022-01-10')
])

conn.commit()
```
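Hand-writing a schema description for the prompt tends to drift out of date as the database evolves. One option (a sketch, not something the rest of the guide requires) is to pull the live `CREATE TABLE` statements straight from `sqlite_master`:

```python
import sqlite3

def dump_schema(conn: sqlite3.Connection) -> str:
    """Return the CREATE TABLE statements stored in sqlite_master,
    so the schema shown to Claude always matches the real database."""
    rows = conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' AND sql IS NOT NULL"
    ).fetchall()
    return "\n\n".join(r[0] for r in rows)

# Demo on a throwaway database with one table
conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT NOT NULL, location TEXT)")
schema_text = dump_schema(conn)
```

Other databases expose the same information through their catalogs (e.g. `information_schema` in Postgres and MySQL), so the same pattern ports over.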
Step 1: The Basic Text-to-SQL Prompt
A good Text-to-SQL prompt needs three things: clear instructions, the database schema, and the user's question. Here's a minimal but effective approach:
```python
import anthropic

client = anthropic.Anthropic(api_key="your-api-key")

def generate_sql(user_query, schema):
    prompt = f"""You are a SQL expert. Given the following database schema, convert the user's natural language query into a valid SQL statement.

Database Schema:
{schema}

User Query: {user_query}

Generate only the SQL query, no explanation."""

    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=500,
        messages=[{"role": "user", "content": prompt}]
    )
    return response.content[0].text

# Example usage
schema = """
Table: departments
Columns: id (INTEGER), name (TEXT), location (TEXT)

Table: employees
Columns: id (INTEGER), name (TEXT), department_id (INTEGER), salary (REAL), hire_date (DATE)
Foreign Key: department_id -> departments.id
"""

query = "Show me all employees in the Engineering department"
sql = generate_sql(query, schema)
print(sql)
# Typical output (note the JOIN: the schema alone doesn't say which id is Engineering):
# SELECT e.* FROM employees e
# JOIN departments d ON e.department_id = d.id
# WHERE d.name = 'Engineering';
```
Step 2: Improving with Few-Shot Examples
Basic prompts work, but adding examples (few-shot learning) dramatically improves accuracy, especially for complex queries. Let's enhance our function:
```python
def generate_sql_with_examples(user_query, schema):
    examples = """
Examples:

User: List all departments in New York
SQL: SELECT * FROM departments WHERE location = 'New York';

User: What is the average salary by department?
SQL: SELECT d.name, AVG(e.salary) AS avg_salary
     FROM employees e
     JOIN departments d ON e.department_id = d.id
     GROUP BY d.name;

User: Who was hired most recently?
SQL: SELECT name, hire_date FROM employees ORDER BY hire_date DESC LIMIT 1;
"""

    prompt = f"""You are a SQL expert. Given the schema and examples below, convert the user's query to SQL.

{schema}

{examples}

User Query: {user_query}

SQL:"""

    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=500,
        messages=[{"role": "user", "content": prompt}]
    )
    return response.content[0].text
```
Pro tip: Include a few rows of actual data in your prompt. This gives Claude context about data formats and values, leading to more accurate queries.
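One way to implement that tip is a small helper that formats the first few rows of a table for the prompt. A minimal sketch (`sample_rows` is a hypothetical helper; note the table name is interpolated into the SQL string, so it must come from your own schema list, never from user input):

```python
import sqlite3

def sample_rows(conn: sqlite3.Connection, table: str, n: int = 3) -> str:
    """Format the first n rows of a table as a prompt-friendly block.
    `table` must come from a trusted schema list, not user input,
    since identifiers cannot be bound as SQL parameters."""
    cur = conn.execute(f"SELECT * FROM {table} LIMIT {n}")
    header = " | ".join(d[0] for d in cur.description)
    body = "\n".join(" | ".join(str(v) for v in row) for row in cur.fetchall())
    return f"Sample rows from {table}:\n{header}\n{body}"

# Demo
conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO departments VALUES (1, 'Engineering')")
preview = sample_rows(conn, "departments")
```

Appending the output of `sample_rows` for each table after the schema section of the prompt gives Claude concrete examples of date formats, capitalization, and value ranges.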
Step 3: Chain-of-Thought for Complex Queries
For multi-step queries, chain-of-thought prompting helps Claude reason through the problem. Use XML tags to structure the reasoning:
```python
def generate_sql_cot(user_query, schema):
    prompt = f"""You are a SQL expert. Convert the user's query to SQL using step-by-step reasoning. Fill in the template below.

{schema}

User Query: {user_query}

<reasoning>
- Identify the tables needed
- Determine the columns to select
- Identify JOIN conditions
- Apply WHERE filters
- Add GROUP BY, ORDER BY, or LIMIT if needed
</reasoning>

<sql>
-- Your SQL here
</sql>"""

    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=800,
        messages=[{"role": "user", "content": prompt}]
    )
    return response.content[0].text
```
This approach significantly reduces errors on queries involving multiple JOINs, subqueries, or complex aggregations.
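Since the response now contains both reasoning and SQL, you need to extract just the query before executing it. A small helper for that (this assumes the response follows the tag template above; how defensive to be is up to you):

```python
import re

def extract_sql(response_text: str) -> str:
    """Pull the query out of <sql>...</sql>, falling back to the raw text.
    Also drops '--' comment lines Claude may leave inside the block."""
    match = re.search(r"<sql>(.*?)</sql>", response_text, re.DOTALL)
    sql = match.group(1) if match else response_text
    lines = [ln for ln in sql.splitlines() if not ln.strip().startswith("--")]
    return "\n".join(lines).strip()

# Demo on a response shaped like the template
response = (
    "<reasoning>Only one table is needed.</reasoning>\n"
    "<sql>\n-- final query\nSELECT name FROM employees;\n</sql>"
)
query = extract_sql(response)
```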
Step 4: RAG for Large Database Schemas
When your database has dozens or hundreds of tables, you can't fit the entire schema in a prompt. Retrieval-Augmented Generation (RAG) solves this by dynamically fetching only the relevant schema information.
Here's how to implement it with VoyageAI embeddings:
```python
import voyageai
import numpy as np
from typing import Dict

class SchemaVectorDB:
    def __init__(self, api_key: str):
        self.client = voyageai.Client(api_key=api_key)
        self.embeddings = []
        self.schema_chunks = []

    def index_schema(self, tables: Dict[str, Dict]):
        """Index table schemas as embedding vectors"""
        for table_name, columns in tables.items():
            chunk = f"Table: {table_name}\n"
            for col_name, col_type in columns.items():
                chunk += f"  - {col_name} ({col_type})\n"
            self.schema_chunks.append(chunk)

        # Generate embeddings for all chunks in one batch
        response = self.client.embed(
            self.schema_chunks,
            model="voyage-2"
        )
        self.embeddings = response.embeddings

    def query_relevant_schema(self, user_query: str, top_k: int = 3) -> str:
        """Retrieve the most relevant schema chunks for a query"""
        query_embedding = self.client.embed([user_query], model="voyage-2").embeddings[0]

        # Cosine similarity between the query and each schema chunk
        similarities = [
            np.dot(query_embedding, emb) /
            (np.linalg.norm(query_embedding) * np.linalg.norm(emb))
            for emb in self.embeddings
        ]

        # Get top-k most relevant chunks
        top_indices = np.argsort(similarities)[-top_k:][::-1]
        return "\n".join(self.schema_chunks[i] for i in top_indices)

# Usage
vector_db = SchemaVectorDB(api_key="your-voyage-api-key")
vector_db.index_schema({
    "employees": {"id": "INTEGER", "name": "TEXT", "department_id": "INTEGER", "salary": "REAL"},
    "departments": {"id": "INTEGER", "name": "TEXT", "location": "TEXT"}
})
relevant_schema = vector_db.query_relevant_schema("What's the average salary?")
```
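One pitfall with top-k retrieval: a question like "average salary by department" may retrieve `employees` but not `departments`, leaving Claude unable to write the JOIN. A sketch of one mitigation — after retrieval, also include any tables the selected ones reference via foreign keys (`fk_edges` is a hypothetical mapping you would build once from your schema metadata):

```python
def expand_with_fk_tables(selected, fk_edges):
    """Add every table reachable from the selected set by following
    foreign-key references, so JOIN targets are never missing.
    fk_edges maps table name -> set of referenced table names."""
    out = set(selected)
    frontier = list(selected)
    while frontier:
        table = frontier.pop()
        for ref in fk_edges.get(table, ()):
            if ref not in out:
                out.add(ref)
                frontier.append(ref)
    return out

# employees references departments via department_id
fk_edges = {"employees": {"departments"}}
tables = expand_with_fk_tables({"employees"}, fk_edges)
```

The expanded table set then determines which schema chunks go into the prompt.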
Step 5: Self-Improvement Loop
This is where Claude truly shines. Instead of just generating SQL, let Claude execute it, check for errors, and fix its own mistakes:
```python
def generate_and_refine_sql(user_query, schema, conn, max_iterations=3):
    current_sql = generate_sql(user_query, schema)

    for _ in range(max_iterations):
        try:
            cursor = conn.cursor()
            cursor.execute(current_sql)
            results = cursor.fetchall()

            # Ask Claude to verify that the results answer the question.
            # Request an exact verdict string: a substring check like
            # `"correct" in text` would also match "incorrect".
            verification_prompt = f"""
Original Query: {user_query}
Generated SQL: {current_sql}
First rows of results: {results[:5]}

Does this SQL correctly answer the user's query?
If it does, reply with exactly: OK
If it does not, reply with only the corrected SQL, nothing else."""
            response = client.messages.create(
                model="claude-3-5-sonnet-20241022",
                max_tokens=500,
                messages=[{"role": "user", "content": verification_prompt}]
            )
            answer = response.content[0].text.strip()
            if answer == "OK":
                return current_sql, results
            current_sql = answer  # Claude supplied a corrected query; retry
        except Exception as e:
            # SQL execution error - ask Claude to fix it
            error_prompt = f"""
The following SQL generated an error:
{current_sql}

Error: {str(e)}
Schema: {schema}
Original Query: {user_query}

Reply with only the corrected SQL, nothing else."""
            response = client.messages.create(
                model="claude-3-5-sonnet-20241022",
                max_tokens=500,
                messages=[{"role": "user", "content": error_prompt}]
            )
            current_sql = response.content[0].text.strip()

    return current_sql, None
```
This self-improvement loop catches both syntax errors and logical mistakes before they reach your users.
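One practical wrinkle: even when instructed to return only SQL, models sometimes wrap the query in markdown fences, which makes `cursor.execute` fail on the backticks. A defensive cleanup step worth running on every generated query (a small sketch):

```python
import re

def clean_sql_response(text: str) -> str:
    """Strip optional ```sql ... ``` fences from a model reply so the
    result can be passed straight to cursor.execute()."""
    match = re.search(r"```(?:sql)?\s*(.*?)```", text, re.DOTALL)
    return (match.group(1) if match else text).strip()

fenced = "```sql\nSELECT * FROM employees;\n```"
plain = "SELECT * FROM employees;"
```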
Evaluating Your System
To ensure reliability, build an evaluation pipeline:
- Create a test set of 20-50 query-SQL pairs
- Measure accuracy by comparing generated SQL to expected SQL
- Track execution success rate - does the SQL run without errors?
- Monitor result correctness - does the output match expected results?
```python
def evaluate_system(test_cases, conn):
    correct = 0
    for case in test_cases:
        generated_sql = generate_sql(case["query"], case["schema"])
        try:
            cursor = conn.cursor()
            cursor.execute(generated_sql)
            results = cursor.fetchall()
            if results == case["expected_results"]:
                correct += 1
        except Exception:
            # A query that fails to execute counts as incorrect
            pass
    accuracy = correct / len(test_cases) * 100
    return f"Accuracy: {accuracy:.1f}%"
```
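One caveat in the comparison above: `results == expected_results` is order-sensitive, so a correct query that simply omits an ORDER BY can be scored wrong. When the question doesn't specify an ordering, comparing the rows as multisets is fairer (a small sketch):

```python
def results_match(got, expected):
    """Compare result sets ignoring row order; two correct queries
    may return identical rows in different orders."""
    return sorted(map(tuple, got)) == sorted(map(tuple, expected))

same_rows = results_match([(2, 'Bob'), (1, 'Alice')], [(1, 'Alice'), (2, 'Bob')])
diff_rows = results_match([(1, 'Alice')], [(2, 'Bob')])
```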
Key Takeaways
- Start simple, iterate fast: Begin with a basic prompt, then layer in few-shot examples, chain-of-thought reasoning, and RAG as complexity grows
- RAG is essential for real-world databases: When schemas exceed prompt limits, use vector embeddings to dynamically retrieve relevant schema information
- Self-improvement loops catch errors: Let Claude execute, verify, and fix its own SQL queries for dramatically better reliability
- Include sample data in prompts: A few rows of real data help Claude understand data formats and produce more accurate queries
- Always evaluate: Build a test set of query-SQL pairs and measure accuracy, execution success, and result correctness to track improvements over time