BeClaude Guide
2026-05-05

Building a Robust Text-to-SQL System with Claude: A Practical Guide

Learn how to build a production-ready Text-to-SQL system using Claude. Covers prompting, RAG for large schemas, chain-of-thought reasoning, and self-improvement loops.

Quick Answer

This guide walks you through building a Text-to-SQL system with Claude, from basic prompting to advanced techniques like RAG for large schemas, chain-of-thought reasoning, and self-correcting query loops. You'll learn practical patterns for accurate SQL generation.

Text-to-SQL · Claude API · RAG · Prompt Engineering · Database


Text-to-SQL is one of the most powerful applications of large language models. It bridges the gap between natural language and structured data, allowing anyone in your organization to query databases without knowing SQL syntax. Claude excels at this task due to its strong reasoning capabilities, context understanding, and ability to generate syntactically correct SQL.

In this guide, you'll learn how to build a production-ready Text-to-SQL system using Claude. We'll start with basic prompting and progressively add sophistication: few-shot examples, chain-of-thought reasoning, Retrieval Augmented Generation (RAG) for large schemas, and a self-improvement loop that lets Claude fix its own mistakes.

Why Text-to-SQL Matters

Before diving into implementation, let's understand why this capability is so valuable:

  • Accessibility: Non-technical stakeholders can ask questions like "What were our top 5 products by revenue last quarter?" without waiting for a data team.
  • Efficiency: Data analysts can prototype complex queries in seconds instead of minutes.
  • Integration: Chatbots and internal tools can offer natural language database interfaces.
  • Complexity: Claude can generate multi-join queries, subqueries, and aggregations that would take humans significant time to write correctly.

Setting Up Your Environment

First, let's create a test SQLite database with two tables: employees and departments. This will serve as our sandbox throughout the guide.

import sqlite3

# Create an in-memory database
conn = sqlite3.connect(':memory:')
cursor = conn.cursor()

# Create departments table
cursor.execute('''
CREATE TABLE departments (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    location TEXT
)
''')

# Create employees table
cursor.execute('''
CREATE TABLE employees (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    department_id INTEGER,
    salary REAL,
    hire_date DATE,
    FOREIGN KEY (department_id) REFERENCES departments(id)
)
''')

# Insert sample data
cursor.executemany('INSERT INTO departments VALUES (?, ?, ?)', [
    (1, 'Engineering', 'New York'),
    (2, 'Marketing', 'San Francisco'),
    (3, 'Sales', 'Chicago')
])

cursor.executemany('INSERT INTO employees VALUES (?, ?, ?, ?, ?)', [
    (1, 'Alice Johnson', 1, 95000, '2020-03-15'),
    (2, 'Bob Smith', 1, 85000, '2021-06-01'),
    (3, 'Charlie Brown', 2, 72000, '2019-11-20'),
    (4, 'Diana Prince', 3, 88000, '2022-01-10'),
    (5, 'Eve Davis', 2, 65000, '2023-04-05')
])

conn.commit()

Step 1: Building a Basic Text-to-SQL Prompt

The foundation of any Text-to-SQL system is a well-structured prompt. Your prompt must include:

  • Clear instructions about what you want Claude to do
  • The user's natural language query
  • The database schema so Claude understands the available tables and columns
import anthropic

client = anthropic.Anthropic(api_key="your-api-key")

def get_schema():
    """Extract the schema from the database"""
    cursor.execute("SELECT sql FROM sqlite_master WHERE type='table'")
    schemas = cursor.fetchall()
    return "\n".join([s[0] for s in schemas])

def generate_basic_prompt(user_query):
    schema = get_schema()
    return f"""You are a SQL expert. Given the following database schema, convert the user's natural language query into a valid SQL query.

Database Schema:
{schema}

User Query: {user_query}

Generate only the SQL query, nothing else."""

# Test it
response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=200,
    messages=[{
        "role": "user",
        "content": generate_basic_prompt("Show me all employees in the Engineering department")
    }]
)

print(response.content[0].text)

Output: SELECT e.* FROM employees e JOIN departments d ON e.department_id = d.id WHERE d.name = 'Engineering';

Note that Claude joins on the department name rather than hardcoding an ID: with only the schema in the prompt, it has no way of knowing which department_id corresponds to Engineering.
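One practical wrinkle: even with "Generate only the SQL query" in the prompt, the model sometimes wraps its answer in a markdown code fence. A small normalization helper (a sketch, not part of the Anthropic SDK) keeps that from breaking execution:

```python
def extract_sql(text: str) -> str:
    """Strip an optional markdown code fence from a model response."""
    text = text.strip()
    if text.startswith("```"):
        lines = text.splitlines()
        # Drop the opening fence (with its optional language tag)
        # and the closing fence if present
        if lines[-1].strip() == "```":
            lines = lines[1:-1]
        else:
            lines = lines[1:]
        text = "\n".join(lines).strip()
    return text
```

Run every response through this before handing it to `cursor.execute()`.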

Step 2: Improving with Few-Shot Examples

Basic prompts work, but you'll get better results by including examples. This technique, called few-shot learning, helps Claude understand the expected output format and reasoning pattern.

def generate_few_shot_prompt(user_query):
    schema = get_schema()
    examples = """
Example 1:
User: List all departments
SQL: SELECT * FROM departments;

Example 2:
User: Find employees hired after 2021 with salary above 80000
SQL: SELECT * FROM employees WHERE hire_date > '2021-01-01' AND salary > 80000;

Example 3:
User: Show average salary by department
SQL: SELECT d.name, AVG(e.salary) AS avg_salary
     FROM employees e
     JOIN departments d ON e.department_id = d.id
     GROUP BY d.name;
"""
    return f"""You are a SQL expert. Convert natural language queries to SQL.

Database Schema:
{schema}

{examples}

User: {user_query}
SQL:"""

Pro tip: Including a few sample rows of actual data in your prompt can dramatically improve accuracy. It gives Claude context about data formats, null values, and typical values.
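A minimal sketch of that tip: pass a cursor and a table name, and get back a compact preview you can splice into the prompt alongside the schema (the function name and format here are illustrative, not from any library):

```python
import sqlite3

def get_sample_rows(cursor, table_name, limit=3):
    """Return a small pipe-delimited preview of a table for use in a prompt."""
    cursor.execute(f"SELECT * FROM {table_name} LIMIT {limit}")
    cols = [d[0] for d in cursor.description]
    rows = cursor.fetchall()
    lines = [" | ".join(cols)]
    lines += [" | ".join(str(v) for v in row) for row in rows]
    return f"Sample rows from {table_name}:\n" + "\n".join(lines)
```

Only interpolate trusted table names here; the table name cannot be parameterized with `?` placeholders, so validate it against `sqlite_master` first if it comes from user input.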

Step 3: Chain-of-Thought Reasoning for Complex Queries

For complex queries involving multiple joins, subqueries, or aggregations, chain-of-thought (CoT) prompting helps Claude reason step-by-step. Use XML tags to structure the reasoning process.

def generate_cot_prompt(user_query):
    schema = get_schema()
    return f"""You are a SQL expert. Convert the user's query to SQL.

Database Schema:
{schema}

<reasoning>
Break down the user's request step by step:
1. What tables are needed?
2. What columns should be selected?
3. What filters or conditions apply?
4. Are there any aggregations or groupings?
5. What is the final SQL structure?
</reasoning>

User Query: {user_query}

Provide your reasoning inside <reasoning> tags, then output the final SQL query."""

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=500,
    messages=[{
        "role": "user",
        "content": generate_cot_prompt("Which department has the highest average salary and what is that average?")
    }]
)

print(response.content[0].text)

Claude will output something like:

<reasoning>
  • Tables needed: employees (for salary data) and departments (for department names)
  • Columns: department name, average salary
  • Need to join on department_id
  • Group by department, calculate AVG(salary)
  • Order by average salary descending, limit to 1
</reasoning>

SELECT d.name, AVG(e.salary) AS avg_salary
FROM employees e
JOIN departments d ON e.department_id = d.id
GROUP BY d.name
ORDER BY avg_salary DESC
LIMIT 1;
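Before executing the result, you still need to separate the SQL from the reasoning block. A small regex-based helper (a sketch) does the job:

```python
import re

def strip_reasoning(response_text: str) -> str:
    """Remove the <reasoning>...</reasoning> block, leaving only the SQL."""
    sql = re.sub(r"<reasoning>.*?</reasoning>", "", response_text,
                 flags=re.DOTALL)
    return sql.strip()
```

This keeps the reasoning available for logging and debugging while only the clean query reaches the database.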

Step 4: RAG for Large Database Schemas

Real-world databases often have hundreds of tables. Including the entire schema in every prompt wastes tokens and confuses the model. Retrieval Augmented Generation (RAG) solves this by dynamically retrieving only the relevant schema information.

First, create embeddings for your schema using VoyageAI:

import voyageai

vo = voyageai.Client(api_key="your-voyage-api-key")

def embed_schema_tables():
    """Create embeddings for each table and its columns"""
    cursor.execute("SELECT name FROM sqlite_master WHERE type='table'")
    tables = cursor.fetchall()
    table_descriptions = []
    for table in tables:
        table_name = table[0]
        cursor.execute(f"PRAGMA table_info({table_name})")
        columns = cursor.fetchall()
        desc = f"Table: {table_name}\n"
        desc += "\n".join([f"  - {col[1]} ({col[2]})" for col in columns])
        table_descriptions.append(desc)
    # Create embeddings for all table descriptions in one batch
    embeddings = vo.embed(table_descriptions, model="voyage-2")
    return list(zip(table_descriptions, embeddings.embeddings))

def retrieve_relevant_schema(query, schema_embeddings, top_k=2):
    """Retrieve the most relevant schema elements for a query"""
    query_embedding = vo.embed([query], model="voyage-2").embeddings[0]
    # Dot-product similarity; this equals cosine similarity only when the
    # embeddings are unit-normalized (use a proper vector DB in production)
    similarities = []
    for desc, emb in schema_embeddings:
        similarity = sum(a * b for a, b in zip(query_embedding, emb))
        similarities.append((similarity, desc))
    similarities.sort(reverse=True)
    return "\n\n".join([desc for _, desc in similarities[:top_k]])
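The retrieval code above scores candidates with a plain dot product, which matches cosine similarity only when the embeddings are unit-normalized. If you cannot rely on normalization, a full cosine similarity is a safe drop-in (a stdlib-only sketch):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Degenerate vectors carry no direction; treat as unrelated
        return 0.0
    return dot / (norm_a * norm_b)
```

Swap this in for the `sum(a*b ...)` line when embedding magnitudes vary.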

Now integrate RAG into your prompt generation:

def generate_rag_prompt(user_query, schema_embeddings):
    relevant_schema = retrieve_relevant_schema(user_query, schema_embeddings)
    
    return f"""You are a SQL expert. Here is the relevant schema for the user's query:

{relevant_schema}

User Query: {user_query}

Generate the SQL query."""

Step 5: Self-Improvement Loop

This is where Claude truly shines. Instead of hoping the first query is correct, you can let Claude execute its SQL, check for errors, and fix them automatically.

def execute_and_improve(user_query, max_iterations=3):
    """Generate SQL, execute it, and improve if needed"""
    schema = get_schema()
    prompt = user_query  # keep the original request separate from retry prompts

    for i in range(max_iterations):
        # Generate SQL
        response = client.messages.create(
            model="claude-3-5-sonnet-20241022",
            max_tokens=500,
            messages=[{
                "role": "user",
                "content": f"Schema:\n{schema}\n\nQuery: {prompt}\n\nGenerate SQL:"
            }]
        )

        sql_query = response.content[0].text.strip()

        try:
            cursor.execute(sql_query)
            results = cursor.fetchall()

            # If we get here, the SQL executed successfully
            if i > 0:
                print(f"Query improved after {i} iteration(s)")
            return sql_query, results

        except Exception as e:
            error_msg = str(e)
            print(f"Attempt {i+1} failed: {error_msg}")

            # Feed the error back to Claude for improvement
            prompt = f"""The previous SQL query failed with this error: {error_msg}

Original request: {user_query}

Please fix the SQL query."""

    return None, "Failed after max iterations"

This pattern is incredibly powerful for production systems. It handles:

  • Syntax errors (missing commas, wrong quotes)
  • Schema mismatches (wrong table names)
  • Logic errors (incorrect joins)
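Executing model-generated SQL also calls for a guardrail. The sketch below is a coarse, illustrative check, not a substitute for a read-only database role; it rejects anything that is not a single SELECT (or WITH) statement before the query ever reaches the database:

```python
def is_safe_select(sql: str) -> bool:
    """Allow only a single read-only SELECT/WITH statement.

    A coarse guard: production systems should also run queries under a
    read-only database role rather than trusting string inspection alone.
    """
    stripped = sql.strip().rstrip(";")
    if not stripped:
        return False
    if ";" in stripped:  # more than one statement
        return False
    first_word = stripped.split(None, 1)[0].upper()
    return first_word in ("SELECT", "WITH")
```

Call it on `sql_query` inside the loop and skip execution (treating the attempt as a failure) when it returns False.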

Evaluation: Measuring Success

To build confidence in your system, you need a systematic evaluation approach:

def evaluate_queries(test_cases):
    """Test your Text-to-SQL system against known cases"""
    results = []
    for natural_query, expected_sql in test_cases:
        generated_sql, _ = execute_and_improve(natural_query)

        # Compare results rather than exact SQL strings
        cursor.execute(expected_sql)
        expected_results = set(cursor.fetchall())

        # execute_and_improve returns None if all attempts failed
        is_correct = False
        if generated_sql is not None:
            cursor.execute(generated_sql)
            actual_results = set(cursor.fetchall())
            is_correct = expected_results == actual_results

        results.append({
            "query": natural_query,
            "expected": expected_sql,
            "generated": generated_sql,
            "correct": is_correct
        })

    accuracy = sum(1 for r in results if r["correct"]) / len(results)
    print(f"Accuracy: {accuracy:.2%}")
    return results
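The result-set comparison at the heart of this function can be factored into a standalone helper, which is easier to unit-test on its own. A sketch (order-insensitive via sets, which also collapses duplicate rows):

```python
import sqlite3

def results_match(cursor, sql_a, sql_b):
    """Check whether two queries return the same rows, ignoring order."""
    cursor.execute(sql_a)
    rows_a = set(cursor.fetchall())
    cursor.execute(sql_b)
    rows_b = set(cursor.fetchall())
    return rows_a == rows_b
```

If duplicate rows matter for your queries, compare sorted lists (or multisets via `collections.Counter`) instead of sets.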

Key Takeaways

  • Start simple, iterate fast: Begin with a basic prompt and schema, then add few-shot examples, chain-of-thought reasoning, and RAG as complexity grows.
  • RAG is essential for production: For databases with more than 10-15 tables, dynamic schema retrieval dramatically improves accuracy and reduces token costs.
  • Self-improvement loops catch errors: Let Claude execute its SQL and fix mistakes automatically. This single pattern can boost accuracy by 20-30% in real-world scenarios.
  • Include sample data: Adding a few rows of actual data in your prompt helps Claude understand data formats and avoid common mistakes.
  • Evaluate by results, not SQL strings: Two different SQL queries can produce the same correct result. Compare output data, not query syntax.