GuideBeginnerBest Practices2026-05-16

Building a Robust Text-to-SQL System with Claude: A Practical Guide

Learn how to build a production-ready Text-to-SQL system using Claude. Covers prompting, RAG for complex schemas, chain-of-thought reasoning, and self-improvement loops.

Quick Answer

This guide walks you through building a Text-to-SQL system with Claude, covering basic prompting, few-shot learning, chain-of-thought reasoning, RAG for large schemas, and self-improvement loops that let Claude fix its own SQL errors.

Text-to-SQLPrompt EngineeringRAGDatabaseClaude API

Building a Robust Text-to-SQL System with Claude: A Practical Guide

Introduction

Text-to-SQL is one of the most impactful applications of large language models in the enterprise. It bridges the gap between non-technical stakeholders and the data they need, allowing anyone to ask questions in plain English and get structured data back.

Claude excels at this task because of its strong reasoning capabilities, large context window, and ability to follow complex instructions. This guide will take you from a basic implementation to a production-ready system that can handle complex database schemas, recover from errors, and generate accurate SQL consistently.

Why Text-to-SQL Matters

Before diving into the code, let's understand why this capability is so valuable:

Democratizes data access: Business analysts, product managers, and executives can query databases without learning SQL syntax.
Accelerates prototyping: Data scientists can quickly iterate on queries without writing every join and subquery manually.
Enables conversational interfaces: Chatbots and internal tools can answer data questions in real-time.
Handles complexity: Claude can generate multi-table joins, aggregations, window functions, and subqueries that would take humans significant time to write correctly.

Setting Up Your Environment

First, let's create a test SQLite database with two tables: employees and departments. This will serve as our sandbox throughout the guide.

import sqlite3
Create an in-memory database
conn = sqlite3.connect(':memory:')
cursor = conn.cursor()
Create departments table
cursor.execute('''
CREATE TABLE departments (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    location TEXT
)
''')
Create employees table
cursor.execute('''
CREATE TABLE employees (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    department_id INTEGER,
    salary REAL,
    hire_date DATE,
    FOREIGN KEY (department_id) REFERENCES departments(id)
)
''')
Insert sample data
cursor.executemany('INSERT INTO departments VALUES (?, ?, ?)', [
    (1, 'Engineering', 'New York'),
    (2, 'Marketing', 'San Francisco'),
    (3, 'Sales', 'Chicago')
])
cursor.executemany('INSERT INTO employees VALUES (?, ?, ?, ?, ?)', [
    (1, 'Alice Johnson', 1, 95000, '2020-03-15'),
    (2, 'Bob Smith', 1, 85000, '2021-06-01'),
    (3, 'Charlie Brown', 2, 72000, '2019-11-20'),
    (4, 'Diana Prince', 3, 88000, '2022-01-10'),
    (5, 'Eve Williams', 2, 65000, '2023-04-05')
])
conn.commit()

Step 1: Creating a Basic Text-to-SQL Prompt

A good prompt needs three things: clear instructions, the user's query, and the database schema. Let's start simple.

def get_schema(conn):
    cursor = conn.cursor()
    cursor.execute("SELECT sql FROM sqlite_master WHERE type='table'")
    schemas = cursor.fetchall()
    return '\n'.join([s[0] for s in schemas])
def basic_prompt(user_query, schema):
    return f"""You are a SQL expert. Given the following database schema, convert the user's natural language query into a valid SQL query.
Database Schema:
{schema}
User Query: {user_query}
Generate only the SQL query, nothing else."""
Test it
schema = get_schema(conn)
prompt = basic_prompt("Show me all employees in Engineering", schema)
print(prompt)

When you send this to Claude via the API, you'll get back a SQL query like:

SELECT * FROM employees WHERE department_id = 1;

This works, but it has limitations. The model might guess column names incorrectly or miss the join needed for department names.

Step 2: Improving with Few-Shot Examples

Adding examples to your prompt (few-shot learning) dramatically improves accuracy. The model learns the pattern of input-output pairs.

def few_shot_prompt(user_query, schema):
    examples = """
Example 1:
User: List all departments
SQL: SELECT * FROM departments;
Example 2:
User: Find employees hired after 2021
SQL: SELECT * FROM employees WHERE hire_date > '2021-01-01';
Example 3:
User: Show average salary by department
SQL: SELECT d.name, AVG(e.salary) as avg_salary 
FROM employees e 
JOIN departments d ON e.department_id = d.id 
GROUP BY d.name;
"""
    
    return f"""You are a SQL expert. Given the database schema and examples, convert the user's query to SQL.
Database Schema:
{schema}
{examples}
User: {user_query}
SQL:"""

Pro tip: Include a few rows of sample data in your prompt. This gives Claude context about actual values, which helps with filtering conditions and date formats.

Step 3: Chain-of-Thought Prompting for Complex Queries

For multi-step queries, chain-of-thought (CoT) prompting helps Claude reason through the problem before generating SQL. Use XML tags to structure the reasoning process.

def cot_prompt(user_query, schema):
    return f"""You are a SQL expert. Convert the user's query to SQL by thinking step by step.
Database Schema:
{schema}
User Query: {user_query}
<reasoning>
Identify the tables needed
Determine the columns to select
Identify join conditions
Apply filters and aggregations
Order and limit if needed
</reasoning>
<sql>
-- Your SQL query here
</sql>"""

This structured approach forces Claude to decompose the problem, leading to more accurate results for queries like "Show me the top 3 departments by average salary, but only include departments with more than 2 employees."

Step 4: RAG for Complex Database Schemas

Real-world databases often have hundreds of tables. You can't fit the entire schema in every prompt. Retrieval Augmented Generation (RAG) solves this by dynamically fetching only the relevant schema information.

First, create embeddings of your schema using VoyageAI or another embedding provider:

import voyageai
from typing import List, Dict
import numpy as np
class SchemaVectorDB:
    def __init__(self, api_key: str):
        self.client = voyageai.Client(api_key=api_key)
        self.embeddings = []
        self.metadata = []
    
    def add_table(self, table_name: str, schema_text: str):
        embedding = self.client.embed([schema_text], model="voyage-2").embeddings[0]
        self.embeddings.append(embedding)
        self.metadata.append({"table": table_name, "schema": schema_text})
    
    def query(self, user_query: str, top_k: int = 3) -> List[Dict]:
        query_embedding = self.client.embed([user_query], model="voyage-2").embeddings[0]
        similarities = [
            np.dot(query_embedding, emb) / (np.linalg.norm(query_embedding) * np.linalg.norm(emb))
            for emb in self.embeddings
        ]
        top_indices = np.argsort(similarities)[-top_k:][::-1]
        return [self.metadata[i] for i in top_indices]

Now, when a user asks a question, you retrieve only the relevant tables and inject them into the prompt:

def rag_prompt(user_query, vector_db):
    relevant_schemas = vector_db.query(user_query)
    schema_context = "\n\n".join([s["schema"] for s in relevant_schemas])
    
    return f"""You are a SQL expert. Here are the relevant database tables for the user's query:
{schema_context}
User Query: {user_query}
Generate the SQL query."""

This approach scales to databases with hundreds of tables and keeps your prompts lean and focused.

Step 5: Self-Improvement Loop

One of Claude's most powerful capabilities is self-correction. You can build a loop where Claude:

Generates SQL
Executes it against the database
Analyzes the results or errors
Refines the query if needed

def self_improving_query(user_query, schema, conn, max_iterations=3):
    cursor = conn.cursor()
    
    for i in range(max_iterations):
        # Generate SQL
        prompt = cot_prompt(user_query, schema)
        sql = get_claude_response(prompt)  # Your API call here
        
        try:
            cursor.execute(sql)
            results = cursor.fetchall()
            
            # Check if results make sense
            validation_prompt = f"""
User asked: {user_query}
Generated SQL: {sql}
Results: {results}
Does this SQL correctly answer the user's question? If not, explain what's wrong and provide the corrected SQL.
"""
            validation = get_claude_response(validation_prompt)
            
            if "correct" in validation.lower():
                return sql, results
            else:
                # Extract corrected SQL from validation
                user_query = f"Fix this: {validation}"
                
        except Exception as e:
            # SQL error - ask Claude to fix it
            error_prompt = f"""
The following SQL query produced an error:
{sql}
Error: {str(e)}
Please provide the corrected SQL query."""
            user_query = error_prompt
    
    return None, None

This loop is particularly valuable for:

Syntax errors: Claude can fix typos, missing commas, or incorrect keywords
Logic errors: If results don't match expectations, Claude can adjust joins or filters
Edge cases: Handling NULL values, date formatting, or type mismatches

Evaluations: Measuring Success

To build confidence in your system, you need a robust evaluation framework. Create a test set of query-SQL pairs and measure:

Exact match accuracy: Does the generated SQL match the expected output?
Execution accuracy: Does the SQL execute without errors?
Result accuracy: Does the executed query return the correct data?

def evaluate_system(test_cases, conn, schema):
    results = []
    for user_query, expected_sql in test_cases:
        generated_sql = generate_sql(user_query, schema)
        
        # Check execution
        try:
            cursor = conn.cursor()
            cursor.execute(generated_sql)
            generated_results = cursor.fetchall()
            
            cursor.execute(expected_sql)
            expected_results = cursor.fetchall()
            
            results.append({
                "query": user_query,
                "exact_match": generated_sql.strip() == expected_sql.strip(),
                "result_match": generated_results == expected_results,
                "error": None
            })
        except Exception as e:
            results.append({
                "query": user_query,
                "exact_match": False,
                "result_match": False,
                "error": str(e)
            })
    
    return results

Key Takeaways

Start simple, iterate fast: Begin with a basic prompt, then layer in few-shot examples, chain-of-thought reasoning, and RAG as complexity grows.
RAG is essential for production systems: For databases with more than 10-15 tables, dynamic schema retrieval is necessary to stay within context limits and maintain accuracy.
Self-improvement loops dramatically boost reliability: Claude's ability to analyze its own outputs and fix errors makes your system more robust than a single-pass approach.
Include sample data in prompts: A few rows of real data help Claude understand value formats, date styles, and naming conventions.
Always evaluate: Build a test suite of query-SQL pairs to measure accuracy and catch regressions when you change prompts or models.

Text-to-SQL with Claude is not just about generating correct SQL—it's about building a system that gracefully handles ambiguity, recovers from errors, and scales to real-world database complexity. Start with the patterns in this guide, and you'll have a production-ready solution in no time.