Building a Robust Text-to-SQL System with Claude: From Basic Prompts to Self-Improving Queries

Learn how to build a production-ready Text-to-SQL system using Claude. Covers prompt engineering, RAG for large schemas, chain-of-thought reasoning, and self-improving query loops.

Quick Answer

This guide walks you through building a Text-to-SQL system with Claude, covering basic prompting, few-shot learning, chain-of-thought reasoning, RAG for large schemas, and a self-improvement loop that lets Claude fix its own SQL errors.

Text-to-SQL · Claude API · Prompt Engineering · RAG · Database

Introduction

Text-to-SQL is a natural language processing task that converts plain-English questions into structured SQL statements. It democratizes data access: non-technical team members can query databases without knowing SQL syntax, and experienced analysts can prototype complex queries faster.

Claude excels at this task because it can understand context, interpret nuanced requests, and generate accurate SQL—even for queries involving multiple joins, subqueries, and aggregations. In this guide, you'll learn how to build a robust Text-to-SQL system using Claude, starting from a basic prompt and progressing to advanced techniques like Retrieval Augmented Generation (RAG) and self-improving query loops.

Why Text-to-SQL Matters

  • Accessibility: Non-technical users can extract insights from databases without writing SQL.
  • Efficiency: Data analysts can prototype queries in seconds using natural language.
  • Integration: Enables intuitive database interfaces in chatbots and applications.
  • Complexity: LLMs can generate SQL with multiple joins, subqueries, and aggregations that would take humans significant time to write.

Setting Up Your Environment

First, let's create a test SQLite database with two tables: employees and departments. This will serve as our sandbox throughout the guide.

import sqlite3

# Create an in-memory database
conn = sqlite3.connect(':memory:')
cursor = conn.cursor()

# Create departments table
cursor.execute('''
CREATE TABLE departments (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    location TEXT
)
''')

# Create employees table
cursor.execute('''
CREATE TABLE employees (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    department_id INTEGER,
    salary REAL,
    hire_date DATE,
    FOREIGN KEY (department_id) REFERENCES departments(id)
)
''')

# Insert sample data
cursor.executemany('INSERT INTO departments VALUES (?, ?, ?)', [
    (1, 'Engineering', 'New York'),
    (2, 'Sales', 'San Francisco'),
    (3, 'Marketing', 'Chicago')
])

cursor.executemany('INSERT INTO employees VALUES (?, ?, ?, ?, ?)', [
    (1, 'Alice Johnson', 1, 95000, '2020-03-15'),
    (2, 'Bob Smith', 2, 75000, '2019-06-01'),
    (3, 'Charlie Brown', 1, 110000, '2018-01-10'),
    (4, 'Diana Prince', 3, 82000, '2021-09-20')
])

conn.commit()
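
A quick sanity check confirms the sample data loaded correctly:

# Four employees across three departments
cursor.execute('SELECT COUNT(*) FROM employees')
print(cursor.fetchone()[0])  # 4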

Creating a Basic Text-to-SQL Prompt

A good Text-to-SQL prompt needs three things:

  • Clear instructions for Claude
  • The user's natural language query
  • The database schema so Claude understands the structure

def generate_basic_prompt(user_query: str, schema: str) -> str:
    return f"""You are a SQL expert. Convert the following natural language query into a valid SQL query for the given database schema.

Database Schema: {schema}

User Query: {user_query}

Generate only the SQL query, no explanation."""

# Get schema
schema = """
CREATE TABLE departments (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    location TEXT
);

CREATE TABLE employees (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    department_id INTEGER,
    salary REAL,
    hire_date DATE,
    FOREIGN KEY (department_id) REFERENCES departments(id)
);
"""

prompt = generate_basic_prompt("Show me all employees in the Engineering department", schema)
print(prompt)

When you send this to Claude via the API, it will return a SQL query like:

SELECT e.* FROM employees e
JOIN departments d ON e.department_id = d.id
WHERE d.name = 'Engineering';
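
Later snippets in this guide call a get_claude_response helper for that API call. Here is a minimal sketch using the Anthropic Python SDK; the model name is a placeholder, so substitute whichever Claude model you use:

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def get_claude_response(prompt: str) -> str:
    """Send a prompt to Claude and return the text of its reply."""
    response = client.messages.create(
        model="claude-sonnet-4-20250514",  # placeholder model name
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text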

Improving with Few-Shot Examples

Your basic prompt works, but you can improve accuracy by including examples—a technique called few-shot learning. This helps Claude understand the expected output format and reasoning pattern.

def generate_few_shot_prompt(user_query: str, schema: str) -> str:
    examples = """
Example 1:
User Query: "List all departments in Chicago"
SQL: SELECT * FROM departments WHERE location = 'Chicago';

Example 2:
User Query: "What is the average salary by department?"
SQL: SELECT d.name, AVG(e.salary) AS avg_salary
     FROM departments d
     JOIN employees e ON d.id = e.department_id
     GROUP BY d.name;
"""
    return f"""You are a SQL expert. Convert natural language queries into SQL.

Database Schema: {schema}

{examples}

User Query: {user_query}
SQL:"""

Pro tip: Including a few rows of actual data in the prompt can give Claude additional context about data types and values, leading to more accurate queries.
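
One way to apply that tip is to sample a few rows per table and append them to the schema string before building the prompt. A minimal sketch, assuming the SQLite connection from the setup section (schema_with_samples is a hypothetical helper name):

def schema_with_samples(conn, table: str, n: int = 3) -> str:
    """Append up to n sample rows so Claude sees real values and formats."""
    cursor = conn.cursor()
    # Only pass trusted table names; this f-string is not safe for user input
    cursor.execute(f"SELECT * FROM {table} LIMIT {n}")
    columns = [desc[0] for desc in cursor.description]
    rows = cursor.fetchall()
    sample = "\n".join(str(dict(zip(columns, row))) for row in rows)
    return f"Sample rows from {table}:\n{sample}"

schema += "\n\n" + schema_with_samples(conn, "employees")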

Chain-of-Thought Prompting for Complex Queries

For complex queries involving multiple conditions, aggregations, or subqueries, chain-of-thought (CoT) prompting helps Claude reason step-by-step before generating SQL. Using XML tags makes the reasoning process explicit.

def generate_cot_prompt(user_query: str, schema: str) -> str:
    return f"""You are a SQL expert. Convert the following natural language query into SQL.

Database Schema: {schema}

User Query: {user_query}

<reasoning>
Break down the query step by step:
1. Identify the tables needed
2. Determine the columns to select
3. Identify any filters (WHERE clauses)
4. Determine if joins, aggregations, or groupings are needed
5. Construct the final SQL
</reasoning>

<sql>
Generate only the SQL query here.
</sql>"""

This approach is especially useful when your users ask questions like "Show me the top 3 highest-paid employees in each department along with their department names." The step-by-step reasoning helps Claude handle the complexity correctly.
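
Because the prompt separates reasoning from output, you need to pull the query back out of the <sql> tags before executing it. A minimal extraction helper (extract_sql is a name introduced here for illustration):

import re

def extract_sql(response: str) -> str:
    """Return the contents of the <sql>...</sql> block, or the raw text as a fallback."""
    match = re.search(r"<sql>(.*?)</sql>", response, re.DOTALL)
    return match.group(1).strip() if match else response.strip()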

Implementing RAG for Large Database Schemas

As your database grows to dozens or hundreds of tables, including the entire schema in every prompt becomes impractical and expensive. Retrieval Augmented Generation (RAG) solves this by dynamically retrieving only the relevant schema information.

Here's how to implement RAG using VoyageAI embeddings:

import numpy as np
import voyageai
from typing import List, Dict

class SchemaVectorDB:
    def __init__(self, api_key: str):
        self.client = voyageai.Client(api_key=api_key)
        self.embeddings = []
        self.table_descriptions = []

    def add_tables(self, tables: List[Dict[str, str]]):
        """tables: list of dicts with 'name', 'schema', and 'description' keys"""
        for table in tables:
            description = f"{table['name']}: {table['description']}\nSchema: {table['schema']}"
            self.table_descriptions.append(description)

        # Embed each table's natural-language description
        response = self.client.embed(
            [t['description'] for t in tables],
            model="voyage-2"
        )
        self.embeddings = response.embeddings

    def query_relevant_schema(self, user_query: str, top_k: int = 3) -> str:
        # Embed the user query
        query_embedding = self.client.embed([user_query], model="voyage-2").embeddings[0]

        # Compute cosine similarity against every stored table embedding
        similarities = [
            np.dot(query_embedding, emb)
            / (np.linalg.norm(query_embedding) * np.linalg.norm(emb))
            for emb in self.embeddings
        ]

        # Get the top-k most relevant tables
        top_indices = np.argsort(similarities)[-top_k:][::-1]
        relevant_schemas = [self.table_descriptions[i] for i in top_indices]
        return "\n\n".join(relevant_schemas)

# Usage example
vector_db = SchemaVectorDB(api_key="your-voyage-api-key")
vector_db.add_tables([
    {
        "name": "employees",
        "schema": "id, name, department_id, salary, hire_date",
        "description": "Stores employee information including salary and department assignment"
    },
    {
        "name": "departments",
        "schema": "id, name, location",
        "description": "Company departments and their locations"
    }
])

# Get relevant schema for a query
relevant_schema = vector_db.query_relevant_schema("Show me all employees in Engineering")
prompt = generate_basic_prompt("Show me all employees in Engineering", relevant_schema)

Building a Self-Improving Query Loop

One of the most powerful techniques is giving Claude the ability to execute its own SQL, analyze results or errors, and improve the query. This creates a self-correcting loop.

def self_improving_query(user_query: str, schema: str, conn, max_iterations: int = 3):
    prompt = generate_cot_prompt(user_query, schema)
    
    for iteration in range(max_iterations):
        # Get SQL from Claude
        sql = get_claude_response(prompt)  # Your API call here
        
        try:
            cursor = conn.cursor()
            cursor.execute(sql)
            results = cursor.fetchall()
            
            # If successful, return results
            return {
                "success": True,
                "sql": sql,
                "results": results,
                "iterations": iteration + 1
            }
        except Exception as e:
            error_msg = str(e)
            # Feed the error back to Claude for improvement
            prompt = f"""You previously generated this SQL:
{sql}

It produced this error: {error_msg}

Original query: {user_query}

Database schema: {schema}

Please fix the SQL query."""

    return {
        "success": False,
        "error": "Max iterations reached without successful query"
    }

This loop is invaluable for production systems where database schemas evolve, edge cases appear, or users ask ambiguous questions. Claude can diagnose syntax errors, missing joins, or incorrect column names and fix them autonomously.
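
Calling the loop is straightforward (this assumes the get_claude_response helper sketched earlier):

result = self_improving_query(
    "Show me the top 3 highest-paid employees in each department", schema, conn
)
if result["success"]:
    print(f"Solved in {result['iterations']} iteration(s):\n{result['sql']}")
    print(result["results"])
else:
    print(result["error"])

In production, run the loop against a read-only connection or a replica so a faulty generated query cannot modify data.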

Evaluating Your Text-to-SQL System

To ensure your system is production-ready, build an evaluation pipeline:

  • Create a test set of 20-50 query pairs (natural language + expected SQL)
  • Measure exact match accuracy (SQL string equality)
  • Measure execution accuracy (does the generated SQL return the same results as the expected SQL?)
  • Track error rates across different query types (simple SELECTs, JOINs, aggregations, subqueries)

def evaluate_system(test_cases: List[Dict], schema: str, conn):
    results = {
        "exact_match": 0,
        "execution_match": 0,
        "total": len(test_cases)
    }
    
    for case in test_cases:
        generated_sql = get_claude_response(generate_cot_prompt(case["query"], schema))
        
        # Check exact match
        if generated_sql.strip() == case["expected_sql"].strip():
            results["exact_match"] += 1
        
        # Check execution match
        try:
            cursor = conn.cursor()
            cursor.execute(generated_sql)
            generated_results = cursor.fetchall()
            
            cursor.execute(case["expected_sql"])
            expected_results = cursor.fetchall()
            
            if generated_results == expected_results:
                results["execution_match"] += 1
        except Exception:
            # A failed execution counts as a miss; keep evaluating the rest
            pass
    
    return results
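
A small test set built from queries used earlier in this guide is enough to get started; the expected SQL below is written against the sample schema:

test_cases = [
    {
        "query": "List all departments in Chicago",
        "expected_sql": "SELECT * FROM departments WHERE location = 'Chicago';"
    },
    {
        "query": "Show me all employees in the Engineering department",
        "expected_sql": (
            "SELECT e.* FROM employees e "
            "JOIN departments d ON e.department_id = d.id "
            "WHERE d.name = 'Engineering';"
        )
    },
]

metrics = evaluate_system(test_cases, schema, conn)
print(metrics)  # {'exact_match': ..., 'execution_match': ..., 'total': 2}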

Key Takeaways

  • Start simple, then layer complexity: Begin with a basic prompt including schema and instructions, then add few-shot examples, chain-of-thought reasoning, and finally RAG as your schema grows.
  • RAG is essential for production databases: When you have dozens or hundreds of tables, dynamically retrieving only relevant schema information keeps prompts efficient and accurate.
  • Self-improvement loops reduce errors: Letting Claude execute its own SQL, catch errors, and fix them creates a robust system that handles edge cases gracefully.
  • Include sample data when possible: Adding a few rows of real data to your prompt helps Claude understand data types and values, leading to more accurate queries.
  • Evaluate systematically: Build a test set of query pairs and measure both exact match and execution accuracy to track improvements over time.