Claude Guide · 2026-05-05

Building a Robust Text-to-SQL System with Claude: A Practical Guide

Learn how to build a production-ready Text-to-SQL system using Claude. Covers prompting, RAG for large schemas, chain-of-thought, and self-improvement loops.

Quick Answer

This guide teaches you to build a Text-to-SQL system with Claude, covering basic prompting, few-shot learning, chain-of-thought reasoning, RAG for large schemas, and self-improvement loops that let Claude fix its own SQL errors.

Tags: Text-to-SQL · Claude API · Prompt Engineering · RAG · Database


Text-to-SQL is one of the most impactful applications of large language models in enterprise settings. It bridges the gap between non-technical stakeholders and complex databases, allowing anyone to ask questions in plain English and get structured data back. Claude excels at this task because of its strong reasoning capabilities, large context window, and ability to follow complex instructions.

In this guide, you'll learn how to build a production-ready Text-to-SQL system using Claude. We'll start with basic prompting, then layer in advanced techniques like few-shot learning, chain-of-thought reasoning, Retrieval Augmented Generation (RAG) for large schemas, and a self-improvement loop that lets Claude fix its own mistakes.

Why Text-to-SQL Matters

Before diving into implementation, it's worth understanding why this capability is so valuable:

  • Democratizes data access: Non-technical team members can query databases without knowing SQL syntax.
  • Accelerates analysis: Data analysts can prototype queries in seconds rather than minutes.
  • Enables natural interfaces: Chatbots and internal tools can offer conversational database interactions.
  • Handles complexity: Claude can generate queries with multiple JOINs, subqueries, and aggregations that would take humans significant time to write.

Setting Up Your Environment

First, let's create a test SQLite database with two tables: employees and departments. This will serve as our sandbox throughout the guide.

import sqlite3

# Create in-memory database
conn = sqlite3.connect(':memory:')
cursor = conn.cursor()

# Create departments table
cursor.execute('''
CREATE TABLE departments (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    location TEXT
)
''')

# Create employees table
cursor.execute('''
CREATE TABLE employees (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    department_id INTEGER,
    salary REAL,
    hire_date DATE,
    FOREIGN KEY (department_id) REFERENCES departments(id)
)
''')

# Insert sample data
departments = [
    (1, 'Engineering', 'New York'),
    (2, 'Marketing', 'San Francisco'),
    (3, 'Sales', 'Chicago')
]

employees = [
    (1, 'Alice Johnson', 1, 95000, '2020-03-15'),
    (2, 'Bob Smith', 1, 85000, '2021-06-01'),
    (3, 'Charlie Brown', 2, 72000, '2019-11-20'),
    (4, 'Diana Prince', 3, 88000, '2022-01-10'),
    (5, 'Eve Davis', 2, 65000, '2023-04-05')
]

cursor.executemany('INSERT INTO departments VALUES (?,?,?)', departments)
cursor.executemany('INSERT INTO employees VALUES (?,?,?,?,?)', employees)
conn.commit()

Creating a Basic Text-to-SQL Prompt

The foundation of any Text-to-SQL system is a well-structured prompt. Your prompt should include:

  • Clear instructions telling Claude what to do
  • The user's natural language query
  • The database schema so Claude understands available tables and columns

Here's a basic implementation:

def get_schema(cursor):
    """Extract schema from SQLite database"""
    cursor.execute("SELECT sql FROM sqlite_master WHERE type='table'")
    schemas = cursor.fetchall()
    return '\n'.join([s[0] for s in schemas if s[0]])

def generate_basic_prompt(user_query, schema):
    prompt = f"""You are a SQL expert. Convert the following natural language query into a SQL query for the given database schema.

Database Schema:
{schema}

User Query: {user_query}

Generate only the SQL query, without any explanation."""
    return prompt

# Example usage
schema = get_schema(cursor)
prompt = generate_basic_prompt("Show me all employees in Engineering", schema)
print(prompt)
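Prompts like this one ultimately need to be sent to Claude; later sections assume a `claude_api.complete` helper for that. A minimal sketch of such a helper, assuming the Anthropic Python SDK's `messages.create` interface (the model name is illustrative), with the client injected so a fake can stand in during tests:

```python
class ClaudeAPI:
    """Thin completion helper over a messages-style client.

    `client` is expected to expose messages.create(...) like the
    Anthropic Python SDK; injecting it keeps this wrapper testable.
    """
    def __init__(self, client, model: str = "claude-sonnet-4-20250514"):
        self.client = client
        self.model = model

    def complete(self, prompt: str, max_tokens: int = 1024) -> str:
        message = self.client.messages.create(
            model=self.model,
            max_tokens=max_tokens,
            messages=[{"role": "user", "content": prompt}],
        )
        # The response carries a list of content blocks; take the first text block
        return message.content[0].text

# With the real SDK this would be wired up as:
#   import anthropic
#   claude_api = ClaudeAPI(anthropic.Anthropic())
#   sql = claude_api.complete(prompt)
```

Keeping the client injectable also means the self-improvement loop later in this guide can be exercised against a stub instead of live API calls.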

Improving with Few-Shot Examples

Basic prompts work, but you'll get better results by including examples. This technique, called few-shot learning, helps Claude understand the expected output format and reasoning pattern.

def generate_few_shot_prompt(user_query, schema):
    examples = """
Example 1:
User Query: List all departments located in New York
SQL: SELECT * FROM departments WHERE location = 'New York';

Example 2:
User Query: What is the average salary by department?
SQL: SELECT d.name, AVG(e.salary) AS avg_salary
FROM employees e
JOIN departments d ON e.department_id = d.id
GROUP BY d.name;

Example 3:
User Query: Find employees hired after 2021 with salary above 80000
SQL: SELECT * FROM employees WHERE hire_date >= '2022-01-01' AND salary > 80000;
"""
    prompt = f"""You are a SQL expert. Convert natural language queries to SQL.

Database Schema:
{schema}

Here are some examples:
{examples}

Now convert this query:
User Query: {user_query}
SQL:"""
    return prompt

Pro tip: Including a few rows of actual data alongside the schema can further improve accuracy by giving Claude context about data values and formats.
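One way to act on that tip is a helper that pulls the first few rows of a table and renders them for inclusion in the prompt. A sketch (the throwaway demo table is illustrative; table names must come from your own schema, never raw user input, since they are interpolated into SQL):

```python
import sqlite3

def sample_rows(cursor, table: str, limit: int = 3) -> str:
    """Render the first few rows of a table as prompt-ready text."""
    # `table` is interpolated directly, so it must be a trusted name
    cursor.execute(f"SELECT * FROM {table} LIMIT {limit}")
    rows = cursor.fetchall()
    columns = [desc[0] for desc in cursor.description]
    lines = [f"Sample rows from {table}:", ' | '.join(columns)]
    lines += [' | '.join(str(value) for value in row) for row in rows]
    return '\n'.join(lines)

# Usage against a throwaway in-memory table
conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute("CREATE TABLE departments (id INTEGER, name TEXT)")
cur.execute("INSERT INTO departments VALUES (1, 'Engineering')")
preview = sample_rows(cur, 'departments')
print(preview)
```

Appending the output of `sample_rows` for each table below the schema in any of the prompts above is usually enough to convey value formats.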

Chain-of-Thought Prompting for Complex Queries

For complex queries involving multiple joins, aggregations, or subqueries, chain-of-thought (CoT) prompting helps Claude reason step-by-step. Using XML tags to structure the reasoning process is particularly effective with Claude.

def generate_cot_prompt(user_query, schema):
    prompt = f"""You are a SQL expert. Convert the following query to SQL using step-by-step reasoning.

Database Schema:
{schema}

User Query: {user_query}

<reasoning>
Break down the query into steps:
1. Identify the tables needed
2. Determine the columns to select
3. Identify join conditions
4. Apply filters and aggregations
5. Consider ordering and limits
</reasoning>

<sql>
Generate only the final SQL query here
</sql>"""
    return prompt

This structured approach significantly improves accuracy on complex queries because Claude explicitly reasons about the database structure before generating SQL.

Implementing RAG for Large Database Schemas

Real-world databases often have hundreds of tables. Including the entire schema in every prompt is impractical and wastes tokens. Retrieval Augmented Generation (RAG) solves this by dynamically retrieving only the relevant schema information.

Here's how to implement RAG using VoyageAI embeddings:

import voyageai
from typing import List, Dict

class SchemaRetriever:
    def __init__(self, api_key: str):
        self.client = voyageai.Client(api_key=api_key)
        self.embeddings = []
        self.schema_chunks = []

    def index_schema(self, tables: List[Dict[str, str]]):
        """Index table schemas for retrieval"""
        for table in tables:
            chunk = f"Table: {table['name']}\nColumns: {', '.join(table['columns'])}"
            self.schema_chunks.append(chunk)
        # Generate embeddings for all schema chunks in one batch
        response = self.client.embed(self.schema_chunks, model="voyage-2")
        self.embeddings = response.embeddings

    def retrieve_relevant_schema(self, query: str, top_k: int = 3) -> str:
        """Retrieve most relevant schema chunks for a query"""
        query_embedding = self.client.embed([query], model="voyage-2").embeddings[0]
        # Compute cosine similarity against each indexed chunk
        similarities = [
            self._cosine_similarity(query_embedding, emb)
            for emb in self.embeddings
        ]
        # Get indices of the top-k most similar chunks
        top_indices = sorted(
            range(len(similarities)),
            key=lambda i: similarities[i],
            reverse=True
        )[:top_k]
        return '\n'.join(self.schema_chunks[i] for i in top_indices)

    def _cosine_similarity(self, a, b):
        dot_product = sum(x * y for x, y in zip(a, b))
        norm_a = sum(x * x for x in a) ** 0.5
        norm_b = sum(x * x for x in b) ** 0.5
        return dot_product / (norm_a * norm_b)

With this retriever, your prompt includes only the few most relevant tables (three by default) instead of the full schema, saving tokens and reducing noise.
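The ranking logic inside `retrieve_relevant_schema` needs no embeddings API and can be unit-tested in isolation. A sketch with hand-made toy vectors (chunk names and numbers are purely illustrative):

```python
def rank_chunks(query_embedding, embeddings, chunks, top_k=3):
    """Return the top_k chunks ranked by cosine similarity to the query."""
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = sum(x * x for x in a) ** 0.5
        norm_b = sum(x * x for x in b) ** 0.5
        return dot / (norm_a * norm_b)

    order = sorted(range(len(embeddings)),
                   key=lambda i: cosine(query_embedding, embeddings[i]),
                   reverse=True)
    return [chunks[i] for i in order[:top_k]]

# Toy vectors: the query points almost exactly at the employees chunk
chunks = ["Table: employees ...", "Table: departments ...", "Table: invoices ..."]
embeddings = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
print(rank_chunks([0.9, 0.1], embeddings, chunks, top_k=2))
```

Separating ranking from embedding this way lets you verify the retrieval behavior deterministically before paying for API calls.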

Building a Self-Improvement Loop

One of Claude's most powerful capabilities is the ability to analyze and fix its own mistakes. By implementing a self-improvement loop, you can dramatically increase reliability:

def text_to_sql_with_self_improvement(user_query, schema, cursor, max_attempts=3):
    """Generate SQL, execute it, and improve if errors occur"""
    
    for attempt in range(max_attempts):
        # Generate SQL
        prompt = generate_cot_prompt(user_query, schema)
        response = claude_api.complete(prompt)
        sql_query = extract_sql_from_response(response)
        
        try:
            # Execute the query
            cursor.execute(sql_query)
            results = cursor.fetchall()
            
            # Check whether the results actually answer the question
            validation_prompt = f"""
            User asked: {user_query}
            Generated SQL: {sql_query}
            Results: {results[:5]}
            
            Does this SQL correctly answer the user's question?
            Begin your reply with the single word VALID if it does,
            or INVALID if it does not, followed by an explanation
            and a corrected SQL query.
            """
            
            validation = claude_api.complete(validation_prompt)
            
            if validation.strip().upper().startswith("VALID"):
                return sql_query, results
            else:
                # Feed the critique back into the next attempt
                user_query = f"{user_query}\nPrevious attempt had issues: {validation}"
                
        except Exception as e:
            # SQL execution error - feed back to Claude
            error_prompt = f"""
            The following SQL query produced an error:
            Query: {sql_query}
            Error: {str(e)}
            
            Please fix the SQL query.
            """
            user_query = f"{user_query}\n{error_prompt}"
    
    return None, None  # Failed after max attempts

This loop handles two common failure modes:

  • SQL syntax errors: Claude sees the error message and fixes the query
  • Logical errors: Claude validates that results actually answer the user's question
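The loop also calls an `extract_sql_from_response` helper that is not shown above. A minimal sketch, assuming replies carry SQL in `<sql>` tags (as the chain-of-thought prompt requests), in a fenced code block, or as bare text:

```python
import re

def extract_sql_from_response(response: str) -> str:
    """Pull the SQL statement out of a model reply."""
    # Prefer an explicit <sql> tag if present
    tag = re.search(r'<sql>(.*?)</sql>', response, re.DOTALL | re.IGNORECASE)
    if tag:
        return tag.group(1).strip()
    # Fall back to a fenced code block (``` or ```sql)
    fence = re.search(r'`{3}(?:sql)?\s*(.*?)`{3}', response, re.DOTALL)
    if fence:
        return fence.group(1).strip()
    # Otherwise assume the whole reply is the query
    return response.strip()
```

The same helper can parse the validation replies, where a corrected query may arrive in any of these shapes.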

Evaluating Your System

To ensure quality, build an evaluation pipeline:

def evaluate_text_to_sql(test_cases, cursor):
    """Evaluate system against test cases"""
    results = []
    
    for test in test_cases:
        query, expected_sql = test['query'], test['expected_sql']
        
        generated_sql, _ = text_to_sql_with_self_improvement(
            query, get_schema(cursor), cursor
        )
        
        # Compare results
        cursor.execute(expected_sql)
        expected_results = set(cursor.fetchall())
        
        if generated_sql is None:
            # Self-improvement loop gave up; count this case as a miss
            results.append({
                'query': query,
                'exact_match': False,
                'expected': expected_results,
                'actual': None
            })
            continue
        
        cursor.execute(generated_sql)
        actual_results = set(cursor.fetchall())
        
        results.append({
            'query': query,
            'exact_match': expected_results == actual_results,
            'expected': expected_results,
            'actual': actual_results
        })
    
    accuracy = sum(r['exact_match'] for r in results) / len(results)
    return accuracy, results
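Test cases for this pipeline are just query/expected_sql pairs. A sketch that rebuilds a slice of the sandbox so the expected queries can be checked standalone (the cases themselves are illustrative):

```python
import sqlite3

# Rebuild a slice of the sandbox so the expected queries run standalone
conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute("CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT, location TEXT)")
cur.execute("""CREATE TABLE employees (
    id INTEGER PRIMARY KEY, name TEXT, department_id INTEGER,
    salary REAL, hire_date DATE)""")
cur.executemany("INSERT INTO departments VALUES (?,?,?)",
                [(1, 'Engineering', 'New York'), (2, 'Marketing', 'San Francisco')])
cur.executemany("INSERT INTO employees VALUES (?,?,?,?,?)",
                [(1, 'Alice Johnson', 1, 95000, '2020-03-15'),
                 (2, 'Bob Smith', 1, 85000, '2021-06-01'),
                 (3, 'Charlie Brown', 2, 72000, '2019-11-20')])

test_cases = [
    {'query': "Show me all employees in Engineering",
     'expected_sql': ("SELECT e.name FROM employees e "
                      "JOIN departments d ON e.department_id = d.id "
                      "WHERE d.name = 'Engineering'")},
    {'query': "What is the average salary by department?",
     'expected_sql': ("SELECT d.name, AVG(e.salary) FROM employees e "
                      "JOIN departments d ON e.department_id = d.id "
                      "GROUP BY d.name")},
]

# Every expected query should at least execute against the sandbox
for case in test_cases:
    cur.execute(case['expected_sql'])
    print(case['query'], '->', cur.fetchall())
```

Running each expected_sql by hand before using it as ground truth catches mistakes in the test set itself, which otherwise silently skew the accuracy number.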

Key Takeaways

  • Start simple, then layer complexity: Begin with basic prompting, then add few-shot examples, chain-of-thought, RAG, and self-improvement as needed.
  • RAG is essential for real-world databases: When your schema has dozens or hundreds of tables, dynamic retrieval of relevant schema information is critical for both accuracy and cost efficiency.
  • Self-improvement loops dramatically increase reliability: Claude can analyze its own SQL output, catch errors, and fix them—treat this as a core feature, not an afterthought.
  • Always validate with real data: Test your generated SQL against actual database results, not just syntactic correctness. A query can be valid SQL but answer the wrong question.
  • Include data samples in prompts: Adding a few rows of real data alongside the schema helps Claude understand data formats and values, leading to more accurate queries.