BeClaude Guide
2026-05-06

Building a Robust Text-to-SQL System with Claude: From Basic Prompts to RAG & Self-Improvement

Learn how to build a production-ready Text-to-SQL system using Claude. This guide covers basic prompting, few-shot learning, chain-of-thought, RAG for complex schemas, and self-improvement loops.

Quick Answer

This guide teaches you to build a Text-to-SQL system with Claude, covering basic prompts, few-shot learning, chain-of-thought reasoning, RAG for large schemas, and self-improvement loops for error correction.

Text-to-SQL · Claude API · RAG · Prompt Engineering · Database


Introduction

Text-to-SQL is a transformative natural language processing task that converts plain English queries into structured SQL statements. This capability democratizes data access, allowing non-technical team members to query databases without knowing SQL syntax, while also accelerating the workflow of experienced data analysts.

Claude excels at this task because it can understand nuanced context, interpret complex multi-part queries, and generate accurate, optimized SQL statements. In this guide, you'll learn how to build a production-ready Text-to-SQL system using Claude, starting with basic prompting and progressing to advanced techniques like Retrieval Augmented Generation (RAG) and self-improvement loops.

Why Text-to-SQL Matters

Before diving into the implementation, let's understand the value proposition:

  • Accessibility: Business analysts, product managers, and executives can query databases directly without relying on engineering teams.
  • Efficiency: Data professionals can prototype complex queries in seconds rather than minutes or hours.
  • Integration: Enables natural language interfaces for chatbots, internal tools, and customer-facing applications.
  • Complexity Handling: LLMs can generate multi-join queries, subqueries, and aggregations that would be tedious to write manually.

Setting Up Your Environment

First, let's create a test SQLite database with two tables: employees and departments. This will serve as our sandbox throughout the guide.

import sqlite3

# Create an in-memory SQLite database
conn = sqlite3.connect(':memory:')
cursor = conn.cursor()

# Create departments table
cursor.execute('''
CREATE TABLE departments (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    location TEXT
)
''')

# Create employees table
cursor.execute('''
CREATE TABLE employees (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    department_id INTEGER,
    salary REAL,
    hire_date DATE,
    FOREIGN KEY (department_id) REFERENCES departments(id)
)
''')

# Insert sample data
cursor.executemany('INSERT INTO departments VALUES (?, ?, ?)', [
    (1, 'Engineering', 'New York'),
    (2, 'Sales', 'San Francisco'),
    (3, 'Marketing', 'Chicago')
])

cursor.executemany('INSERT INTO employees VALUES (?, ?, ?, ?, ?)', [
    (1, 'Alice Johnson', 1, 95000, '2020-03-15'),
    (2, 'Bob Smith', 1, 87000, '2021-06-01'),
    (3, 'Charlie Brown', 2, 72000, '2019-11-20'),
    (4, 'Diana Prince', 3, 81000, '2022-01-10'),
    (5, 'Eve Davis', 2, 68000, '2023-04-05')
])

conn.commit()

Creating a Basic Text-to-SQL Prompt

The foundation of any Text-to-SQL system is a well-structured prompt. A good prompt should include:

  • Clear instructions for the task
  • The user's natural language query
  • The database schema so Claude understands the available tables and columns
def get_schema_string(conn):
    cursor = conn.cursor()
    cursor.execute("SELECT sql FROM sqlite_master WHERE type='table'")
    schemas = cursor.fetchall()
    return '\n'.join([s[0] for s in schemas])

def generate_basic_prompt(user_query, schema):
    return f"""You are a SQL expert. Given the following database schema, convert the user's natural language query into a valid SQL query.

Database Schema: {schema}

User Query: {user_query}

SQL Query:"""

# Test it
schema = get_schema_string(conn)
prompt = generate_basic_prompt("Show me all employees in the Engineering department", schema)
print(prompt)
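The snippets in this guide send prompts through a `claude_api_call` helper. One minimal sketch, assuming the official `anthropic` Python SDK (the model name below is a placeholder; substitute whichever Claude model you use):

```python
def claude_api_call(prompt: str) -> str:
    """Send a prompt to Claude and return the text of its reply."""
    import anthropic  # pip install anthropic; imported lazily so the helper is easy to stub

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
    message = client.messages.create(
        model="claude-sonnet-4-20250514",  # placeholder; use any current Claude model
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )
    return message.content[0].text
```

With an API key set, `claude_api_call(prompt)` returns Claude's raw completion, which you can print or parse for the SQL.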

Improving with Few-Shot Examples

Basic prompts work, but you can significantly improve accuracy by including examples. This technique, called few-shot learning, gives Claude concrete input-output pairs to learn from.

def generate_few_shot_prompt(user_query, schema):
    examples = """
Example 1:
User Query: Find all departments located in New York
SQL Query: SELECT * FROM departments WHERE location = 'New York';

Example 2:
User Query: What is the average salary of employees?
SQL Query: SELECT AVG(salary) FROM employees;

Example 3:
User Query: List employees hired after 2021 with their department names
SQL Query: SELECT e.name AS employee, d.name AS department FROM employees e JOIN departments d ON e.department_id = d.id WHERE e.hire_date >= '2022-01-01';
"""
    return f"""You are a SQL expert. Given the following database schema, convert the user's natural language query into a valid SQL query.

Database Schema: {schema}

Here are some examples: {examples}

User Query: {user_query}

SQL Query:"""

Pro Tip: Consider including a few rows of actual data in your prompt alongside the schema. This gives Claude more context about data types and content patterns, leading to more accurate queries.
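One way to apply that tip is a small helper that formats a few rows for the prompt. A minimal sketch, where `get_sample_rows` is a hypothetical helper name (only pass trusted table names, since the name is interpolated directly into the SQL):

```python
import sqlite3

def get_sample_rows(conn, table, limit=3):
    """Return a few rows from `table` as 'col=value' lines for prompt context."""
    cursor = conn.cursor()
    # `table` is interpolated directly, so only use trusted table names
    cursor.execute(f"SELECT * FROM {table} LIMIT {limit}")
    columns = [d[0] for d in cursor.description]
    return '\n'.join(
        ', '.join(f"{col}={val!r}" for col, val in zip(columns, row))
        for row in cursor.fetchall()
    )

# Tiny demo database
conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT, location TEXT)")
conn.execute("INSERT INTO departments VALUES (1, 'Engineering', 'New York')")
print(get_sample_rows(conn, 'departments'))
# → id=1, name='Engineering', location='New York'
```

Append the returned lines under each table's schema in your prompt so Claude sees both structure and representative values.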

Using Chain-of-Thought Prompting

For complex queries involving multiple joins, aggregations, or subqueries, chain-of-thought (CoT) prompting helps Claude reason step-by-step. Using XML tags to structure the reasoning process is particularly effective.

def generate_cot_prompt(user_query, schema):
    return f"""You are a SQL expert. Given the following database schema, convert the user's natural language query into a valid SQL query.

Database Schema: {schema}

User Query: {user_query}

Let's think through this step-by-step:

<reasoning>
1. First, identify the tables needed based on the query.
2. Determine which columns are required.
3. Identify any JOIN conditions between tables.
4. Apply any WHERE filters.
5. Add any GROUP BY, ORDER BY, or aggregation functions.
</reasoning>

<sql_query>
-- Write your SQL query here
</sql_query>"""

Implementing RAG for Complex Database Schemas

As databases grow to hundreds of tables, including the entire schema in every prompt becomes impractical and expensive. Retrieval Augmented Generation (RAG) solves this by dynamically retrieving only the most relevant schema information.

First, let's build a simple vector database using VoyageAI embeddings:

import voyageai
import numpy as np

class SimpleVectorDB:
    def __init__(self, api_key):
        self.client = voyageai.Client(api_key=api_key)
        self.embeddings = []
        self.metadata = []

    def add_documents(self, documents, metadata_list):
        response = self.client.embed(documents, model="voyage-2")
        self.embeddings = response.embeddings
        self.metadata = metadata_list

    def search(self, query, k=3):
        query_embedding = self.client.embed([query], model="voyage-2").embeddings[0]
        similarities = [np.dot(query_embedding, emb) for emb in self.embeddings]
        top_indices = np.argsort(similarities)[-k:][::-1]
        return [self.metadata[i] for i in top_indices]

Now, let's populate the vector DB with table descriptions and use it to build a dynamic prompt:

# Prepare table descriptions
table_descriptions = [
    "employees table: Contains employee information including id, name, department_id, salary, and hire_date",
    "departments table: Contains department information including id, name, and location"
]

# Initialize and populate the vector DB
vectordb = SimpleVectorDB(api_key="your-voyage-api-key")
vectordb.add_documents(table_descriptions, [
    {"table": "employees", "columns": ["id", "name", "department_id", "salary", "hire_date"]},
    {"table": "departments", "columns": ["id", "name", "location"]}
])

def generate_rag_prompt(user_query, conn):
    # Retrieve the most relevant tables for this query
    relevant_tables = vectordb.search(user_query, k=2)

    # Build a dynamic schema from only the retrieved tables
    cursor = conn.cursor()
    schema_parts = []
    for table_info in relevant_tables:
        table_name = table_info["table"]
        # Parameterized to avoid interpolating the table name into the SQL
        cursor.execute("SELECT sql FROM sqlite_master WHERE type='table' AND name=?", (table_name,))
        schema_parts.append(cursor.fetchone()[0])
    schema = '\n'.join(schema_parts)

    return f"""You are a SQL expert. Given the following relevant database schema, convert the user's query into SQL.

Relevant Schema: {schema}

User Query: {user_query}

SQL Query:"""

Implementing Query Self-Improvement

One of the most powerful patterns in a Text-to-SQL system is to execute Claude's generated SQL, feed the results or errors back to the model, and have it refine the query. This creates a self-improvement loop that dramatically increases reliability.

def self_improve_query(user_query, conn, max_iterations=3):
    schema = get_schema_string(conn)

    # Generate the initial SQL query
    prompt = generate_cot_prompt(user_query, schema)
    response = claude_api_call(prompt)  # Your Claude API call here
    sql_query = extract_sql_from_response(response)

    for _ in range(max_iterations):
        try:
            # Execute the SQL
            cursor = conn.cursor()
            cursor.execute(sql_query)
            results = cursor.fetchall()

            # Check if results make sense
            if len(results) == 0:
                # Ask Claude to fix the query
                fix_prompt = f"""The SQL query you generated returned no results.
Original Query: {user_query}
Generated SQL: {sql_query}

Please analyze why this query returned no results and generate a corrected SQL query."""
                response = claude_api_call(fix_prompt)
                sql_query = extract_sql_from_response(response)
            else:
                return sql_query, results
        except Exception as e:
            # Handle SQL errors by feeding them back to Claude
            error_prompt = f"""The SQL query you generated produced an error:
Error: {str(e)}
Original Query: {user_query}
Generated SQL: {sql_query}

Please fix the SQL query to resolve this error."""
            response = claude_api_call(error_prompt)
            sql_query = extract_sql_from_response(response)

    return sql_query, None

This self-improvement loop handles:

  • Syntax errors: Claude can fix malformed SQL
  • Logical errors: If results don't match expectations, Claude can refine the query
  • Schema mismatches: Claude can correct column or table name errors
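The loop relies on an `extract_sql_from_response` helper that isn't defined above. A minimal sketch, assuming responses use the `<sql_query>` tags from the chain-of-thought prompt, with fallbacks for fenced or bare SQL:

```python
import re

def extract_sql_from_response(response: str) -> str:
    """Pull SQL out of a model response: prefer <sql_query> tags,
    then a fenced code block, then fall back to the raw text."""
    tag = re.search(r"<sql_query>(.*?)</sql_query>", response, re.DOTALL)
    if tag:
        return tag.group(1).strip()
    fence = re.search(r"`{3}(?:sql)?\s*(.*?)`{3}", response, re.DOTALL)
    if fence:
        return fence.group(1).strip()
    return response.strip()
```

In production you may also want to strip comment lines and reject responses containing multiple statements before execution.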

Evaluating Your Text-to-SQL System

To ensure your system is reliable, implement a robust evaluation framework:

  • Accuracy: Compare generated SQL against ground truth queries
  • Execution success rate: Percentage of queries that execute without errors
  • Result correctness: Verify that query results match expected outputs
  • Performance: Measure query generation time and execution time
def evaluate_system(test_cases, conn):
    results = []
    for test_case in test_cases:
        user_query = test_case["query"]
        expected_sql = test_case["expected_sql"]
        
        generated_sql, _ = self_improve_query(user_query, conn)
        
        # Compare generated SQL with expected
        is_correct = normalize_sql(generated_sql) == normalize_sql(expected_sql)
        results.append({
            "query": user_query,
            "generated": generated_sql,
            "expected": expected_sql,
            "correct": is_correct
        })
    
    accuracy = sum(r["correct"] for r in results) / len(results)
    return accuracy, results

Key Takeaways

  • Start simple, iterate fast: Begin with a basic prompt and progressively add few-shot examples, chain-of-thought reasoning, and RAG as complexity grows.
  • RAG is essential for production systems: For databases with dozens or hundreds of tables, dynamic schema retrieval keeps prompts focused and cost-effective.
  • Self-improvement loops dramatically increase reliability: Let Claude execute, analyze, and fix its own SQL queries to handle errors and edge cases automatically.
  • Include data samples in prompts: Adding a few rows of actual data alongside schema information helps Claude understand data patterns and generate more accurate queries.
  • Evaluate rigorously: Build a test suite with ground truth queries to measure accuracy, execution success rate, and result correctness before deploying to production.