GuideBeginnerBest Practices2026-05-13

Building a Robust Text-to-SQL System with Claude: From Basic Prompts to RAG

Learn how to build a production-ready Text-to-SQL system using Claude. Covers basic prompting, chain-of-thought, RAG for large schemas, and self-improvement loops.

Quick Answer

This guide walks you through building a Text-to-SQL system with Claude, covering basic prompting, few-shot learning, chain-of-thought reasoning, RAG for complex schemas, and self-improvement loops that let Claude fix its own SQL errors.

Text-to-SQLPrompt EngineeringRAGClaude APIDatabase

Building a Robust Text-to-SQL System with Claude: From Basic Prompts to RAG

Text-to-SQL is one of the most transformative applications of large language models. It bridges the gap between natural language and structured data, allowing anyone in an organization to query databases without knowing SQL syntax. Claude excels at this task because of its strong reasoning capabilities, context understanding, and ability to generate syntactically correct SQL.

In this guide, you'll learn how to build a production-ready Text-to-SQL system using Claude. We'll start with a basic prompt, then progressively improve it with few-shot examples, chain-of-thought reasoning, Retrieval Augmented Generation (RAG) for large schemas, and finally a self-improvement loop that lets Claude fix its own mistakes.

Why Text-to-SQL Matters

Before diving into the implementation, let's understand why Text-to-SQL is so valuable:

Accessibility: Non-technical team members can query databases without learning SQL
Efficiency: Data analysts can prototype queries in seconds instead of minutes
Integration: Enables natural language interfaces for dashboards, chatbots, and internal tools
Complexity: LLMs can generate multi-join queries, subqueries, and aggregations that would take humans significant time to write

Setting Up Your Environment

First, let's create a test SQLite database with two tables: employees and departments. This will be our sandbox throughout the guide.

import sqlite3
Create an in-memory database
conn = sqlite3.connect(':memory:')
cursor = conn.cursor()
Create departments table
cursor.execute('''
CREATE TABLE departments (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    location TEXT
)
''')
Create employees table
cursor.execute('''
CREATE TABLE employees (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    department_id INTEGER,
    salary REAL,
    hire_date DATE,
    FOREIGN KEY (department_id) REFERENCES departments(id)
)
''')
Insert sample data
cursor.executemany('INSERT INTO departments VALUES (?, ?, ?)', [
    (1, 'Engineering', 'New York'),
    (2, 'Marketing', 'San Francisco'),
    (3, 'Sales', 'Chicago')
])
cursor.executemany('INSERT INTO employees VALUES (?, ?, ?, ?, ?)', [
    (1, 'Alice Johnson', 1, 95000, '2020-03-15'),
    (2, 'Bob Smith', 1, 85000, '2021-06-01'),
    (3, 'Charlie Brown', 2, 72000, '2019-11-20'),
    (4, 'Diana Prince', 3, 68000, '2022-01-10'),
    (5, 'Eve Wilson', 2, 78000, '2020-09-05')
])
conn.commit()

Step 1: Creating a Basic Text-to-SQL Prompt

The foundation of any Text-to-SQL system is a well-structured prompt. A good prompt must include:

Clear instructions for the task
The user's natural language query
The database schema so Claude knows what tables and columns are available

def get_schema_string():
    """Extract schema from the database as a string."""
    cursor.execute("SELECT sql FROM sqlite_master WHERE type='table'")
    schemas = cursor.fetchall()
    return '\n'.join([schema[0] for schema in schemas])
def basic_prompt(user_query: str) -> str:
    schema = get_schema_string()
    return f"""You are a SQL expert. Convert the following natural language query into a SQL query.
Database Schema:
{schema}
User Query: {user_query}
Generate only the SQL query, no explanation."""
Test it
query = "Show me all employees in the Engineering department"
prompt = basic_prompt(query)
print(prompt)

Now let's call the Claude API to generate SQL:

import anthropic
client = anthropic.Anthropic()
def generate_sql(prompt: str) -> str:
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=500,
        messages=[{"role": "user", "content": prompt}]
    )
    return response.content[0].text
sql = generate_sql(prompt)
print(f"Generated SQL: {sql}")
Execute the SQL to verify
cursor.execute(sql)
results = cursor.fetchall()
print(f"Results: {results}")

Step 2: Improving with Few-Shot Examples

A basic prompt works, but you'll get better results by including examples. This technique, called few-shot learning, gives Claude concrete input-output pairs to learn from.

def few_shot_prompt(user_query: str) -> str:
    schema = get_schema_string()
    examples = """
Example 1:
User Query: List all departments
SQL: SELECT * FROM departments;
Example 2:
User Query: What is the average salary by department?
SQL: SELECT d.name, AVG(e.salary) as avg_salary
FROM employees e
JOIN departments d ON e.department_id = d.id
GROUP BY d.name;
Example 3:
User Query: Find employees hired after 2020
SQL: SELECT * FROM employees WHERE hire_date > '2020-12-31';
"""
    return f"""You are a SQL expert. Convert natural language queries into SQL.
Database Schema:
{schema}
{examples}
User Query: {user_query}
Generate only the SQL query:"""

Pro tip: Including a few rows of actual data in the prompt can further improve accuracy by giving Claude context about the data's structure and content.

Step 3: Chain-of-Thought Prompting

For complex queries, chain-of-thought (CoT) prompting helps Claude reason step-by-step before generating SQL. This is especially useful for queries involving multiple joins, subqueries, or aggregations.

def cot_prompt(user_query: str) -> str:
    schema = get_schema_string()
    return f"""You are a SQL expert. Convert natural language queries into SQL.
Database Schema:
{schema}
First, think step-by-step about how to answer the query:
Identify the tables needed
Determine the columns to select
Identify join conditions
Apply filters and aggregations
Order and limit if needed

Then, generate the SQL query.
User Query: {user_query}
<thinking>
[Your step-by-step reasoning]
</thinking>
<sql>
[Your SQL query]
</sql>"""

The XML tags (<thinking> and <sql>) help structure Claude's output, making it easier to parse the SQL from the reasoning.

Step 4: Implementing RAG for Complex Schemas

As databases grow to hundreds of tables, including the entire schema in every prompt becomes impractical and expensive. Retrieval Augmented Generation (RAG) solves this by dynamically retrieving only the relevant schema information.

First, let's build a simple vector database using VoyageAI embeddings:

import voyageai
import numpy as np
from typing import List, Dict
class SimpleVectorDB:
    def __init__(self):
        self.vo = voyageai.Client()
        self.embeddings = []
        self.metadata = []
    
    def add_documents(self, documents: List[str], metadata: List[Dict]):
        batch_embeddings = self.vo.embed(
            documents, 
            model="voyage-2"
        ).embeddings
        self.embeddings.extend(batch_embeddings)
        self.metadata.extend(metadata)
    
    def query(self, query: str, top_k: int = 5) -> List[Dict]:
        query_embedding = self.vo.embed([query], model="voyage-2").embeddings[0]
        scores = [
            np.dot(query_embedding, emb) 
            for emb in self.embeddings
        ]
        top_indices = np.argsort(scores)[-top_k:][::-1]
        return [self.metadata[i] for i in top_indices]
Index each table's schema
vectordb = SimpleVectorDB()
Get all table schemas
cursor.execute("SELECT name, sql FROM sqlite_master WHERE type='table'")
tables = cursor.fetchall()
for name, schema in tables:
    vectordb.add_documents(
        documents=[schema],
        metadata=[{"table_name": name, "schema": schema}]
    )

Now update your prompt generation to use RAG:

def rag_prompt(user_query: str) -> str:
    # Retrieve relevant schema
    relevant_tables = vectordb.query(user_query, top_k=2)
    schema_context = '\n'.join([t['schema'] for t in relevant_tables])
    
    return f"""You are a SQL expert. Convert natural language queries into SQL.
Relevant Database Schema:
{schema_context}
User Query: {user_query}
Generate only the SQL query:"""

This approach scales to hundreds of tables while keeping token usage low and relevance high.

Step 5: Implementing Query Self-Improvement

One of Claude's most powerful capabilities is self-correction. We can build a loop where Claude:

Generates SQL
Executes it
Analyzes the results or errors
Improves the query if needed

def self_improving_sql(user_query: str, max_iterations: int = 3) -> str:
    prompt = cot_prompt(user_query)
    
    for i in range(max_iterations):
        sql = generate_sql(prompt)
        
        try:
            cursor.execute(sql)
            results = cursor.fetchall()
            
            # Check if results make sense
            validation_prompt = f"""
Original Query: {user_query}
Generated SQL: {sql}
Results: {results}
Does this SQL correctly answer the original query? 
If yes, say 'CORRECT'. If no, explain what's wrong.
"""
            validation = client.messages.create(
                model="claude-3-5-sonnet-20241022",
                max_tokens=200,
                messages=[{"role": "user", "content": validation_prompt}]
            ).content[0].text
            
            if 'CORRECT' in validation:
                return sql
            
            # If incorrect, feed the error back
            prompt = f"""
Previous attempt was incorrect.
Original Query: {user_query}
Previous SQL: {sql}
Feedback: {validation}
Please generate a corrected SQL query:
"""
        except Exception as e:
            # Handle SQL errors
            prompt = f"""
Previous SQL had an error.
Original Query: {user_query}
Previous SQL: {sql}
Error: {str(e)}
Please generate a corrected SQL query:
"""
    
    return sql  # Return last attempt

This self-improvement loop dramatically increases reliability, especially for complex queries or edge cases.

Evaluation Framework

To ensure your Text-to-SQL system is production-ready, implement a robust evaluation framework:

def evaluate_queries(test_cases: List[Dict]):
    """
    test_cases: list of dicts with 'query' and 'expected_sql'
    """
    results = []
    for case in test_cases:
        generated_sql = self_improving_sql(case['query'])
        
        # Compare results, not just SQL text
        cursor.execute(generated_sql)
        generated_results = set(cursor.fetchall())
        
        cursor.execute(case['expected_sql'])
        expected_results = set(cursor.fetchall())
        
        is_correct = generated_results == expected_results
        results.append({
            'query': case['query'],
            'correct': is_correct,
            'generated_sql': generated_sql
        })
    
    accuracy = sum(r['correct'] for r in results) / len(results)
    return accuracy, results

Key Takeaways

Start simple, then iterate: Begin with a basic prompt and schema, then add few-shot examples, chain-of-thought, and RAG as complexity grows
RAG is essential for scale: For databases with dozens or hundreds of tables, dynamic schema retrieval keeps prompts focused and cost-effective
Self-improvement loops boost reliability: Letting Claude execute, validate, and fix its own SQL dramatically reduces errors
Always include the schema: Claude needs to know table names, columns, and relationships to generate accurate SQL
Evaluate on results, not SQL text: Two different SQL queries can produce the same correct results; compare output sets for accuracy

Text-to-SQL with Claude is a powerful capability that can democratize data access across your organization. By following the progression from basic prompts to self-improving systems, you can build a solution that handles everything from simple lookups to complex analytical queries with high reliability.