BeClaude
GuideBeginnerBest Practices2026-05-14

Building a Robust Text-to-SQL System with Claude: A Practical Guide

Learn how to build a production-ready Text-to-SQL system using Claude. Covers prompting, RAG for complex schemas, chain-of-thought reasoning, and self-improvement loops.

Quick Answer

This guide walks through building a Text-to-SQL system with Claude, covering basic prompting, few-shot learning, chain-of-thought reasoning, RAG for large schemas, and self-improvement loops. You'll learn practical techniques to convert natural language to accurate SQL queries.

Text-to-SQLPrompt EngineeringRAGClaude APIDatabase

Building a Robust Text-to-SQL System with Claude: A Practical Guide

Text-to-SQL is one of the most impactful applications of large language models. It bridges the gap between non-technical users and complex databases, allowing anyone to ask questions in plain English and get structured data back. Claude excels at this task because of its strong reasoning capabilities, context understanding, and ability to generate syntactically correct code.

In this guide, you'll learn how to build a production-ready Text-to-SQL system using Claude. We'll start with a basic prompt and progressively add sophistication: few-shot examples, chain-of-thought reasoning, Retrieval Augmented Generation (RAG) for large schemas, and a self-improvement loop that catches and fixes errors automatically.

Why Text-to-SQL Matters

Before diving into the implementation, let's understand why this capability is so valuable:

  • Democratizing Data Access: Non-technical stakeholders can query databases without waiting for engineers or learning SQL syntax.
  • Rapid Prototyping: Data analysts can quickly iterate on query ideas using natural language before refining the SQL.
  • Chatbot Integration: Enable conversational database interactions in customer support, internal tools, or analytics dashboards.
  • Complex Query Generation: LLMs can generate multi-join queries, subqueries, and aggregations that would take humans significant time to write correctly.

Prerequisites

To follow along, you'll need:

  • A Claude API key (from Anthropic)
  • Python 3.8+ installed
  • Basic familiarity with SQL and Python
  • An SQLite database (or any database you want to query)

Setting Up Your Environment

First, install the required packages:

pip install anthropic sqlite3 voyageai

Now, let's create a test database with two tables: employees and departments. This will be our playground throughout the guide.

import sqlite3

Create an in-memory SQLite database

conn = sqlite3.connect(':memory:') cursor = conn.cursor()

Create departments table

cursor.execute(''' CREATE TABLE departments ( id INTEGER PRIMARY KEY, name TEXT NOT NULL, location TEXT ) ''')

Create employees table

cursor.execute(''' CREATE TABLE employees ( id INTEGER PRIMARY KEY, name TEXT NOT NULL, department_id INTEGER, salary REAL, hire_date DATE, FOREIGN KEY (department_id) REFERENCES departments(id) ) ''')

Insert sample data

cursor.executemany('INSERT INTO departments VALUES (?, ?, ?)', [ (1, 'Engineering', 'New York'), (2, 'Marketing', 'San Francisco'), (3, 'Sales', 'Chicago') ])

cursor.executemany('INSERT INTO employees VALUES (?, ?, ?, ?, ?)', [ (1, 'Alice Johnson', 1, 95000, '2020-03-15'), (2, 'Bob Smith', 1, 85000, '2021-06-01'), (3, 'Charlie Brown', 2, 72000, '2019-11-20'), (4, 'Diana Prince', 3, 88000, '2022-01-10'), (5, 'Eve Davis', 2, 65000, '2023-04-05') ])

conn.commit()

Step 1: The Basic Text-to-SQL Prompt

A good Text-to-SQL prompt needs three things: clear instructions, the database schema, and the user's query. Here's a minimal implementation:

import anthropic

client = anthropic.Anthropic(api_key="your-api-key")

def get_schema(cursor): """Extract schema information from the database.""" cursor.execute("SELECT sql FROM sqlite_master WHERE type='table'") schemas = cursor.fetchall() return "\n".join([s[0] for s in schemas if s[0]])

def basic_prompt(user_query, schema): return f"""You are a SQL expert. Convert the following natural language query into a SQL query.

Database Schema: {schema}

User Query: {user_query}

Generate only the SQL query, no explanation."""

Test it

schema = get_schema(cursor) user_query = "Show me all employees in the Engineering department"

response = client.messages.create( model="claude-3-sonnet-20241022", max_tokens=200, messages=[{"role": "user", "content": basic_prompt(user_query, schema)}] )

print(response.content[0].text)

This works, but it's brittle. Let's improve it.

Step 2: Improving with Few-Shot Examples

Few-shot learning dramatically improves accuracy by showing Claude examples of input-output pairs. This helps the model understand the expected format and reasoning pattern.

def few_shot_prompt(user_query, schema):
    examples = """
Example 1:
User Query: "List all departments in Chicago"
SQL: SELECT * FROM departments WHERE location = 'Chicago';

Example 2: User Query: "What is the average salary by department?" SQL: SELECT d.name, AVG(e.salary) as avg_salary FROM employees e JOIN departments d ON e.department_id = d.id GROUP BY d.name;

Example 3: User Query: "Who was hired most recently?" SQL: SELECT name, hire_date FROM employees ORDER BY hire_date DESC LIMIT 1; """ return f"""You are a SQL expert. Convert natural language queries into SQL.

Database Schema: {schema}

Here are some examples: {examples}

User Query: {user_query} SQL:"""

Pro Tip: Include a few rows of actual data in your prompt. This gives Claude context about data types and values, leading to more accurate queries.

Step 3: Chain-of-Thought Prompting for Complex Queries

For complex queries involving multiple joins, subqueries, or aggregations, chain-of-thought (CoT) prompting helps Claude reason step-by-step. Use XML tags to structure the reasoning process:

def cot_prompt(user_query, schema):
    return f"""You are a SQL expert. Convert the following natural language query into a SQL query.

Database Schema: {schema}

User Query: {user_query}

Let's think through this step by step:

<thinking>

  • Identify the tables needed
  • Identify the columns needed
  • Determine joins and conditions
  • Consider aggregations or ordering
  • Write the final SQL
</thinking>

SQL:"""

This approach is particularly effective when you need Claude to handle ambiguous queries or complex business logic.

Step 4: RAG for Large Database Schemas

When your database has dozens or hundreds of tables, including the entire schema in every prompt is impractical and expensive. Retrieval Augmented Generation (RAG) solves this by dynamically fetching only the relevant schema information.

First, create embeddings for your schema using VoyageAI:

import voyageai

vo = voyageai.Client(api_key="your-voyage-api-key")

def create_schema_embeddings(cursor): """Create embeddings for each table in the database.""" cursor.execute("SELECT name FROM sqlite_master WHERE type='table'") tables = cursor.fetchall() schema_docs = [] for table in tables: cursor.execute(f"PRAGMA table_info({table[0]})") columns = cursor.fetchall() col_desc = ", ".join([f"{col[1]} ({col[2]})" for col in columns]) schema_docs.append(f"Table: {table[0]}, Columns: {col_desc}") # Get embeddings embeddings = vo.embed(schema_docs, model="voyage-2").embeddings return dict(zip([t[0] for t in tables], embeddings))

Now, when a user asks a question, retrieve the most relevant tables:

def retrieve_relevant_schema(user_query, schema_embeddings, cursor):
    query_embedding = vo.embed([user_query], model="voyage-2").embeddings[0]
    
    # Simple cosine similarity (use a proper vector DB in production)
    similarities = {}
    for table_name, table_embedding in schema_embeddings.items():
        similarity = sum(a*b for a,b in zip(query_embedding, table_embedding))
        similarities[table_name] = similarity
    
    # Get top 2 most relevant tables
    relevant_tables = sorted(similarities, key=similarities.get, reverse=True)[:2]
    
    # Fetch their schemas
    schemas = []
    for table in relevant_tables:
        cursor.execute(f"SELECT sql FROM sqlite_master WHERE name='{table}'")
        schemas.append(cursor.fetchone()[0])
    
    return "\n".join(schemas)

This approach scales to databases with hundreds of tables while keeping token usage low.

Step 5: Self-Improvement Loop

The most powerful technique is giving Claude the ability to execute its own SQL, analyze results, and fix errors. This creates a feedback loop that dramatically improves reliability.

def self_improving_query(user_query, cursor, max_iterations=3):
    schema = get_schema(cursor)
    
    for i in range(max_iterations):
        # Generate SQL
        response = client.messages.create(
            model="claude-3-sonnet-20241022",
            max_tokens=500,
            messages=[{"role": "user", "content": cot_prompt(user_query, schema)}]
        )
        
        sql = response.content[0].text.strip()
        
        # Try to execute
        try:
            cursor.execute(sql)
            results = cursor.fetchall()
            
            # Ask Claude to verify the results
            verification_prompt = f"""The following SQL was generated for the query: "{user_query}"

SQL: {sql}

Results: {results}

Are these results correct? If not, explain what's wrong and provide the corrected SQL.""" verify_response = client.messages.create( model="claude-3-sonnet-20241022", max_tokens=500, messages=[{"role": "user", "content": verification_prompt}] ) # If Claude says it's correct, return if "correct" in verify_response.content[0].text.lower(): return sql, results # Otherwise, use the corrected SQL in the next iteration sql = verify_response.content[0].text except Exception as e: # If there's an error, ask Claude to fix it error_prompt = f"""The SQL query: {sql}

Produced this error: {str(e)}

Please fix the SQL query.""" fix_response = client.messages.create( model="claude-3-sonnet-20241022", max_tokens=500, messages=[{"role": "user", "content": error_prompt}] ) sql = fix_response.content[0].text return sql, "Max iterations reached"

This self-improvement loop handles:

  • Syntax errors (missing commas, wrong keywords)
  • Logic errors (wrong joins, incorrect aggregations)
  • Schema mismatches (referencing non-existent columns)

Evaluating Your Text-to-SQL System

To ensure quality, build an evaluation pipeline:

  • Create a test set of 50-100 natural language queries with expected SQL
  • Measure exact match (query matches expected SQL exactly)
  • Measure execution match (query produces same results as expected SQL)
  • Track error rate (percentage of queries that fail to execute)
def evaluate(test_cases, cursor):
    exact_matches = 0
    execution_matches = 0
    errors = 0
    
    for nl_query, expected_sql, expected_results in test_cases:
        try:
            generated_sql, results = self_improving_query(nl_query, cursor)
            
            if generated_sql.strip() == expected_sql.strip():
                exact_matches += 1
            
            if results == expected_results:
                execution_matches += 1
                
        except Exception:
            errors += 1
    
    total = len(test_cases)
    print(f"Exact Match: {exact_matches/total:.1%}")
    print(f"Execution Match: {execution_matches/total:.1%}")
    print(f"Error Rate: {errors/total:.1%}")

Best Practices for Production

  • Always validate SQL before executing against production databases. Use a read-only connection or wrap in a transaction that rolls back.
  • Set strict token limits to prevent runaway queries.
  • Log all queries and results for debugging and improvement.
  • Use temperature 0 for deterministic SQL generation.
  • Consider adding a "human in the loop" for destructive operations (DELETE, UPDATE, DROP).

Key Takeaways

  • Start simple, iterate fast: Begin with a basic prompt and add complexity (few-shot, CoT, RAG) as needed based on your specific use case.
  • RAG is essential for real-world databases: When your schema exceeds a few tables, dynamic retrieval keeps prompts efficient and accurate.
  • Self-improvement loops dramatically boost reliability: Let Claude execute, verify, and fix its own SQL. This catches most errors automatically.
  • Always evaluate: Build a test set and measure exact match, execution match, and error rates to track improvements over time.
  • Safety first: Never execute generated SQL directly on production databases without validation and proper access controls.