Claude Guide · Beginner · 2026-05-06

Building a Robust Text-to-SQL System with Claude: A Step-by-Step Guide

Learn how to build a production-ready Text-to-SQL system using Claude. Covers prompting, RAG for complex schemas, chain-of-thought reasoning, and self-improvement loops.

Quick Answer

This guide teaches you to build a Text-to-SQL system with Claude, covering basic prompting, few-shot learning, chain-of-thought reasoning, RAG for large schemas, and self-improvement loops that let Claude fix its own SQL errors.

Tags: Text-to-SQL · Claude API · RAG · Prompt Engineering · Database

Building a Robust Text-to-SQL System with Claude

Text-to-SQL is one of the most practical applications of large language models in enterprise settings. It bridges the gap between non-technical stakeholders and complex databases, allowing anyone to ask questions in plain English and get structured data back.

Claude excels at this task because of its strong reasoning capabilities, large context window, and ability to understand nuanced natural language. In this guide, you'll learn how to build a production-ready Text-to-SQL system step by step.

Why Text-to-SQL Matters

Before diving into implementation, let's understand why this capability is so valuable:

  • Democratized Data Access: Non-technical team members can query databases without learning SQL syntax
  • Faster Prototyping: Data analysts can quickly iterate on query ideas using natural language
  • Intelligent Interfaces: Chatbots and applications can offer natural database interactions
  • Complex Query Generation: Claude can handle multi-table joins, subqueries, and aggregations that would take humans significant time to write

Setting Up Your Environment

First, let's create a test SQLite database with two tables: employees and departments. This will serve as our sandbox throughout the guide.

import sqlite3

# Create an in-memory SQLite database
conn = sqlite3.connect(':memory:')
cursor = conn.cursor()

# Create departments table
cursor.execute('''
CREATE TABLE departments (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    location TEXT
)
''')

# Create employees table
cursor.execute('''
CREATE TABLE employees (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    department_id INTEGER,
    salary REAL,
    hire_date DATE,
    FOREIGN KEY (department_id) REFERENCES departments (id)
)
''')

# Insert sample data
cursor.executemany('INSERT INTO departments VALUES (?, ?, ?)', [
    (1, 'Engineering', 'New York'),
    (2, 'Marketing', 'San Francisco'),
    (3, 'Sales', 'Chicago')
])

cursor.executemany('INSERT INTO employees VALUES (?, ?, ?, ?, ?)', [
    (1, 'Alice Johnson', 1, 95000, '2020-03-15'),
    (2, 'Bob Smith', 1, 87000, '2021-06-01'),
    (3, 'Carol Davis', 2, 72000, '2019-11-20'),
    (4, 'David Wilson', 3, 65000, '2022-01-10'),
    (5, 'Eve Brown', 2, 78000, '2020-08-05')
])

conn.commit()

Step 1: Creating a Basic Text-to-SQL Prompt

The foundation of any Text-to-SQL system is a well-structured prompt. Your prompt must include:

  • Clear instructions for the task
  • The user's natural language query
  • The database schema so Claude understands the table structure

import anthropic

client = anthropic.Anthropic(api_key="your-api-key")

def get_schema():
    """Extract schema from the database"""
    cursor.execute("SELECT sql FROM sqlite_master WHERE type='table'")
    schemas = cursor.fetchall()
    return '\n'.join([s[0] for s in schemas])

def basic_text_to_sql(user_query):
    schema = get_schema()
    prompt = f"""You are a SQL expert. Convert the following natural language query into a SQL query.

Database Schema:
{schema}

User Query: {user_query}

Generate only the SQL query, nothing else."""
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=500,
        messages=[{"role": "user", "content": prompt}]
    )
    return response.content[0].text

# Test it
query = "Show me all employees in the Engineering department"
sql = basic_text_to_sql(query)
print(f"Generated SQL: {sql}")

# Execute and display results
cursor.execute(sql)
results = cursor.fetchall()
print(f"Results: {results}")

Step 2: Improving with Few-Shot Examples

Your basic prompt works, but it can be inconsistent with complex queries. Adding examples—a technique called few-shot learning—dramatically improves accuracy.

def few_shot_text_to_sql(user_query):
    schema = get_schema()
    examples = """
Examples:

User: List all departments
SQL: SELECT * FROM departments;

User: Show employees hired after 2021
SQL: SELECT * FROM employees WHERE hire_date > '2021-01-01';

User: What is the average salary by department?
SQL: SELECT d.name, AVG(e.salary) AS avg_salary
     FROM employees e
     JOIN departments d ON e.department_id = d.id
     GROUP BY d.name;
"""
    prompt = f"""You are a SQL expert. Convert natural language to SQL.

Database Schema:
{schema}

{examples}

User: {user_query}
SQL:"""
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=500,
        messages=[{"role": "user", "content": prompt}]
    )
    return response.content[0].text

Pro Tip: Include a few rows of actual data in your prompt. This gives Claude concrete context about the data structure and content, leading to more accurate queries.
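The tip above can be sketched with a small helper that pulls a handful of rows per table and formats them for the prompt. This is a minimal illustration, not part of the guide's code: `sample_rows` is a hypothetical helper name, and the snippet rebuilds a tiny copy of the departments table so it runs standalone.

```python
import sqlite3

# Standalone setup (assumption: mirrors the departments table from earlier)
conn = sqlite3.connect(':memory:')
cursor = conn.cursor()
cursor.execute("CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT NOT NULL, location TEXT)")
cursor.executemany("INSERT INTO departments VALUES (?, ?, ?)",
                   [(1, 'Engineering', 'New York'), (2, 'Marketing', 'San Francisco')])

def sample_rows(cursor, table, limit=3):
    """Return a prompt-ready snippet: column names plus a few example rows."""
    cursor.execute(f"SELECT * FROM {table} LIMIT {limit}")
    rows = cursor.fetchall()
    columns = [d[0] for d in cursor.description]
    lines = [f"Sample rows from {table} ({', '.join(columns)}):"]
    lines += [str(row) for row in rows]
    return '\n'.join(lines)

print(sample_rows(cursor, 'departments'))
```

Appending this snippet to the schema section of any of the prompts in this guide gives Claude concrete examples of value formats (date strings, salary ranges, name casing).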

Step 3: Chain-of-Thought Prompting for Complex Queries

For complex queries involving multiple joins, subqueries, or aggregations, chain-of-thought (CoT) prompting helps Claude reason step by step before generating SQL.

def chain_of_thought_text_to_sql(user_query):
    schema = get_schema()
    prompt = f"""You are a SQL expert. Convert the following query to SQL.

Database Schema:
{schema}

User Query: {user_query}

Let's think through this step by step:
1) First, identify the tables needed
2) Determine the columns to select
3) Identify any JOIN conditions
4) Apply WHERE filters
5) Handle GROUP BY, ORDER BY, or aggregations
6) Write the final SQL

<reasoning>
[Your step-by-step reasoning here]
</reasoning>

<sql>
[Your final SQL query here]
</sql>"""
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1000,
        messages=[{"role": "user", "content": prompt}]
    )
    return response.content[0].text

Using XML tags (<reasoning> and <sql>) helps Claude structure its output and makes parsing easier on your end.
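One way to do that parsing is a small regex helper. This is a sketch: `extract_tag` is a hypothetical name, and `response_text` below is a made-up example in the format the prompt requests, not real model output.

```python
import re

def extract_tag(text, tag):
    """Pull the contents of an XML-style tag (e.g. <sql>...</sql>) out of a model response."""
    match = re.search(rf"<{tag}>(.*?)</{tag}>", text, re.DOTALL)
    return match.group(1).strip() if match else None

# Made-up response text in the format the prompt requests
response_text = """<reasoning>
1) Only the employees table is needed...
</reasoning>
<sql>
SELECT name FROM employees WHERE salary > 80000;
</sql>"""

print(extract_tag(response_text, "sql"))
# SELECT name FROM employees WHERE salary > 80000;
```

Returning `None` when a tag is missing lets the caller detect malformed responses and retry rather than executing arbitrary text as SQL.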

Step 4: Implementing RAG for Large Database Schemas

Real-world databases can have hundreds of tables. Including the entire schema in every prompt wastes tokens and can confuse Claude. Retrieval Augmented Generation (RAG) solves this by dynamically fetching only the relevant schema information.

First, create a vector database to store table and column descriptions:

import voyageai
import numpy as np

class SchemaVectorDB:
    def __init__(self, api_key):
        self.client = voyageai.Client(api_key=api_key)
        self.embeddings = []
        self.metadata = []

    def add_table(self, table_name, columns, description):
        text = f"Table: {table_name}\nColumns: {', '.join(columns)}\nDescription: {description}"
        embedding = self.client.embed([text], model="voyage-2").embeddings[0]
        self.embeddings.append(embedding)
        self.metadata.append({"table": table_name, "columns": columns, "description": description})

    def query(self, user_query, top_k=3):
        query_embedding = self.client.embed([user_query], model="voyage-2").embeddings[0]
        similarities = [np.dot(query_embedding, emb) for emb in self.embeddings]
        top_indices = np.argsort(similarities)[-top_k:][::-1]
        return [self.metadata[i] for i in top_indices]

# Initialize and populate
schema_db = SchemaVectorDB(api_key="your-voyage-api-key")
schema_db.add_table("employees", ["id", "name", "department_id", "salary", "hire_date"], "Employee records")
schema_db.add_table("departments", ["id", "name", "location"], "Department information")

Now, use the RAG system to build dynamic prompts:

def rag_text_to_sql(user_query):
    relevant_tables = schema_db.query(user_query, top_k=2)
    
    schema_context = "Relevant Schema:\n"
    for table in relevant_tables:
        schema_context += f"Table: {table['table']}\nColumns: {', '.join(table['columns'])}\n\n"
    
    prompt = f"""You are a SQL expert. Convert the user's query to SQL using only the provided schema.

{schema_context}

User Query: {user_query}

SQL:"""
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=500,
        messages=[{"role": "user", "content": prompt}]
    )
    return response.content[0].text

Step 5: Implementing Query Self-Improvement

One of Claude's most powerful capabilities is self-correction. You can build a loop where Claude executes its generated SQL, checks for errors, and fixes them automatically.

def self_improving_text_to_sql(user_query, max_attempts=3):
    schema = get_schema()

    # Generate the initial SQL
    prompt = f"""Convert this query to SQL:
Schema: {schema}
Query: {user_query}
SQL:"""
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=500,
        messages=[{"role": "user", "content": prompt}]
    )
    sql = response.content[0].text.strip()

    for attempt in range(max_attempts):
        # Try to execute the current SQL
        try:
            cursor.execute(sql)
            results = cursor.fetchall()
            return {"success": True, "sql": sql, "results": results}
        except Exception as e:
            error_msg = str(e)
            print(f"Attempt {attempt + 1} failed: {error_msg}")

            if attempt < max_attempts - 1:
                # Ask Claude to fix the error, then retry with the corrected SQL
                fix_prompt = f"""The SQL query you generated had an error:

Query: {sql}
Error: {error_msg}

Please fix the SQL query.
Schema: {schema}
User Query: {user_query}
Fixed SQL:"""
                response = client.messages.create(
                    model="claude-3-5-sonnet-20241022",
                    max_tokens=500,
                    messages=[{"role": "user", "content": fix_prompt}]
                )
                sql = response.content[0].text.strip()

    return {"success": False, "sql": sql, "error": "Max attempts reached"}

This self-improvement loop is invaluable in production systems where query correctness is critical.
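The retry pattern itself can be illustrated without any API calls. In this sketch, `stub_fixer` is a stand-in for the `client.messages.create` call: it hard-codes the correction a model might return for a known typo, so the loop's control flow can be seen in isolation. All names here are hypothetical.

```python
import sqlite3

# Tiny standalone database for the demonstration
conn = sqlite3.connect(':memory:')
cursor = conn.cursor()
cursor.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, salary REAL)")
cursor.execute("INSERT INTO employees VALUES (1, 'Alice Johnson', 95000)")

def stub_fixer(sql, error_msg):
    # Stand-in for the model call: returns a corrected query for this known typo
    return sql.replace("employes", "employees")

def run_with_retry(sql, fixer, max_attempts=3):
    for attempt in range(max_attempts):
        try:
            cursor.execute(sql)
            return {"success": True, "sql": sql, "results": cursor.fetchall()}
        except Exception as e:
            if attempt < max_attempts - 1:
                sql = fixer(sql, str(e))  # feed the error back for correction
    return {"success": False, "sql": sql, "error": "Max attempts reached"}

result = run_with_retry("SELECT name FROM employes", stub_fixer)
print(result["success"], result["results"])  # True [('Alice Johnson',)]
```

The first attempt fails (`no such table: employes`), the fixer corrects the table name, and the second attempt succeeds; with the real model call in place of `stub_fixer`, this is the same flow as `self_improving_text_to_sql`.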

Evaluating Your Text-to-SQL System

To ensure quality, you need a robust evaluation framework. Here's a simple approach:

  • Create a test set of 20-50 natural language queries with expected SQL outputs
  • Execute both the expected and generated SQL against your database
  • Compare results—not just the SQL strings, but the actual returned data
  • Track metrics like exact match accuracy, execution success rate, and result equivalence

def evaluate_text_to_sql(test_cases):
    correct = 0
    total = len(test_cases)
    
    for case in test_cases:
        generated_sql = basic_text_to_sql(case["query"])
        
        try:
            cursor.execute(generated_sql)
            generated_results = set(cursor.fetchall())
            
            cursor.execute(case["expected_sql"])
            expected_results = set(cursor.fetchall())
            
            if generated_results == expected_results:
                correct += 1
        except Exception:
            pass  # execution errors count as incorrect
    
    return f"Accuracy: {correct}/{total} ({correct/total*100:.1f}%)"
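The result-comparison idea at the heart of this evaluator can be shown in isolation: two syntactically different queries that return the same rows should count as equivalent. This is a standalone sketch with a hypothetical `results_equivalent` helper and its own tiny database.

```python
import sqlite3

# Tiny standalone database for the demonstration
conn = sqlite3.connect(':memory:')
cursor = conn.cursor()
cursor.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, department_id INTEGER)")
cursor.executemany("INSERT INTO employees VALUES (?, ?, ?)",
                   [(1, 'Alice Johnson', 1), (2, 'Bob Smith', 1), (3, 'Carol Davis', 2)])

def results_equivalent(sql_a, sql_b):
    """Compare the returned rows of two queries, ignoring row order."""
    cursor.execute(sql_a)
    a = set(cursor.fetchall())
    cursor.execute(sql_b)
    b = set(cursor.fetchall())
    return a == b

# Different syntax, same data
print(results_equivalent(
    "SELECT name FROM employees WHERE department_id = 1",
    "SELECT name FROM employees WHERE department_id IN (1)"))  # True
```

Note that comparing `set`s ignores row order and duplicate rows; if ordering or duplicates matter for a query (e.g. `ORDER BY` or aggregates), compare the lists directly instead.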

Key Takeaways

  • Start simple, then layer complexity: Begin with basic prompts, then add few-shot examples, chain-of-thought reasoning, and finally RAG and self-improvement loops
  • Include schema context: Always provide Claude with the database schema—it's the single most important factor for accurate SQL generation
  • Use RAG for large schemas: When dealing with databases containing hundreds of tables, dynamically retrieve only the relevant schema information to save tokens and improve accuracy
  • Implement self-correction loops: Claude can fix its own SQL errors. Build a feedback loop that catches execution errors and asks Claude to correct them
  • Evaluate on results, not syntax: Two different SQL queries can return the same data. Compare actual query results rather than exact SQL string matching for more meaningful evaluations