BeClaude Guide · 2026-05-06

Building a Robust Text-to-SQL System with Claude: A Practical Guide

Learn how to build a production-ready Text-to-SQL system using Claude. Covers prompting, RAG for complex schemas, chain-of-thought reasoning, and self-improvement loops for accurate SQL generation.

Quick Answer

This guide teaches you to build a Text-to-SQL system with Claude, covering basic prompting, few-shot learning, chain-of-thought reasoning, RAG for large schemas, and self-improvement loops that let Claude fix its own SQL errors.

Text-to-SQL · Claude API · RAG · Prompt Engineering · Database Integration

Imagine giving your non-technical team members the ability to query complex databases just by asking questions in plain English. That's the promise of Text-to-SQL, and Claude is exceptionally good at it. This guide walks you through building a production-ready Text-to-SQL system, from basic prompts to advanced self-improvement loops.

Why Text-to-SQL Matters

Text-to-SQL converts natural language questions into structured SQL queries. It's a game-changer for organizations because:

  • Democratizes data access: Business analysts, product managers, and executives can query databases without knowing SQL syntax
  • Boosts productivity: Data analysts can prototype complex queries in seconds instead of minutes
  • Enables intuitive interfaces: Power chatbots and internal tools with natural language database interaction
  • Handles complexity: Claude can generate queries with multiple JOINs, subqueries, and aggregations that would take humans significant time to write correctly

Setting Up Your Environment

First, let's create a test SQLite database with two tables: employees and departments. This will be our sandbox throughout the guide.

import sqlite3

# Create an in-memory SQLite database
conn = sqlite3.connect(':memory:')
cursor = conn.cursor()

# Create departments table
cursor.execute('''
CREATE TABLE departments (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    location TEXT
)
''')

# Create employees table
cursor.execute('''
CREATE TABLE employees (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    department_id INTEGER,
    salary REAL,
    hire_date DATE,
    FOREIGN KEY (department_id) REFERENCES departments(id)
)
''')

# Insert sample data
cursor.executemany('INSERT INTO departments VALUES (?, ?, ?)', [
    (1, 'Engineering', 'New York'),
    (2, 'Marketing', 'San Francisco'),
    (3, 'Sales', 'Chicago')
])

cursor.executemany('INSERT INTO employees VALUES (?, ?, ?, ?, ?)', [
    (1, 'Alice Johnson', 1, 95000, '2020-03-15'),
    (2, 'Bob Smith', 1, 85000, '2021-06-01'),
    (3, 'Charlie Brown', 2, 72000, '2019-11-20'),
    (4, 'Diana Prince', 3, 88000, '2022-01-10')
])

conn.commit()
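Later steps pass the schema to Claude as plain text. Rather than hand-writing that string, you can generate it from the live connection. Here's a minimal sketch using SQLite's sqlite_master table and PRAGMA table_info; describe_schema is a hypothetical helper name, not part of the sqlite3 API:

```python
import sqlite3

def describe_schema(conn):
    """Build a plain-text schema description from a SQLite connection."""
    cursor = conn.cursor()
    cursor.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    lines = []
    for (table_name,) in cursor.fetchall():
        lines.append(f"Table: {table_name}")
        # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk)
        cursor.execute(f"PRAGMA table_info({table_name})")
        cols = ", ".join(f"{row[1]} ({row[2]})" for row in cursor.fetchall())
        lines.append(f"Columns: {cols}")
        lines.append("")
    return "\n".join(lines)
```

This sketch omits foreign-key relationships; PRAGMA foreign_key_list(table) can be added if you want those in the schema text as well.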

Step 1: The Basic Text-to-SQL Prompt

A good Text-to-SQL prompt needs three things: clear instructions, the database schema, and the user's question. Here's a minimal but effective approach:

import anthropic

client = anthropic.Anthropic(api_key="your-api-key")

def generate_sql(user_query, schema):
    prompt = f"""You are a SQL expert. Given the following database schema, convert the user's natural language query into a valid SQL statement.

Database Schema:
{schema}

User Query: {user_query}

Generate only the SQL query, no explanation."""
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=500,
        messages=[{"role": "user", "content": prompt}]
    )
    return response.content[0].text

# Example usage
schema = """
Table: departments
Columns: id (INTEGER), name (TEXT), location (TEXT)

Table: employees
Columns: id (INTEGER), name (TEXT), department_id (INTEGER), salary (REAL), hire_date (DATE)
Foreign Key: department_id -> departments.id
"""

query = "Show me all employees in the Engineering department"
sql = generate_sql(query, schema)
print(sql)

# Output: SELECT * FROM employees WHERE department_id = 1;
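One practical wrinkle: despite the "no explanation" instruction, models occasionally wrap the query in markdown code fences. A small defensive cleanup step keeps downstream execution robust. This is a sketch; clean_sql_output is a hypothetical helper, not part of the Anthropic SDK:

```python
def clean_sql_output(raw: str) -> str:
    """Strip markdown code fences and surrounding whitespace from model output."""
    text = raw.strip()
    if text.startswith("```"):
        lines = text.splitlines()
        # Drop the opening fence (with its optional language tag)
        lines = lines[1:]
        # Drop the closing fence if present
        if lines and lines[-1].strip().startswith("```"):
            lines = lines[:-1]
        text = "\n".join(lines).strip()
    return text
```

Run every generated query through this before executing it against the database.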

Step 2: Improving with Few-Shot Examples

Basic prompts work, but adding examples (few-shot learning) dramatically improves accuracy, especially for complex queries. Let's enhance our function:

def generate_sql_with_examples(user_query, schema):
    examples = """
Examples:
User: List all departments in New York
SQL: SELECT * FROM departments WHERE location = 'New York';

User: What is the average salary by department?
SQL: SELECT d.name, AVG(e.salary) as avg_salary
FROM employees e
JOIN departments d ON e.department_id = d.id
GROUP BY d.name;

User: Who was hired most recently?
SQL: SELECT name, hire_date FROM employees ORDER BY hire_date DESC LIMIT 1;
"""
    prompt = f"""You are a SQL expert. Given the schema and examples below, convert the user's query to SQL.

{schema}
{examples}

User Query: {user_query}
SQL:"""
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=500,
        messages=[{"role": "user", "content": prompt}]
    )
    return response.content[0].text

Pro tip: Include a few rows of actual data in your prompt. This gives Claude context about data formats and values, leading to more accurate queries.
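That tip can be sketched as a helper that pulls a few rows per table and formats them for inclusion in the prompt. This assumes a sqlite3 connection like the one from the setup section; schema_with_samples is a hypothetical name:

```python
def schema_with_samples(conn, tables, rows_per_table=3):
    """Format a few sample rows per table to give the model data-format context."""
    cursor = conn.cursor()
    parts = []
    for table in tables:
        cursor.execute(f"SELECT * FROM {table} LIMIT {rows_per_table}")
        # cursor.description holds column metadata; index 0 is the column name
        col_names = [desc[0] for desc in cursor.description]
        parts.append(f"Sample rows from {table} ({', '.join(col_names)}):")
        for row in cursor.fetchall():
            parts.append(f"  {row}")
    return "\n".join(parts)
```

Append this output to the schema text in your prompt so Claude sees, for example, that hire_date values look like '2020-03-15'.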

Step 3: Chain-of-Thought for Complex Queries

For multi-step queries, chain-of-thought prompting helps Claude reason through the problem. Use XML tags to structure the reasoning:

def generate_sql_cot(user_query, schema):
    prompt = f"""You are a SQL expert. Convert the user's query to SQL using step-by-step reasoning.

{schema}

User Query: {user_query}

<reasoning>
1. Identify the tables needed
2. Determine the columns to select
3. Identify JOIN conditions
4. Apply WHERE filters
5. Add GROUP BY, ORDER BY, or LIMIT if needed
</reasoning>

<sql>
-- Your SQL here
</sql>"""
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=800,
        messages=[{"role": "user", "content": prompt}]
    )
    return response.content[0].text

This approach significantly reduces errors on queries involving multiple JOINs, subqueries, or complex aggregations.
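Because the response now mixes reasoning with the query, you need to pull out just the <sql> block before executing anything. A minimal extractor, sketched with a regular expression (the tag names match the prompt above; extract_sql is a hypothetical helper):

```python
import re

def extract_sql(response_text: str) -> str:
    """Pull the query out of <sql>...</sql> tags, falling back to the raw text."""
    match = re.search(r"<sql>(.*?)</sql>", response_text, re.DOTALL)
    sql = match.group(1) if match else response_text
    # Drop SQL comment lines such as "-- Your SQL here" left over from the template
    lines = [ln for ln in sql.splitlines() if not ln.strip().startswith("--")]
    return "\n".join(lines).strip()
```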

Step 4: RAG for Large Database Schemas

When your database has dozens or hundreds of tables, you can't fit the entire schema in a prompt. Retrieval Augmented Generation (RAG) solves this by dynamically fetching only the relevant schema information.

Here's how to implement it with VoyageAI embeddings:

import voyageai
import numpy as np
from typing import Dict

class SchemaVectorDB:
    def __init__(self, api_key: str):
        self.client = voyageai.Client(api_key=api_key)
        self.embeddings = []
        self.schema_chunks = []

    def index_schema(self, tables: Dict[str, Dict]):
        """Index table schemas as embedding vectors"""
        for table_name, columns in tables.items():
            chunk = f"Table: {table_name}\n"
            for col_name, col_type in columns.items():
                chunk += f"  - {col_name} ({col_type})\n"
            self.schema_chunks.append(chunk)
        # Generate embeddings
        response = self.client.embed(
            self.schema_chunks,
            model="voyage-2"
        )
        self.embeddings = response.embeddings

    def query_relevant_schema(self, user_query: str, top_k: int = 3) -> str:
        """Retrieve most relevant schema chunks for a query"""
        query_embedding = self.client.embed(
            [user_query], model="voyage-2"
        ).embeddings[0]
        # Calculate cosine similarity
        similarities = [
            np.dot(query_embedding, emb) /
            (np.linalg.norm(query_embedding) * np.linalg.norm(emb))
            for emb in self.embeddings
        ]
        # Get top-k most relevant chunks
        top_indices = np.argsort(similarities)[-top_k:][::-1]
        return "\n".join(self.schema_chunks[i] for i in top_indices)

# Usage
vector_db = SchemaVectorDB(api_key="your-voyage-api-key")
vector_db.index_schema({
    "employees": {"id": "INTEGER", "name": "TEXT", "department_id": "INTEGER", "salary": "REAL"},
    "departments": {"id": "INTEGER", "name": "TEXT", "location": "TEXT"}
})

relevant_schema = vector_db.query_relevant_schema("What's the average salary?")

Step 5: Self-Improvement Loop

This is where Claude truly shines. Instead of just generating SQL, let Claude execute it, check for errors, and fix its own mistakes:

def generate_and_refine_sql(user_query, schema, conn, max_iterations=3):
    current_sql = generate_sql(user_query, schema)

    for i in range(max_iterations):
        try:
            cursor = conn.cursor()
            cursor.execute(current_sql)
            results = cursor.fetchall()

            # Check if results make sense
            verification_prompt = f"""
Original Query: {user_query}
Generated SQL: {current_sql}
Results: {results[:5]}...

Does this SQL correctly answer the user's query?
Reply with the single word CORRECT if it does; otherwise explain what's wrong."""
            response = client.messages.create(
                model="claude-3-5-sonnet-20241022",
                max_tokens=300,
                messages=[{"role": "user", "content": verification_prompt}]
            )
            verdict = response.content[0].text
            if verdict.strip().upper().startswith("CORRECT"):
                return current_sql, results
            # Verification failed - regenerate using the critique
            fix_prompt = f"""The SQL below does not correctly answer the query.

Critique: {verdict}
SQL: {current_sql}
Schema: {schema}
Original Query: {user_query}

Please provide a corrected SQL query, no explanation."""
        except Exception as e:
            # SQL execution error - ask Claude to fix it
            fix_prompt = f"""The following SQL generated an error: {current_sql}

Error: {str(e)}

Schema: {schema}
Original Query: {user_query}

Please fix the SQL. Provide only the corrected query."""
        response = client.messages.create(
            model="claude-3-5-sonnet-20241022",
            max_tokens=500,
            messages=[{"role": "user", "content": fix_prompt}]
        )
        current_sql = response.content[0].text
    return current_sql, None

This self-improvement loop catches syntax errors, logical mistakes, and even performance issues before they reach your users.

Evaluating Your System

To ensure reliability, build an evaluation pipeline:

  • Create a test set of 20-50 query-SQL pairs
  • Measure accuracy by comparing generated SQL to expected SQL
  • Track execution success rate - does the SQL run without errors?
  • Monitor result correctness - does the output match expected results?

def evaluate_system(test_cases, conn):
    correct = 0
    for case in test_cases:
        generated_sql = generate_sql(case["query"], case["schema"])
        try:
            cursor = conn.cursor()
            cursor.execute(generated_sql)
            results = cursor.fetchall()
            if results == case["expected_results"]:
                correct += 1
        except Exception:
            # Failed executions count as incorrect
            pass

    accuracy = correct / len(test_cases) * 100
    return f"Accuracy: {accuracy:.1f}%"
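One caveat with the result comparison: fetchall lists are order-sensitive, but a query without an ORDER BY may legitimately return rows in any order. A small helper (a sketch; results_match is a hypothetical name) that compares results as multisets is more forgiving:

```python
from collections import Counter

def results_match(actual, expected):
    """Compare query results ignoring row order, but respecting duplicates."""
    return Counter(actual) == Counter(expected)
```

Swap this in for the == comparison when your test queries don't pin down an ordering.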

Key Takeaways

  • Start simple, iterate fast: Begin with a basic prompt, then layer in few-shot examples, chain-of-thought reasoning, and RAG as complexity grows
  • RAG is essential for real-world databases: When schemas exceed prompt limits, use vector embeddings to dynamically retrieve relevant schema information
  • Self-improvement loops catch errors: Let Claude execute, verify, and fix its own SQL queries for dramatically better reliability
  • Include sample data in prompts: A few rows of real data help Claude understand data formats and produce more accurate queries
  • Always evaluate: Build a test set of query-SQL pairs and measure accuracy, execution success, and result correctness to track improvements over time