Building a Robust Text-to-SQL System with Claude: A Practical Guide
This guide teaches you to build a Text-to-SQL system with Claude, covering basic prompting, few-shot learning, chain-of-thought reasoning, RAG for large schemas, and self-improvement loops that let Claude fix its own SQL errors.
Imagine giving your non-technical team members the ability to query complex databases using plain English. No SQL training required. That's the promise of Text-to-SQL systems, and with Claude's advanced language understanding, building one is more accessible than ever.
In this guide, you'll learn how to build a production-ready Text-to-SQL system using Claude. We'll start with basic prompting and progressively add sophistication: few-shot examples, chain-of-thought reasoning, Retrieval Augmented Generation (RAG) for large schemas, and a self-improvement loop that lets Claude fix its own mistakes.
Why Text-to-SQL Matters
Before diving into the code, let's understand why this capability is transformative:
- Democratize data access: Business analysts, product managers, and executives can query databases without waiting for engineering support
- Accelerate prototyping: Data scientists can quickly explore data using natural language before refining SQL
- Build intuitive interfaces: Integrate natural language querying into chatbots, dashboards, and internal tools
- Handle complexity: Claude can generate SQL with multiple joins, subqueries, and aggregations that would take humans significant time to write
Prerequisites
To follow along, you'll need:
- An Anthropic API key
- Python 3.8+
- Basic familiarity with SQL and Python
Setting Up Your Environment
First, let's create a test SQLite database with two tables: employees and departments. This will be our sandbox throughout the guide.
```python
import sqlite3

# Create an in-memory SQLite database
conn = sqlite3.connect(':memory:')
cursor = conn.cursor()

# Create departments table
cursor.execute('''
CREATE TABLE departments (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    location TEXT
)
''')

# Create employees table
cursor.execute('''
CREATE TABLE employees (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    department_id INTEGER,
    salary REAL,
    hire_date DATE,
    FOREIGN KEY (department_id) REFERENCES departments (id)
)
''')

# Insert sample data
cursor.executemany('INSERT INTO departments VALUES (?, ?, ?)', [
    (1, 'Engineering', 'New York'),
    (2, 'Marketing', 'San Francisco'),
    (3, 'Sales', 'Chicago')
])

cursor.executemany('INSERT INTO employees VALUES (?, ?, ?, ?, ?)', [
    (1, 'Alice Johnson', 1, 95000, '2020-03-15'),
    (2, 'Bob Smith', 1, 85000, '2021-06-01'),
    (3, 'Charlie Brown', 2, 72000, '2019-11-20'),
    (4, 'Diana Prince', 3, 88000, '2022-01-10'),
    (5, 'Eve Davis', 2, 65000, '2023-04-05')
])

conn.commit()
```
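With the tables populated, it's worth running one hand-written query to confirm the data is in place and to establish a ground-truth answer we can compare Claude's generated SQL against later. This snippet recreates the same two tables so it runs on its own:

```python
import sqlite3

# Recreate the sample database from above so this snippet is self-contained
conn = sqlite3.connect(':memory:')
cursor = conn.cursor()
cursor.execute("CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT NOT NULL, location TEXT)")
cursor.execute("""CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT NOT NULL,
    department_id INTEGER, salary REAL, hire_date DATE,
    FOREIGN KEY (department_id) REFERENCES departments (id))""")
cursor.executemany("INSERT INTO departments VALUES (?, ?, ?)", [
    (1, 'Engineering', 'New York'), (2, 'Marketing', 'San Francisco'), (3, 'Sales', 'Chicago')])
cursor.executemany("INSERT INTO employees VALUES (?, ?, ?, ?, ?)", [
    (1, 'Alice Johnson', 1, 95000, '2020-03-15'), (2, 'Bob Smith', 1, 85000, '2021-06-01'),
    (3, 'Charlie Brown', 2, 72000, '2019-11-20'), (4, 'Diana Prince', 3, 88000, '2022-01-10'),
    (5, 'Eve Davis', 2, 65000, '2023-04-05')])
conn.commit()

# Hand-written baseline: average salary per department
cursor.execute("""
    SELECT d.name, AVG(e.salary)
    FROM employees e JOIN departments d ON e.department_id = d.id
    GROUP BY d.name ORDER BY d.name
""")
rows = cursor.fetchall()
print(rows)
# [('Engineering', 90000.0), ('Marketing', 68500.0), ('Sales', 88000.0)]
```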
Step 1: Building a Basic Text-to-SQL Prompt
The foundation of any Text-to-SQL system is a well-structured prompt. Your prompt must include three elements:
- Clear instructions
- The database schema
- The user's natural language query
```python
import anthropic

client = anthropic.Anthropic(api_key="your-api-key")

def get_schema():
    """Extract schema information from the database."""
    cursor.execute("SELECT sql FROM sqlite_master WHERE type='table'")
    schemas = cursor.fetchall()
    return "\n".join([s[0] for s in schemas])

def basic_prompt(user_query, schema):
    return f"""You are a SQL expert. Given the following database schema, convert the user's natural language query into a valid SQL query.

Database Schema:
{schema}

User Query: {user_query}

Generate only the SQL query, no explanation."""

# Test it
schema = get_schema()
response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=200,
    messages=[{"role": "user", "content": basic_prompt("Show me all employees in Engineering", schema)}]
)
print(response.content[0].text)
```
Output: SELECT e.* FROM employees e JOIN departments d ON e.department_id = d.id WHERE d.name = 'Engineering';
Step 2: Improving with Few-Shot Examples
Basic prompts work, but they can struggle with ambiguity. Adding examples—a technique called few-shot learning—dramatically improves accuracy.
```python
def few_shot_prompt(user_query, schema):
    examples = """
Example 1:
User: List all departments in New York
SQL: SELECT * FROM departments WHERE location = 'New York';

Example 2:
User: What is the average salary by department?
SQL: SELECT d.name, AVG(e.salary) as avg_salary FROM employees e JOIN departments d ON e.department_id = d.id GROUP BY d.name;

Example 3:
User: Who was hired most recently?
SQL: SELECT name, hire_date FROM employees ORDER BY hire_date DESC LIMIT 1;
"""
    return f"""You are a SQL expert. Convert natural language queries to SQL.

Database Schema:
{schema}

Here are examples of how to convert queries:
{examples}

User Query: {user_query}
SQL:"""
```
Pro tip: Include a few rows of actual data in your prompt. This gives Claude context about data types and formats, reducing errors with date parsing or string matching.
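One way to do this is with a small helper that pulls a few rows per table and formats them for inclusion in the prompt. This is a sketch (the `sample_rows` name and output format are invented here, not part of any library), demonstrated against a tiny in-memory table:

```python
import sqlite3

def sample_rows(conn, table_name, limit=3):
    """Fetch a few example rows to ground the prompt in real data formats.
    Note: table_name must come from a trusted list (e.g. sqlite_master),
    since identifiers can't be bound as SQL parameters."""
    cursor = conn.cursor()
    cursor.execute(f"SELECT * FROM {table_name} LIMIT {int(limit)}")
    col_names = [d[0] for d in cursor.description]
    lines = [" | ".join(col_names)]
    for row in cursor.fetchall():
        lines.append(" | ".join(str(v) for v in row))
    return f"Sample rows from {table_name}:\n" + "\n".join(lines)

# Demo on a tiny self-contained database
conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE employees (id INTEGER, name TEXT, hire_date DATE)")
conn.execute("INSERT INTO employees VALUES (1, 'Alice Johnson', '2020-03-15')")
print(sample_rows(conn, "employees"))
```

Appending this text beneath the schema in any of the prompts above gives Claude concrete date and string formats to match against.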
Step 3: Chain-of-Thought Prompting for Complex Queries
For multi-step queries, chain-of-thought prompting helps Claude reason through the problem before generating SQL. Use XML tags to structure the reasoning process.
```python
def chain_of_thought_prompt(user_query, schema):
    return f"""You are a SQL expert. Convert the user's query to SQL.

Database Schema:
{schema}

User Query: {user_query}

Let's think through this step by step:

<reasoning>
- What tables do I need?
- What columns should I select?
- What conditions (WHERE clauses) apply?
- Do I need aggregations (GROUP BY, HAVING)?
- What order should results be in?
</reasoning>

<sql>
-- Write your SQL here
</sql>"""
```
This approach is particularly effective for queries like "Find the department with the highest average salary" or "Show me employees hired in the last year along with their department names."
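Because this template asks for reasoning plus a tagged answer, you need to pull just the SQL out of the response before executing it. A minimal regex-based extraction helper (a sketch, with the name and fallback behavior chosen here, not prescribed by the API):

```python
import re

def extract_sql(response_text):
    """Pull the query out of <sql>...</sql> tags, dropping SQL line comments.
    Falls back to the whole text if no tags are found (sketch, not
    production-grade parsing)."""
    match = re.search(r"<sql>(.*?)</sql>", response_text, re.DOTALL)
    sql = match.group(1) if match else response_text
    # Drop line comments like "-- Write your SQL here"
    lines = [line for line in sql.splitlines() if not line.strip().startswith("--")]
    return "\n".join(lines).strip()

# Demo on a hand-written response in the expected format
sample = """<reasoning>
- Need employees joined to departments, grouped by department name
</reasoning>
<sql>
-- join and aggregate
SELECT d.name, AVG(e.salary) FROM employees e
JOIN departments d ON e.department_id = d.id GROUP BY d.name;
</sql>"""
print(extract_sql(sample))
```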
Step 4: Implementing RAG for Large Schemas
Real-world databases often have hundreds of tables. Including the entire schema in every prompt wastes tokens and confuses the model. Retrieval Augmented Generation (RAG) solves this by dynamically fetching only relevant schema information.
```python
import voyageai
from typing import List, Dict
import numpy as np

class SchemaRetriever:
    def __init__(self, api_key: str):
        self.client = voyageai.Client(api_key=api_key)
        self.embeddings = []
        self.table_info = []

    def index_schema(self, conn):
        """Index all tables and their schemas."""
        cursor = conn.cursor()
        cursor.execute("SELECT name FROM sqlite_master WHERE type='table'")
        tables = cursor.fetchall()

        for (table_name,) in tables:
            cursor.execute(f"PRAGMA table_info({table_name})")
            columns = cursor.fetchall()

            # Create a description of this table
            col_desc = ", ".join([f"{col[1]} ({col[2]})" for col in columns])
            description = f"Table '{table_name}' with columns: {col_desc}"

            self.table_info.append({
                "name": table_name,
                "description": description,
                "columns": columns
            })

            # Generate embedding
            result = self.client.embed([description], model="voyage-2")
            self.embeddings.append(result.embeddings[0])

    def retrieve_relevant_tables(self, query: str, top_k: int = 3) -> List[Dict]:
        """Find the most relevant tables for a given query."""
        query_embedding = self.client.embed([query], model="voyage-2").embeddings[0]

        # Compute cosine similarity
        similarities = [
            np.dot(query_embedding, emb) / (np.linalg.norm(query_embedding) * np.linalg.norm(emb))
            for emb in self.embeddings
        ]

        # Get top-k indices
        top_indices = np.argsort(similarities)[-top_k:][::-1]
        return [self.table_info[i] for i in top_indices]
```
```python
# Usage
retriever = SchemaRetriever(api_key="your-voyage-api-key")
retriever.index_schema(conn)

relevant_tables = retriever.retrieve_relevant_tables("Show me employee salaries")
for table in relevant_tables:
    print(table["description"])
```
Now, instead of passing the entire schema, you dynamically build your prompt with only the relevant tables:
```python
def rag_prompt(user_query, retriever):
    relevant = retriever.retrieve_relevant_tables(user_query)
    schema_context = "\n".join([t["description"] for t in relevant])

    return f"""Given these database tables:

{schema_context}

Convert this query to SQL: {user_query}

SQL:"""
```
Step 5: Building a Self-Improvement Loop
Even the best prompts can produce incorrect SQL. A self-improvement loop lets Claude execute its generated SQL, catch errors, and fix them automatically.
```python
def execute_and_improve(user_query, conn, max_iterations=3):
    """Generate SQL, execute it, and improve if errors occur."""
    schema = get_schema()
    sql = None

    for iteration in range(max_iterations):
        if sql is None:
            # Generate an initial SQL query
            response = client.messages.create(
                model="claude-3-5-sonnet-20241022",
                max_tokens=300,
                messages=[{"role": "user", "content": basic_prompt(user_query, schema)}]
            )
            sql = response.content[0].text.strip()
        print(f"Iteration {iteration + 1}: SQL: {sql}")

        try:
            cursor = conn.cursor()
            cursor.execute(sql)
            results = cursor.fetchall()

            # If we get here, the SQL executed successfully
            if len(results) == 0:
                # Results are empty -- ask Claude whether that's expected
                verification_prompt = f"""The SQL query '{sql}' returned no results for the user query '{user_query}'.

Schema: {schema}

Is this expected? If not, suggest a corrected SQL query after the marker 'SQL:'."""
                response = client.messages.create(
                    model="claude-3-5-sonnet-20241022",
                    max_tokens=300,
                    messages=[{"role": "user", "content": verification_prompt}]
                )
                reply = response.content[0].text
                if "corrected" in reply.lower() and "SQL:" in reply:
                    # Retry with Claude's suggested correction
                    sql = reply.split("SQL:")[-1].strip()
                    continue

            return {"success": True, "sql": sql, "results": results}

        except Exception as e:
            error_msg = str(e)
            print(f"Error: {error_msg}")

            # Feed the error back to Claude for fixing
            fix_prompt = f"""The SQL query you generated had an error:

User Query: {user_query}
Your SQL: {sql}
Error: {error_msg}

Database Schema:
{schema}

Please provide a corrected SQL query, and only the query."""
            response = client.messages.create(
                model="claude-3-5-sonnet-20241022",
                max_tokens=300,
                messages=[{"role": "user", "content": fix_prompt}]
            )
            # Retry with the corrected query on the next iteration
            sql = response.content[0].text.strip()

    return {"success": False, "error": "Max iterations reached without successful execution"}
```
```python
# Test with a query that might cause errors
result = execute_and_improve("Show me employees with salary above average", conn)
print(result)
```
This self-improvement loop handles:
- Syntax errors: Misspelled table names, wrong column references
- Logical errors: Queries that run but return unexpected results
- Empty results: Queries that should return data but don't
Evaluation: Measuring Your System's Performance
To ensure your Text-to-SQL system is reliable, you need to evaluate it systematically:
- Execution Accuracy: Does the generated SQL run without errors?
- Result Correctness: Does the output match expected results?
- Semantic Equivalence: Is the SQL logically equivalent to the user's intent?
```python
test_cases = [
    {"query": "List all employees", "expected_sql": "SELECT * FROM employees"},
    {"query": "Show me departments in San Francisco", "expected_sql": "SELECT * FROM departments WHERE location = 'San Francisco'"},
    # Add more test cases
]

def evaluate_system(test_cases, generate_sql_fn):
    correct = 0
    for case in test_cases:
        generated = generate_sql_fn(case["query"])
        # Exact string match is a strict baseline: equivalent SQL
        # phrased differently will be counted as wrong
        if generated.strip().lower() == case["expected_sql"].strip().lower():
            correct += 1
    return correct / len(test_cases)
```
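Exact string matching undercounts correct answers, because many differently-written queries are semantically equivalent. A more forgiving metric executes both queries and compares their result sets. A sketch of that idea (assuming the generated SQL is trusted enough to run against a sandbox copy of the data; the `execution_match` name is invented here):

```python
import sqlite3

def execution_match(conn, generated_sql, expected_sql):
    """Run both queries and compare their result sets, ignoring row order.
    Returns False if the generated SQL fails to execute."""
    def run(sql):
        cur = conn.cursor()
        cur.execute(sql)
        return sorted(cur.fetchall())
    try:
        return run(generated_sql) == run(expected_sql)
    except sqlite3.Error:
        return False

# Demo: two differently-written but equivalent queries
conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE departments (id INTEGER, name TEXT, location TEXT)")
conn.executemany("INSERT INTO departments VALUES (?, ?, ?)",
                 [(1, 'Engineering', 'New York'), (2, 'Marketing', 'San Francisco')])
print(execution_match(
    conn,
    "SELECT name FROM departments WHERE location = 'San Francisco'",
    "SELECT d.name FROM departments d WHERE d.location LIKE 'San Fran%'",
))  # True
print(execution_match(conn, "SELECT nmae FROM departments",
                      "SELECT name FROM departments"))  # False
```

Swapping this in for the string comparison in `evaluate_system` gives an execution-accuracy score rather than a surface-form one.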
Key Takeaways
- Start simple, iterate fast: Begin with a basic prompt and schema, then layer in few-shot examples, chain-of-thought reasoning, and RAG as needed
- RAG is essential for production: For databases with more than 10-15 tables, dynamic schema retrieval dramatically improves accuracy and reduces token costs
- Self-improvement loops catch errors: Let Claude execute its own SQL and fix mistakes automatically—this is crucial for production reliability
- Include sample data in prompts: A few rows of real data help Claude understand data formats and avoid common pitfalls with dates, strings, and NULL values
- Evaluate systematically: Build a test suite of query-SQL pairs and track accuracy metrics to guide your prompt engineering efforts