Building a Robust Text-to-SQL System with Claude: A Practical Guide
Learn how to build a production-ready Text-to-SQL system using Claude. Covers prompting, RAG for complex schemas, chain-of-thought reasoning, and self-improvement loops for accurate SQL generation.
Imagine giving your non-technical team members the ability to query complex databases just by asking questions in plain English. That's the promise of Text-to-SQL, and Claude is exceptionally good at it. This guide walks you through building a production-ready Text-to-SQL system, from basic prompts to advanced self-improvement loops.
Why Text-to-SQL Matters
Text-to-SQL converts natural language questions into structured SQL queries. It's a game-changer for organizations because:
- Democratizes data access: Business analysts, product managers, and executives can query databases without knowing SQL syntax
- Boosts productivity: Data analysts can prototype complex queries in seconds instead of minutes
- Enables intuitive interfaces: Power chatbots and internal tools with natural language database interaction
- Handles complexity: Claude can generate queries with multiple JOINs, subqueries, and aggregations that would take humans significant time to write correctly
Setting Up Your Environment
First, let's create a test SQLite database with two tables: employees and departments. This will be our sandbox throughout the guide.
```python
import sqlite3

# Create an in-memory SQLite database
conn = sqlite3.connect(':memory:')
cursor = conn.cursor()

# Create departments table
cursor.execute('''
    CREATE TABLE departments (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        location TEXT
    )
''')

# Create employees table
cursor.execute('''
    CREATE TABLE employees (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        department_id INTEGER,
        salary REAL,
        hire_date DATE,
        FOREIGN KEY (department_id) REFERENCES departments(id)
    )
''')

# Insert sample data
cursor.executemany('INSERT INTO departments VALUES (?, ?, ?)', [
    (1, 'Engineering', 'New York'),
    (2, 'Marketing', 'San Francisco'),
    (3, 'Sales', 'Chicago')
])

cursor.executemany('INSERT INTO employees VALUES (?, ?, ?, ?, ?)', [
    (1, 'Alice Johnson', 1, 95000, '2020-03-15'),
    (2, 'Bob Smith', 1, 85000, '2021-06-01'),
    (3, 'Charlie Brown', 2, 72000, '2019-11-20'),
    (4, 'Diana Prince', 3, 88000, '2022-01-10')
])

conn.commit()
```
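Hand-writing a schema description for the prompt tends to drift out of date as the database evolves. One option (a sketch, not something the rest of the guide requires) is to pull the live `CREATE TABLE` statements straight from `sqlite_master`:

```python
import sqlite3

def dump_schema(conn: sqlite3.Connection) -> str:
    """Return the CREATE TABLE statements stored in sqlite_master,
    so the schema shown to Claude always matches the real database."""
    rows = conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' AND sql IS NOT NULL"
    ).fetchall()
    return "\n\n".join(r[0] for r in rows)

# Demo on a throwaway database with one table
conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT NOT NULL, location TEXT)")
schema_text = dump_schema(conn)
```

Other databases expose the same information through their catalogs (e.g. `information_schema` in Postgres and MySQL), so the same pattern ports over.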
Step 1: The Basic Text-to-SQL Prompt
A good Text-to-SQL prompt needs three things: clear instructions, the database schema, and the user's question. Here's a minimal but effective approach:
```python
import anthropic

client = anthropic.Anthropic(api_key="your-api-key")

def generate_sql(user_query, schema):
    prompt = f"""You are a SQL expert. Given the following database schema, convert the user's natural language query into a valid SQL statement.

Database Schema:
{schema}

User Query: {user_query}

Generate only the SQL query, no explanation."""

    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=500,
        messages=[{"role": "user", "content": prompt}]
    )
    return response.content[0].text

# Example usage
schema = """
Table: departments
Columns: id (INTEGER), name (TEXT), location (TEXT)

Table: employees
Columns: id (INTEGER), name (TEXT), department_id (INTEGER), salary (REAL), hire_date (DATE)
Foreign Key: department_id -> departments.id
"""

query = "Show me all employees in the Engineering department"
sql = generate_sql(query, schema)
print(sql)
# Typical output (note the JOIN: the schema alone doesn't say which id is Engineering):
# SELECT e.* FROM employees e
# JOIN departments d ON e.department_id = d.id
# WHERE d.name = 'Engineering';
```
Step 2: Improving with Few-Shot Examples
Basic prompts work, but adding examples (few-shot learning) dramatically improves accuracy, especially for complex queries. Let's enhance our function:
```python
def generate_sql_with_examples(user_query, schema):
    examples = """
Examples:

User: List all departments in New York
SQL: SELECT * FROM departments WHERE location = 'New York';

User: What is the average salary by department?
SQL: SELECT d.name, AVG(e.salary) AS avg_salary
     FROM employees e
     JOIN departments d ON e.department_id = d.id
     GROUP BY d.name;

User: Who was hired most recently?
SQL: SELECT name, hire_date FROM employees ORDER BY hire_date DESC LIMIT 1;
"""

    prompt = f"""You are a SQL expert. Given the schema and examples below, convert the user's query to SQL.

{schema}

{examples}

User Query: {user_query}

SQL:"""

    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=500,
        messages=[{"role": "user", "content": prompt}]
    )
    return response.content[0].text
```
Pro tip: Include a few rows of actual data in your prompt. This gives Claude context about data formats and values, leading to more accurate queries.
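One way to implement that tip is a small helper that formats the first few rows of a table for the prompt. A minimal sketch (`sample_rows` is a hypothetical helper; note the table name is interpolated into the SQL string, so it must come from your own schema list, never from user input):

```python
import sqlite3

def sample_rows(conn: sqlite3.Connection, table: str, n: int = 3) -> str:
    """Format the first n rows of a table as a prompt-friendly block.
    `table` must come from a trusted schema list, not user input,
    since identifiers cannot be bound as SQL parameters."""
    cur = conn.execute(f"SELECT * FROM {table} LIMIT {n}")
    header = " | ".join(d[0] for d in cur.description)
    body = "\n".join(" | ".join(str(v) for v in row) for row in cur.fetchall())
    return f"Sample rows from {table}:\n{header}\n{body}"

# Demo
conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO departments VALUES (1, 'Engineering')")
preview = sample_rows(conn, "departments")
```

Appending the output of `sample_rows` for each table after the schema section of the prompt gives Claude concrete examples of date formats, capitalization, and value ranges.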
Step 3: Chain-of-Thought for Complex Queries
For multi-step queries, chain-of-thought prompting helps Claude reason through the problem. Use XML tags to structure the reasoning:
```python
def generate_sql_cot(user_query, schema):
    prompt = f"""You are a SQL expert. Convert the user's query to SQL using step-by-step reasoning. Fill in the template below.

{schema}

User Query: {user_query}

<reasoning>
- Identify the tables needed
- Determine the columns to select
- Identify JOIN conditions
- Apply WHERE filters
- Add GROUP BY, ORDER BY, or LIMIT if needed
</reasoning>

<sql>
-- Your SQL here
</sql>"""

    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=800,
        messages=[{"role": "user", "content": prompt}]
    )
    return response.content[0].text
```
This approach significantly reduces errors on queries involving multiple JOINs, subqueries, or complex aggregations.
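Since the response now contains both reasoning and SQL, you need to extract just the query before executing it. A small helper for that (this assumes the response follows the tag template above; how defensive to be is up to you):

```python
import re

def extract_sql(response_text: str) -> str:
    """Pull the query out of <sql>...</sql>, falling back to the raw text.
    Also drops '--' comment lines Claude may leave inside the block."""
    match = re.search(r"<sql>(.*?)</sql>", response_text, re.DOTALL)
    sql = match.group(1) if match else response_text
    lines = [ln for ln in sql.splitlines() if not ln.strip().startswith("--")]
    return "\n".join(lines).strip()

# Demo on a response shaped like the template
response = (
    "<reasoning>Only one table is needed.</reasoning>\n"
    "<sql>\n-- final query\nSELECT name FROM employees;\n</sql>"
)
query = extract_sql(response)
```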
Step 4: RAG for Large Database Schemas
When your database has dozens or hundreds of tables, you can't fit the entire schema in a prompt. Retrieval-Augmented Generation (RAG) solves this by dynamically fetching only the relevant schema information.
Here's how to implement it with VoyageAI embeddings:
```python
import voyageai
import numpy as np
from typing import Dict

class SchemaVectorDB:
    def __init__(self, api_key: str):
        self.client = voyageai.Client(api_key=api_key)
        self.embeddings = []
        self.schema_chunks = []

    def index_schema(self, tables: Dict[str, Dict]):
        """Index table schemas as embedding vectors"""
        for table_name, columns in tables.items():
            chunk = f"Table: {table_name}\n"
            for col_name, col_type in columns.items():
                chunk += f"  - {col_name} ({col_type})\n"
            self.schema_chunks.append(chunk)

        # Generate embeddings for all chunks in one batch
        response = self.client.embed(
            self.schema_chunks,
            model="voyage-2"
        )
        self.embeddings = response.embeddings

    def query_relevant_schema(self, user_query: str, top_k: int = 3) -> str:
        """Retrieve the most relevant schema chunks for a query"""
        query_embedding = self.client.embed([user_query], model="voyage-2").embeddings[0]

        # Cosine similarity between the query and each schema chunk
        similarities = [
            np.dot(query_embedding, emb) /
            (np.linalg.norm(query_embedding) * np.linalg.norm(emb))
            for emb in self.embeddings
        ]

        # Get top-k most relevant chunks
        top_indices = np.argsort(similarities)[-top_k:][::-1]
        return "\n".join(self.schema_chunks[i] for i in top_indices)

# Usage
vector_db = SchemaVectorDB(api_key="your-voyage-api-key")
vector_db.index_schema({
    "employees": {"id": "INTEGER", "name": "TEXT", "department_id": "INTEGER", "salary": "REAL"},
    "departments": {"id": "INTEGER", "name": "TEXT", "location": "TEXT"}
})
relevant_schema = vector_db.query_relevant_schema("What's the average salary?")
```
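One pitfall with top-k retrieval: a question like "average salary by department" may retrieve `employees` but not `departments`, leaving Claude unable to write the JOIN. A sketch of one mitigation — after retrieval, also include any tables the selected ones reference via foreign keys (`fk_edges` is a hypothetical mapping you would build once from your schema metadata):

```python
def expand_with_fk_tables(selected, fk_edges):
    """Add every table reachable from the selected set by following
    foreign-key references, so JOIN targets are never missing.
    fk_edges maps table name -> set of referenced table names."""
    out = set(selected)
    frontier = list(selected)
    while frontier:
        table = frontier.pop()
        for ref in fk_edges.get(table, ()):
            if ref not in out:
                out.add(ref)
                frontier.append(ref)
    return out

# employees references departments via department_id
fk_edges = {"employees": {"departments"}}
tables = expand_with_fk_tables({"employees"}, fk_edges)
```

The expanded table set then determines which schema chunks go into the prompt.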
Step 5: Self-Improvement Loop
This is where Claude truly shines. Instead of just generating SQL, let Claude execute it, check for errors, and fix its own mistakes:
```python
def generate_and_refine_sql(user_query, schema, conn, max_iterations=3):
    current_sql = generate_sql(user_query, schema)

    for _ in range(max_iterations):
        try:
            cursor = conn.cursor()
            cursor.execute(current_sql)
            results = cursor.fetchall()

            # Ask Claude to verify that the results answer the question.
            # Request an exact verdict string: a substring check like
            # `"correct" in text` would also match "incorrect".
            verification_prompt = f"""
Original Query: {user_query}
Generated SQL: {current_sql}
First rows of results: {results[:5]}

Does this SQL correctly answer the user's query?
If it does, reply with exactly: OK
If it does not, reply with only the corrected SQL, nothing else."""
            response = client.messages.create(
                model="claude-3-5-sonnet-20241022",
                max_tokens=500,
                messages=[{"role": "user", "content": verification_prompt}]
            )
            answer = response.content[0].text.strip()
            if answer == "OK":
                return current_sql, results
            current_sql = answer  # Claude supplied a corrected query; retry
        except Exception as e:
            # SQL execution error - ask Claude to fix it
            error_prompt = f"""
The following SQL generated an error:
{current_sql}

Error: {str(e)}
Schema: {schema}
Original Query: {user_query}

Reply with only the corrected SQL, nothing else."""
            response = client.messages.create(
                model="claude-3-5-sonnet-20241022",
                max_tokens=500,
                messages=[{"role": "user", "content": error_prompt}]
            )
            current_sql = response.content[0].text.strip()

    return current_sql, None
```
This self-improvement loop catches both syntax errors and logical mistakes before they reach your users.
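One practical wrinkle: even when instructed to return only SQL, models sometimes wrap the query in markdown fences, which makes `cursor.execute` fail on the backticks. A defensive cleanup step worth running on every generated query (a small sketch):

```python
import re

def clean_sql_response(text: str) -> str:
    """Strip optional ```sql ... ``` fences from a model reply so the
    result can be passed straight to cursor.execute()."""
    match = re.search(r"```(?:sql)?\s*(.*?)```", text, re.DOTALL)
    return (match.group(1) if match else text).strip()

fenced = "```sql\nSELECT * FROM employees;\n```"
plain = "SELECT * FROM employees;"
```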
Evaluating Your System
To ensure reliability, build an evaluation pipeline:
- Create a test set of 20-50 query-SQL pairs
- Measure accuracy by comparing generated SQL to expected SQL
- Track execution success rate - does the SQL run without errors?
- Monitor result correctness - does the output match expected results?
```python
def evaluate_system(test_cases, conn):
    correct = 0
    for case in test_cases:
        generated_sql = generate_sql(case["query"], case["schema"])
        try:
            cursor = conn.cursor()
            cursor.execute(generated_sql)
            results = cursor.fetchall()
            if results == case["expected_results"]:
                correct += 1
        except Exception:
            # A query that fails to execute counts as incorrect
            pass
    accuracy = correct / len(test_cases) * 100
    return f"Accuracy: {accuracy:.1f}%"
```
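One caveat in the comparison above: `results == expected_results` is order-sensitive, so a correct query that simply omits an ORDER BY can be scored wrong. When the question doesn't specify an ordering, comparing the rows as multisets is fairer (a small sketch):

```python
def results_match(got, expected):
    """Compare result sets ignoring row order; two correct queries
    may return identical rows in different orders."""
    return sorted(map(tuple, got)) == sorted(map(tuple, expected))

same_rows = results_match([(2, 'Bob'), (1, 'Alice')], [(1, 'Alice'), (2, 'Bob')])
diff_rows = results_match([(1, 'Alice')], [(2, 'Bob')])
```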
Key Takeaways
- Start simple, iterate fast: Begin with a basic prompt, then layer in few-shot examples, chain-of-thought reasoning, and RAG as complexity grows
- RAG is essential for real-world databases: When schemas exceed prompt limits, use vector embeddings to dynamically retrieve relevant schema information
- Self-improvement loops catch errors: Let Claude execute, verify, and fix its own SQL queries for dramatically better reliability
- Include sample data in prompts: A few rows of real data help Claude understand data formats and produce more accurate queries
- Always evaluate: Build a test set of query-SQL pairs and measure accuracy, execution success, and result correctness to track improvements over time