Building a Robust Text-to-SQL System with Claude: A Practical Guide
Learn how to build a production-ready Text-to-SQL system using Claude. Covers prompting, RAG for complex schemas, chain-of-thought reasoning, and self-improvement loops.
This guide teaches you to build a Text-to-SQL system with Claude, covering basic prompting, few-shot learning, chain-of-thought reasoning, RAG for large schemas, and self-improvement loops for error handling.
Building a Robust Text-to-SQL System with Claude: A Practical Guide
Text-to-SQL is one of the most impactful applications of large language models (LLMs) in enterprise settings. It bridges the gap between non-technical users and complex databases, allowing anyone to ask questions in plain English and get structured data back. Claude excels at this task because of its strong reasoning capabilities, context understanding, and ability to generate syntactically correct SQL.
This guide walks you through building a production-ready Text-to-SQL system using Claude. You'll learn everything from basic prompting to advanced techniques like Retrieval Augmented Generation (RAG) and self-improvement loops.
Why Text-to-SQL Matters
Before diving into the implementation, it's worth understanding why Text-to-SQL is so valuable:
- Democratizing Data Access: Business analysts, product managers, and executives can query databases without knowing SQL syntax.
- Accelerating Development: Data scientists can prototype complex queries in seconds rather than minutes or hours.
- Enabling Conversational Interfaces: Chatbots and virtual assistants can answer data-driven questions naturally.
- Handling Complexity: LLMs can generate queries with multiple joins, subqueries, window functions, and aggregations that would take humans significant time to write correctly.
Setting Up Your Environment
Let's start by creating a test SQLite database with two tables: employees and departments. This will serve as our sandbox throughout the guide.
import sqlite3
Create an in-memory SQLite database
conn = sqlite3.connect(':memory:')
cursor = conn.cursor()
Create departments table
cursor.execute('''
CREATE TABLE departments (
id INTEGER PRIMARY KEY,
name TEXT NOT NULL,
location TEXT
)
''')
Create employees table
cursor.execute('''
CREATE TABLE employees (
id INTEGER PRIMARY KEY,
name TEXT NOT NULL,
department_id INTEGER,
salary REAL,
hire_date DATE,
FOREIGN KEY (department_id) REFERENCES departments (id)
)
''')
Insert sample data
cursor.executemany('INSERT INTO departments VALUES (?, ?, ?)', [
(1, 'Engineering', 'New York'),
(2, 'Marketing', 'San Francisco'),
(3, 'Sales', 'Chicago')
])
cursor.executemany('INSERT INTO employees VALUES (?, ?, ?, ?, ?)', [
(1, 'Alice Smith', 1, 95000, '2020-03-15'),
(2, 'Bob Johnson', 1, 85000, '2021-06-01'),
(3, 'Charlie Brown', 2, 72000, '2019-11-20'),
(4, 'Diana Prince', 3, 88000, '2022-01-10'),
(5, 'Eve Wilson', 2, 65000, '2023-04-05')
])
conn.commit()
Creating a Basic Text-to-SQL Prompt
The foundation of any Text-to-SQL system is a well-structured prompt. A good prompt should include:
- Clear instructions about the task
- The user's natural language query
- The database schema so Claude understands the available tables and columns
import anthropic
client = anthropic.Anthropic(api_key="your-api-key")
def get_schema(conn):
"""Extract schema information from the database."""
cursor = conn.cursor()
# Get all tables
cursor.execute("SELECT name FROM sqlite_master WHERE type='table'")
tables = cursor.fetchall()
schema = ""
for table in tables:
table_name = table[0]
cursor.execute(f"PRAGMA table_info({table_name})")
columns = cursor.fetchall()
schema += f"Table: {table_name}\n"
for col in columns:
schema += f" - {col[1]} ({col[2]})\n"
schema += "\n"
return schema
def generate_basic_prompt(user_query, schema):
return f"""You are a SQL expert. Convert the following natural language query into a valid SQL query.
Database Schema:
{schema}
User Query: {user_query}
Generate only the SQL query, no explanation."""
Test it
schema = get_schema(conn)
user_query = "Show me all employees in the Engineering department with their salaries"
prompt = generate_basic_prompt(user_query, schema)
response = client.messages.create(
model="claude-3-5-sonnet-20241022",
max_tokens=200,
messages=[{"role": "user", "content": prompt}]
)
sql_query = response.content[0].text
print(f"Generated SQL: {sql_query}")
Execute and display results
cursor = conn.cursor()
cursor.execute(sql_query)
results = cursor.fetchall()
print(f"Results: {results}")
Improving with Few-Shot Examples
A basic prompt works, but you'll get better results by including examples. This technique, called few-shot learning, helps Claude understand the expected output format and reasoning pattern.
def generate_few_shot_prompt(user_query, schema):
examples = """
Examples:
User Query: "List all departments"
SQL: SELECT * FROM departments;
User Query: "Find employees hired after 2021"
SQL: SELECT * FROM employees WHERE hire_date > '2021-01-01';
User Query: "What is the average salary by department?"
SQL: SELECT d.name, AVG(e.salary) as avg_salary
FROM employees e
JOIN departments d ON e.department_id = d.id
GROUP BY d.name;
"""
return f"""You are a SQL expert. Convert natural language queries into valid SQL.
Database Schema:
{schema}
{examples}
User Query: {user_query}
SQL:"""
Pro Tip: Consider including a few rows of actual data alongside the schema. This gives Claude context about data formats, NULL values, and typical values in each column, leading to more accurate queries.
Chain-of-Thought Prompting for Complex Queries
For complex queries involving multiple joins, subqueries, or aggregations, chain-of-thought (CoT) prompting helps Claude reason step-by-step before generating the final SQL.
def generate_cot_prompt(user_query, schema):
return f"""You are a SQL expert. Convert the following natural language query into a valid SQL query.
Database Schema:
{schema}
User Query: {user_query}
Let's think through this step-by-step:
1) What tables do I need to query?
2) What columns do I need to select?
3) What conditions (WHERE clauses) are needed?
4) Do I need any aggregations (GROUP BY, HAVING)?
5) What is the final SQL query?
<thinking>
[Your step-by-step reasoning]
</thinking>
<sql>
[Your final SQL query]
</sql>"""
The XML tags (<thinking> and <sql>) help parse Claude's output programmatically, separating the reasoning from the final answer.
Implementing RAG for Large Database Schemas
As databases grow to dozens or hundreds of tables, including the entire schema in every prompt becomes impractical and expensive. Retrieval Augmented Generation (RAG) solves this by dynamically retrieving only the relevant schema information.
Here's how to implement RAG for Text-to-SQL:
import voyageai
from typing import List, Dict
import numpy as np
class SchemaVectorDB:
def __init__(self, api_key: str):
self.client = voyageai.Client(api_key=api_key)
self.embeddings = []
self.metadata = []
def add_table(self, table_name: str, columns: List[Dict], sample_data: List[str] = None):
"""Add a table's schema information to the vector database."""
# Create a rich text description of the table
description = f"Table: {table_name}\n"
for col in columns:
description += f" - {col['name']} ({col['type']}): {col.get('description', '')}\n"
if sample_data:
description += "\nSample data:\n"
for row in sample_data[:3]: # Include up to 3 sample rows
description += f" {row}\n"
# Generate embedding
embedding = self.client.embed([description], model="voyage-2").embeddings[0]
self.embeddings.append(embedding)
self.metadata.append({
"table_name": table_name,
"description": description,
"columns": columns
})
def query(self, user_query: str, top_k: int = 3) -> List[Dict]:
"""Retrieve the most relevant schema information for a user query."""
query_embedding = self.client.embed([user_query], model="voyage-2").embeddings[0]
# Calculate cosine similarity
similarities = []
for emb in self.embeddings:
similarity = np.dot(query_embedding, emb) / (
np.linalg.norm(query_embedding) * np.linalg.norm(emb)
)
similarities.append(similarity)
# Get top-k most relevant tables
top_indices = np.argsort(similarities)[-top_k:][::-1]
return [self.metadata[i] for i in top_indices]
Usage example
vector_db = SchemaVectorDB(api_key="your-voyage-api-key")
Add tables to the vector database
vector_db.add_table(
table_name="employees",
columns=[
{"name": "id", "type": "INTEGER", "description": "Primary key"},
{"name": "name", "type": "TEXT", "description": "Employee full name"},
{"name": "department_id", "type": "INTEGER", "description": "Foreign key to departments"},
{"name": "salary", "type": "REAL", "description": "Annual salary in USD"},
{"name": "hire_date", "type": "DATE", "description": "Date employee was hired"}
]
)
vector_db.add_table(
table_name="departments",
columns=[
{"name": "id", "type": "INTEGER", "description": "Primary key"},
{"name": "name", "type": "TEXT", "description": "Department name"},
{"name": "location", "type": "TEXT", "description": "Office location"}
]
)
Retrieve relevant schema for a query
user_query = "Show me the highest paid employee in each department"
relevant_schema = vector_db.query(user_query, top_k=2)
Build prompt with only relevant schema
schema_context = "\n".join([s["description"] for s in relevant_schema])
prompt = f"""You are a SQL expert. Convert the following query into SQL.
Relevant Database Schema:
{schema_context}
User Query: {user_query}
SQL:"""
Implementing Query Self-Improvement
One of Claude's most powerful capabilities is self-correction. You can implement a loop where Claude:
- Generates a SQL query
- Executes it against the database
- Analyzes the results or catches errors
- Improves the query if needed
def text_to_sql_with_self_improvement(user_query, schema, conn, max_iterations=3):
"""
Generate SQL, execute it, and self-improve if there are errors or unexpected results.
"""
prompt = f"""You are a SQL expert. Convert this query to SQL.
Schema:
{schema}
Query: {user_query}
SQL:"""
for iteration in range(max_iterations):
# Generate SQL
response = client.messages.create(
model="claude-3-5-sonnet-20241022",
max_tokens=500,
messages=[{"role": "user", "content": prompt}]
)
sql_query = response.content[0].text
# Try to execute
try:
cursor = conn.cursor()
cursor.execute(sql_query)
results = cursor.fetchall()
# Check if results make sense
if len(results) == 0:
# No results - might be an issue with the query
feedback = f"The query returned 0 rows. Here's the query I tried: {sql_query}\n\nCan you fix this?"
prompt = f"{prompt}\n\nPrevious attempt: {sql_query}\nFeedback: {feedback}\n\nImproved SQL:"
continue
return sql_query, results
except Exception as e:
# SQL error - ask Claude to fix it
feedback = f"The query failed with error: {str(e)}\n\nHere's the query I tried: {sql_query}"
prompt = f"{prompt}\n\nPrevious attempt: {sql_query}\nError: {feedback}\n\nFixed SQL:"
return None, "Failed after maximum iterations"
Test it
sql, results = text_to_sql_with_self_improvement(
"Find departments with average salary above 80000",
schema,
conn
)
print(f"Final SQL: {sql}")
print(f"Results: {results}")
Evaluating Your Text-to-SQL System
To ensure your system is reliable, you need a robust evaluation framework. Here's a simple approach:
def evaluate_text_to_sql(test_cases, text_to_sql_function, conn):
"""
Evaluate a Text-to-SQL system against a set of test cases.
test_cases: List of dicts with 'query' and 'expected_sql' or 'expected_results'
"""
results = []
for test in test_cases:
generated_sql, actual_results = text_to_sql_function(test['query'], conn)
# Compare with expected results
expected_results = test.get('expected_results')
match = actual_results == expected_results if expected_results else None
results.append({
'query': test['query'],
'generated_sql': generated_sql,
'actual_results': actual_results,
'expected_results': expected_results,
'match': match
})
accuracy = sum(1 for r in results if r['match']) / len(results)
return results, accuracy
Key Takeaways
- Start simple, iterate fast: Begin with a basic prompt and schema, then add few-shot examples, chain-of-thought reasoning, and RAG as complexity grows.
- RAG is essential for production systems: For databases with more than 10-15 tables, dynamic schema retrieval reduces token usage and improves accuracy.
- Self-improvement loops catch errors: Let Claude execute its own SQL, analyze results, and fix mistakes automatically—this dramatically improves reliability.
- Include sample data in prompts: A few rows of real data alongside the schema helps Claude understand data formats and avoid common mistakes.
- Evaluate rigorously: Build a test suite with known queries and expected results to measure accuracy and catch regressions when you modify prompts or schemas.