Building a Robust Text-to-SQL System with Claude: A Practical Guide
Learn how to build a production-ready Text-to-SQL system using Claude. Covers prompting, RAG for complex schemas, chain-of-thought reasoning, and self-improvement loops.
This guide walks through building a Text-to-SQL system with Claude, covering basic prompting, few-shot learning, chain-of-thought reasoning, RAG for large schemas, and a self-improvement loop that lets Claude fix its own SQL errors.
Building a Robust Text-to-SQL System with Claude: A Practical Guide
Text-to-SQL is one of the most impactful applications of large language models in enterprise settings. It bridges the gap between non-technical users and complex databases, allowing anyone to ask questions in plain English and get structured data back.
Claude excels at this task because it understands context, interprets complex queries, and generates accurate SQL statements. This guide walks through building a production-ready Text-to-SQL system, from basic prompting to advanced techniques like RAG and self-improvement loops.
Why Text-to-SQL Matters
Before diving into implementation, let's understand why this capability is valuable:
- Accessibility: Non-technical team members can query databases without learning SQL syntax
- Efficiency: Data analysts can prototype queries faster using natural language
- Integration: Enables intuitive database interfaces in chatbots and applications
- Complexity: LLMs can generate multi-join queries, subqueries, and aggregations that take humans significant time to write
Setting Up Your Environment
First, let's create a test SQLite database with two tables: employees and departments. This will serve as our playground throughout the guide.
import sqlite3
Create an in-memory SQLite database
conn = sqlite3.connect(':memory:')
cursor = conn.cursor()
Create departments table
cursor.execute('''
CREATE TABLE departments (
id INTEGER PRIMARY KEY,
name TEXT NOT NULL,
location TEXT
)
''')
Create employees table
cursor.execute('''
CREATE TABLE employees (
id INTEGER PRIMARY KEY,
name TEXT NOT NULL,
department_id INTEGER,
salary REAL,
hire_date DATE,
FOREIGN KEY (department_id) REFERENCES departments (id)
)
''')
Insert sample data
cursor.executemany('INSERT INTO departments VALUES (?, ?, ?)', [
(1, 'Engineering', 'New York'),
(2, 'Sales', 'San Francisco'),
(3, 'Marketing', 'Chicago')
])
cursor.executemany('INSERT INTO employees VALUES (?, ?, ?, ?, ?)', [
(1, 'Alice Johnson', 1, 95000, '2020-03-15'),
(2, 'Bob Smith', 2, 75000, '2019-07-01'),
(3, 'Charlie Brown', 1, 88000, '2021-01-10'),
(4, 'Diana Prince', 3, 82000, '2022-06-20')
])
conn.commit()
Creating a Basic Text-to-SQL Prompt
A good Text-to-SQL prompt needs three elements: clear instructions, the user's query, and the database schema. Here's a minimal implementation:
import anthropic
client = anthropic.Anthropic()
def get_schema(cursor):
"""Extract schema information from the database."""
schema_parts = []
cursor.execute("SELECT name FROM sqlite_master WHERE type='table'")
tables = cursor.fetchall()
for table in tables:
table_name = table[0]
cursor.execute(f"PRAGMA table_info({table_name})")
columns = cursor.fetchall()
col_defs = [f"{col[1]} ({col[2]})" for col in columns]
schema_parts.append(f"Table: {table_name}\nColumns: {', '.join(col_defs)}")
return '\n\n'.join(schema_parts)
def generate_sql(user_query, schema):
prompt = f"""You are a SQL expert. Convert the following natural language query into a SQL query.
Database Schema:
{schema}
User Query: {user_query}
Generate only the SQL query, no explanation."""
response = client.messages.create(
model="claude-3-sonnet-20241022",
max_tokens=200,
messages=[{"role": "user", "content": prompt}]
)
return response.content[0].text
Test it
schema = get_schema(cursor)
query = "Show me all employees in the Engineering department"
sql = generate_sql(query, schema)
print(f"Generated SQL: {sql}")
Execute and verify
cursor.execute(sql)
results = cursor.fetchall()
print(f"Results: {results}")
Improving with Few-Shot Examples
Basic prompts work, but including examples (few-shot learning) dramatically improves accuracy, especially for complex queries:
def generate_prompt_with_examples(user_query, schema):
examples = """
Examples:
Query: "List all departments"
SQL: SELECT * FROM departments;
Query: "Find employees hired after 2020"
SQL: SELECT * FROM employees WHERE hire_date > '2020-01-01';
Query: "Show average salary by department"
SQL: SELECT d.name, AVG(e.salary) as avg_salary
FROM employees e
JOIN departments d ON e.department_id = d.id
GROUP BY d.name;
"""
prompt = f"""You are a SQL expert. Convert natural language queries to SQL.
{schema}
{examples}
User Query: {user_query}
SQL:"""
return prompt
Pro tip: Including a few rows of actual data alongside the schema can give Claude more context about data structure and content, leading to better results.
Chain-of-Thought Prompting for Complex Queries
For queries requiring multiple operations, chain-of-thought (CoT) prompting helps Claude reason step-by-step:
def generate_cot_prompt(user_query, schema):
prompt = f"""You are a SQL expert. Convert the user's query to SQL using step-by-step reasoning.
Database Schema:
{schema}
User Query: {user_query}
<thinking>
Let me break down what this query needs:
- Identify the tables involved
- Determine the columns needed
- Identify any joins, filters, or aggregations
- Construct the final SQL
</thinking>
SQL:"""
return prompt
This approach is particularly effective for queries like "Which department has the highest average salary and what is it?" where Claude needs to reason about joins, aggregations, and ordering.
Implementing RAG for Complex Database Schemas
Real-world databases can have hundreds of tables. Including the entire schema in every prompt is impractical and expensive. Retrieval Augmented Generation (RAG) solves this by dynamically retrieving only relevant schema information.
First, set up a vector database for schema embeddings:
import voyageai
vo = voyageai.Client()
def build_schema_embeddings(cursor):
"""Create embeddings for each table in the database."""
cursor.execute("SELECT name FROM sqlite_master WHERE type='table'")
tables = cursor.fetchall()
table_descriptions = []
for table in tables:
table_name = table[0]
cursor.execute(f"PRAGMA table_info({table_name})")
columns = cursor.fetchall()
col_desc = ", ".join([f"{col[1]} ({col[2]})" for col in columns])
table_descriptions.append(f"Table {table_name}: {col_desc}")
# Generate embeddings
embeddings = vo.embed(table_descriptions, model="voyage-2").embeddings
return list(zip(table_descriptions, embeddings))
def retrieve_relevant_schema(query, schema_embeddings, top_k=2):
"""Retrieve the most relevant tables for a given query."""
query_embedding = vo.embed([query], model="voyage-2").embeddings[0]
# Simple cosine similarity (use proper vector DB in production)
similarities = []
for desc, emb in schema_embeddings:
similarity = sum(a*b for a,b in zip(query_embedding, emb))
similarities.append((similarity, desc))
similarities.sort(reverse=True)
return "\n\n".join([desc for _, desc in similarities[:top_k]])
Now integrate RAG into your prompt generation:
def generate_sql_with_rag(user_query, schema_embeddings):
relevant_schema = retrieve_relevant_schema(user_query, schema_embeddings)
prompt = f"""You are a SQL expert. Convert the user's query to SQL.
Relevant Database Schema:
{relevant_schema}
User Query: {user_query}
SQL:"""
response = client.messages.create(
model="claude-3-sonnet-20241022",
max_tokens=200,
messages=[{"role": "user", "content": prompt}]
)
return response.content[0].text
Building a Self-Improvement Loop
One of Claude's most powerful capabilities is self-correction. You can implement a loop where Claude executes its generated SQL, analyzes errors or results, and improves the query:
def text_to_sql_with_self_improvement(user_query, schema, cursor, max_iterations=3):
"""Generate SQL and iteratively improve it based on execution results."""
# Initial generation
sql = generate_sql(user_query, schema)
for i in range(max_iterations):
try:
cursor.execute(sql)
results = cursor.fetchall()
# Check if results make sense
validation_prompt = f"""
Original Query: {user_query}
Generated SQL: {sql}
Results: {results}
Are these results correct for the original query? If not, provide corrected SQL.
If correct, respond with "CORRECT".
"""
validation = client.messages.create(
model="claude-3-sonnet-20241022",
max_tokens=200,
messages=[{"role": "user", "content": validation_prompt}]
)
response_text = validation.content[0].text
if "CORRECT" in response_text:
return sql, results
else:
# Extract improved SQL
sql = response_text
except Exception as e:
# SQL execution error - ask Claude to fix it
error_prompt = f"""
Original Query: {user_query}
Generated SQL: {sql}
Error: {str(e)}
Fix the SQL query to resolve this error.
"""
fix_response = client.messages.create(
model="claude-3-sonnet-20241022",
max_tokens=200,
messages=[{"role": "user", "content": error_prompt}]
)
sql = fix_response.content[0].text
return sql, None
This self-improvement loop handles:
- Syntax errors: Claude can fix malformed SQL
- Logic errors: Claude can adjust queries that return unexpected results
- Schema mismatches: Claude can adapt when it misremembers column names
Evaluation Framework
To ensure your Text-to-SQL system is reliable, implement a systematic evaluation:
def evaluate_text_to_sql(test_cases, cursor):
"""Evaluate system accuracy against a set of test cases."""
results = []
for test in test_cases:
natural_query = test["query"]
expected_sql = test["expected_sql"]
generated_sql = generate_sql(natural_query, get_schema(cursor))
# Compare results
cursor.execute(generated_sql)
generated_results = set(cursor.fetchall())
cursor.execute(expected_sql)
expected_results = set(cursor.fetchall())
is_correct = generated_results == expected_results
results.append({
"query": natural_query,
"correct": is_correct,
"generated_sql": generated_sql,
"expected_sql": expected_sql
})
accuracy = sum(r["correct"] for r in results) / len(results)
return accuracy, results
Key Takeaways
- Start simple, iterate fast: Begin with basic prompting and schema inclusion, then layer in few-shot examples, chain-of-thought, and RAG as complexity grows
- RAG is essential for production: For databases with more than 5-10 tables, dynamic schema retrieval is necessary for both accuracy and cost efficiency
- Self-improvement loops catch errors: Claude's ability to analyze and fix its own SQL output dramatically improves reliability in production systems
- Include data samples: Adding a few rows of actual data alongside schema information helps Claude generate more accurate queries
- Evaluate systematically: Build a test suite of natural language queries with expected SQL outputs to track performance regressions as you iterate