Building a Robust Text-to-SQL System with Claude: A Practical Guide
Learn how to build a production-ready Text-to-SQL system using Claude. Covers prompting, RAG for complex schemas, chain-of-thought reasoning, and self-improvement loops.
This guide walks through building a Text-to-SQL system with Claude, covering basic prompting, few-shot learning, chain-of-thought reasoning, RAG for large schemas, and self-improvement loops. You'll learn practical techniques to convert natural language to accurate SQL queries.
Building a Robust Text-to-SQL System with Claude: A Practical Guide
Text-to-SQL is one of the most impactful applications of large language models. It bridges the gap between non-technical users and complex databases, allowing anyone to ask questions in plain English and get structured data back. Claude excels at this task because of its strong reasoning capabilities, context understanding, and ability to generate syntactically correct code.
In this guide, you'll learn how to build a production-ready Text-to-SQL system using Claude. We'll start with a basic prompt and progressively add sophistication: few-shot examples, chain-of-thought reasoning, Retrieval Augmented Generation (RAG) for large schemas, and a self-improvement loop that catches and fixes errors automatically.
Why Text-to-SQL Matters
Before diving into the implementation, let's understand why this capability is so valuable:
- Democratizing Data Access: Non-technical stakeholders can query databases without waiting for engineers or learning SQL syntax.
- Rapid Prototyping: Data analysts can quickly iterate on query ideas using natural language before refining the SQL.
- Chatbot Integration: Enable conversational database interactions in customer support, internal tools, or analytics dashboards.
- Complex Query Generation: LLMs can generate multi-join queries, subqueries, and aggregations that would take humans significant time to write correctly.
Prerequisites
To follow along, you'll need:
- A Claude API key (from Anthropic)
- Python 3.8+ installed
- Basic familiarity with SQL and Python
- An SQLite database (or any database you want to query)
Setting Up Your Environment
First, install the required packages:
pip install anthropic sqlite3 voyageai
Now, let's create a test database with two tables: employees and departments. This will be our playground throughout the guide.
import sqlite3
Create an in-memory SQLite database
conn = sqlite3.connect(':memory:')
cursor = conn.cursor()
Create departments table
cursor.execute('''
CREATE TABLE departments (
id INTEGER PRIMARY KEY,
name TEXT NOT NULL,
location TEXT
)
''')
Create employees table
cursor.execute('''
CREATE TABLE employees (
id INTEGER PRIMARY KEY,
name TEXT NOT NULL,
department_id INTEGER,
salary REAL,
hire_date DATE,
FOREIGN KEY (department_id) REFERENCES departments(id)
)
''')
Insert sample data
cursor.executemany('INSERT INTO departments VALUES (?, ?, ?)', [
(1, 'Engineering', 'New York'),
(2, 'Marketing', 'San Francisco'),
(3, 'Sales', 'Chicago')
])
cursor.executemany('INSERT INTO employees VALUES (?, ?, ?, ?, ?)', [
(1, 'Alice Johnson', 1, 95000, '2020-03-15'),
(2, 'Bob Smith', 1, 85000, '2021-06-01'),
(3, 'Charlie Brown', 2, 72000, '2019-11-20'),
(4, 'Diana Prince', 3, 88000, '2022-01-10'),
(5, 'Eve Davis', 2, 65000, '2023-04-05')
])
conn.commit()
Step 1: The Basic Text-to-SQL Prompt
A good Text-to-SQL prompt needs three things: clear instructions, the database schema, and the user's query. Here's a minimal implementation:
import anthropic
client = anthropic.Anthropic(api_key="your-api-key")
def get_schema(cursor):
"""Extract schema information from the database."""
cursor.execute("SELECT sql FROM sqlite_master WHERE type='table'")
schemas = cursor.fetchall()
return "\n".join([s[0] for s in schemas if s[0]])
def basic_prompt(user_query, schema):
return f"""You are a SQL expert. Convert the following natural language query into a SQL query.
Database Schema:
{schema}
User Query: {user_query}
Generate only the SQL query, no explanation."""
Test it
schema = get_schema(cursor)
user_query = "Show me all employees in the Engineering department"
response = client.messages.create(
model="claude-3-sonnet-20241022",
max_tokens=200,
messages=[{"role": "user", "content": basic_prompt(user_query, schema)}]
)
print(response.content[0].text)
This works, but it's brittle. Let's improve it.
Step 2: Improving with Few-Shot Examples
Few-shot learning dramatically improves accuracy by showing Claude examples of input-output pairs. This helps the model understand the expected format and reasoning pattern.
def few_shot_prompt(user_query, schema):
examples = """
Example 1:
User Query: "List all departments in Chicago"
SQL: SELECT * FROM departments WHERE location = 'Chicago';
Example 2:
User Query: "What is the average salary by department?"
SQL: SELECT d.name, AVG(e.salary) as avg_salary
FROM employees e
JOIN departments d ON e.department_id = d.id
GROUP BY d.name;
Example 3:
User Query: "Who was hired most recently?"
SQL: SELECT name, hire_date FROM employees ORDER BY hire_date DESC LIMIT 1;
"""
return f"""You are a SQL expert. Convert natural language queries into SQL.
Database Schema:
{schema}
Here are some examples:
{examples}
User Query: {user_query}
SQL:"""
Pro Tip: Include a few rows of actual data in your prompt. This gives Claude context about data types and values, leading to more accurate queries.
Step 3: Chain-of-Thought Prompting for Complex Queries
For complex queries involving multiple joins, subqueries, or aggregations, chain-of-thought (CoT) prompting helps Claude reason step-by-step. Use XML tags to structure the reasoning process:
def cot_prompt(user_query, schema):
return f"""You are a SQL expert. Convert the following natural language query into a SQL query.
Database Schema:
{schema}
User Query: {user_query}
Let's think through this step by step:
<thinking>
- Identify the tables needed
- Identify the columns needed
- Determine joins and conditions
- Consider aggregations or ordering
- Write the final SQL
</thinking>
SQL:"""
This approach is particularly effective when you need Claude to handle ambiguous queries or complex business logic.
Step 4: RAG for Large Database Schemas
When your database has dozens or hundreds of tables, including the entire schema in every prompt is impractical and expensive. Retrieval Augmented Generation (RAG) solves this by dynamically fetching only the relevant schema information.
First, create embeddings for your schema using VoyageAI:
import voyageai
vo = voyageai.Client(api_key="your-voyage-api-key")
def create_schema_embeddings(cursor):
"""Create embeddings for each table in the database."""
cursor.execute("SELECT name FROM sqlite_master WHERE type='table'")
tables = cursor.fetchall()
schema_docs = []
for table in tables:
cursor.execute(f"PRAGMA table_info({table[0]})")
columns = cursor.fetchall()
col_desc = ", ".join([f"{col[1]} ({col[2]})" for col in columns])
schema_docs.append(f"Table: {table[0]}, Columns: {col_desc}")
# Get embeddings
embeddings = vo.embed(schema_docs, model="voyage-2").embeddings
return dict(zip([t[0] for t in tables], embeddings))
Now, when a user asks a question, retrieve the most relevant tables:
def retrieve_relevant_schema(user_query, schema_embeddings, cursor):
query_embedding = vo.embed([user_query], model="voyage-2").embeddings[0]
# Simple cosine similarity (use a proper vector DB in production)
similarities = {}
for table_name, table_embedding in schema_embeddings.items():
similarity = sum(a*b for a,b in zip(query_embedding, table_embedding))
similarities[table_name] = similarity
# Get top 2 most relevant tables
relevant_tables = sorted(similarities, key=similarities.get, reverse=True)[:2]
# Fetch their schemas
schemas = []
for table in relevant_tables:
cursor.execute(f"SELECT sql FROM sqlite_master WHERE name='{table}'")
schemas.append(cursor.fetchone()[0])
return "\n".join(schemas)
This approach scales to databases with hundreds of tables while keeping token usage low.
Step 5: Self-Improvement Loop
The most powerful technique is giving Claude the ability to execute its own SQL, analyze results, and fix errors. This creates a feedback loop that dramatically improves reliability.
def self_improving_query(user_query, cursor, max_iterations=3):
schema = get_schema(cursor)
for i in range(max_iterations):
# Generate SQL
response = client.messages.create(
model="claude-3-sonnet-20241022",
max_tokens=500,
messages=[{"role": "user", "content": cot_prompt(user_query, schema)}]
)
sql = response.content[0].text.strip()
# Try to execute
try:
cursor.execute(sql)
results = cursor.fetchall()
# Ask Claude to verify the results
verification_prompt = f"""The following SQL was generated for the query: "{user_query}"
SQL: {sql}
Results: {results}
Are these results correct? If not, explain what's wrong and provide the corrected SQL."""
verify_response = client.messages.create(
model="claude-3-sonnet-20241022",
max_tokens=500,
messages=[{"role": "user", "content": verification_prompt}]
)
# If Claude says it's correct, return
if "correct" in verify_response.content[0].text.lower():
return sql, results
# Otherwise, use the corrected SQL in the next iteration
sql = verify_response.content[0].text
except Exception as e:
# If there's an error, ask Claude to fix it
error_prompt = f"""The SQL query:
{sql}
Produced this error:
{str(e)}
Please fix the SQL query."""
fix_response = client.messages.create(
model="claude-3-sonnet-20241022",
max_tokens=500,
messages=[{"role": "user", "content": error_prompt}]
)
sql = fix_response.content[0].text
return sql, "Max iterations reached"
This self-improvement loop handles:
- Syntax errors (missing commas, wrong keywords)
- Logic errors (wrong joins, incorrect aggregations)
- Schema mismatches (referencing non-existent columns)
Evaluating Your Text-to-SQL System
To ensure quality, build an evaluation pipeline:
- Create a test set of 50-100 natural language queries with expected SQL
- Measure exact match (query matches expected SQL exactly)
- Measure execution match (query produces same results as expected SQL)
- Track error rate (percentage of queries that fail to execute)
def evaluate(test_cases, cursor):
exact_matches = 0
execution_matches = 0
errors = 0
for nl_query, expected_sql, expected_results in test_cases:
try:
generated_sql, results = self_improving_query(nl_query, cursor)
if generated_sql.strip() == expected_sql.strip():
exact_matches += 1
if results == expected_results:
execution_matches += 1
except Exception:
errors += 1
total = len(test_cases)
print(f"Exact Match: {exact_matches/total:.1%}")
print(f"Execution Match: {execution_matches/total:.1%}")
print(f"Error Rate: {errors/total:.1%}")
Best Practices for Production
- Always validate SQL before executing against production databases. Use a read-only connection or wrap in a transaction that rolls back.
- Set strict token limits to prevent runaway queries.
- Log all queries and results for debugging and improvement.
- Use temperature 0 for deterministic SQL generation.
- Consider adding a "human in the loop" for destructive operations (DELETE, UPDATE, DROP).
Key Takeaways
- Start simple, iterate fast: Begin with a basic prompt and add complexity (few-shot, CoT, RAG) as needed based on your specific use case.
- RAG is essential for real-world databases: When your schema exceeds a few tables, dynamic retrieval keeps prompts efficient and accurate.
- Self-improvement loops dramatically boost reliability: Let Claude execute, verify, and fix its own SQL. This catches most errors automatically.
- Always evaluate: Build a test set and measure exact match, execution match, and error rates to track improvements over time.
- Safety first: Never execute generated SQL directly on production databases without validation and proper access controls.