Building a Robust Text-to-SQL System with Claude: A Practical Guide
Learn to build a production-ready Text-to-SQL system using Claude. Covers prompting, RAG for complex schemas, chain-of-thought reasoning, and self-improvement loops.
Introduction
Text-to-SQL is one of the most impactful applications of large language models in enterprise settings. It bridges the gap between natural language and structured data, allowing non-technical users to query databases without knowing SQL syntax. Claude excels at this task because of its strong reasoning capabilities, context understanding, and ability to generate syntactically correct SQL.
In this guide, you'll learn how to build a production-ready Text-to-SQL system using Claude. We'll start with basic prompting and progressively add sophistication: few-shot examples, chain-of-thought reasoning, Retrieval Augmented Generation (RAG) for complex schemas, and a self-improvement loop that lets Claude fix its own mistakes.
Why Text-to-SQL Matters
Before diving into implementation, let's understand the value proposition:
- Accessibility: Business analysts, product managers, and executives can query databases directly without waiting for engineering support.
- Efficiency: Data professionals can prototype complex queries in seconds rather than minutes.
- Integration: Chatbots and internal tools can offer natural language database interfaces.
- Complexity handling: Claude can generate queries with multiple JOINs, subqueries, window functions, and aggregations that would take humans significant time to write correctly.
Setting Up Your Environment
First, let's create a test SQLite database with two tables: employees and departments. This will be our sandbox throughout the guide.
```python
import sqlite3

# Create an in-memory database
conn = sqlite3.connect(':memory:')
cursor = conn.cursor()

# Create departments table
cursor.execute('''
    CREATE TABLE departments (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        location TEXT
    )
''')

# Create employees table
cursor.execute('''
    CREATE TABLE employees (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        department_id INTEGER,
        salary REAL,
        hire_date DATE,
        FOREIGN KEY (department_id) REFERENCES departments(id)
    )
''')

# Insert sample data
cursor.executemany('INSERT INTO departments VALUES (?, ?, ?)', [
    (1, 'Engineering', 'New York'),
    (2, 'Marketing', 'San Francisco'),
    (3, 'Sales', 'Chicago')
])

cursor.executemany('INSERT INTO employees VALUES (?, ?, ?, ?, ?)', [
    (1, 'Alice Johnson', 1, 95000, '2020-03-15'),
    (2, 'Bob Smith', 1, 85000, '2021-06-01'),
    (3, 'Charlie Brown', 2, 72000, '2019-11-20'),
    (4, 'Diana Prince', 3, 88000, '2022-01-10'),
    (5, 'Eve Davis', 2, 65000, '2023-04-05')
])

conn.commit()
```
Step 1: Building a Basic Text-to-SQL Prompt
The foundation of any Text-to-SQL system is a well-structured prompt. Your prompt must include:
- Clear instructions defining the task
- The database schema (table names, columns, data types, relationships)
- The user's natural language query
```python
def get_schema(conn):
    cursor = conn.cursor()
    schema = []
    # Get all tables
    cursor.execute("SELECT name FROM sqlite_master WHERE type='table';")
    tables = cursor.fetchall()
    for table in tables:
        table_name = table[0]
        cursor.execute(f"PRAGMA table_info({table_name});")
        columns = cursor.fetchall()
        col_info = [f" - {col[1]} ({col[2]})" for col in columns]
        schema.append(f"Table: {table_name}\n" + "\n".join(col_info))
    return "\n\n".join(schema)

schema = get_schema(conn)

prompt = f"""You are a SQL expert. Convert the following natural language query into a valid SQLite SQL query.

Database Schema:
{schema}

User Query: "Show me all employees in the Engineering department"

Generate only the SQL query, no explanation."""
```
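To make the template reusable across queries, it can be wrapped in a small helper. This is a minimal sketch; the schema string would come from `get_schema` above, and the returned prompt is what you would send in your Claude API call.

```python
def build_basic_prompt(schema: str, user_query: str) -> str:
    """Assemble the basic Text-to-SQL prompt from a schema string and a user query."""
    return f"""You are a SQL expert. Convert the following natural language query into a valid SQLite SQL query.

Database Schema:
{schema}

User Query: "{user_query}"

Generate only the SQL query, no explanation."""

# Example with a hand-written schema string standing in for get_schema(conn)
prompt = build_basic_prompt(
    "Table: employees\n - id (INTEGER)\n - name (TEXT)",
    "Show me all employees",
)
```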
Step 2: Improving with Few-Shot Examples
A basic prompt works, but few-shot learning dramatically improves accuracy. By providing examples of input-output pairs, Claude learns the expected format and reasoning pattern.
```python
def generate_prompt_with_examples(query, schema):
    examples = """
Example 1:
User Query: "List all departments"
SQL: SELECT * FROM departments;

Example 2:
User Query: "Find employees hired after 2021"
SQL: SELECT * FROM employees WHERE hire_date > '2021-01-01';

Example 3:
User Query: "What is the average salary per department?"
SQL: SELECT d.name, AVG(e.salary) as avg_salary
     FROM employees e
     JOIN departments d ON e.department_id = d.id
     GROUP BY d.name;
"""
    return f"""You are a SQL expert. Convert natural language to SQLite SQL.

Database Schema:
{schema}

{examples}

User Query: "{query}"
SQL:"""
```
Pro tip: Consider including a few rows of actual data in your prompt. This gives Claude context about data distribution and values, which helps with queries involving specific criteria.
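One way to act on that tip is a small helper that formats the first few rows of a table for inclusion in the prompt. The `get_sample_rows` function below is a sketch (not part of the system so far); it is shown self-contained against a tiny in-memory table, but in practice you would pass it the `conn` from the setup section.

```python
import sqlite3

def get_sample_rows(conn, table_name, limit=3):
    """Return a few rows of a table as a compact text block for the prompt."""
    cursor = conn.cursor()
    cursor.execute(f"SELECT * FROM {table_name} LIMIT {int(limit)}")
    col_names = [desc[0] for desc in cursor.description]
    lines = [" | ".join(col_names)]
    for row in cursor.fetchall():
        lines.append(" | ".join(str(v) for v in row))
    return f"Sample rows from {table_name}:\n" + "\n".join(lines)

# Demo against a tiny in-memory table
conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE departments (id INTEGER, name TEXT)")
conn.execute("INSERT INTO departments VALUES (1, 'Engineering')")
print(get_sample_rows(conn, 'departments'))
```

Appending this block after the schema in your prompt gives Claude concrete examples of the values each column holds.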
Step 3: Chain-of-Thought Prompting for Complex Queries
For complex queries involving multiple joins, subqueries, or aggregations, chain-of-thought (CoT) prompting helps Claude reason step-by-step before generating SQL.
```python
def generate_cot_prompt(query, schema):
    return f"""You are a SQL expert. Convert natural language to SQLite SQL.

Database Schema:
{schema}

User Query: "{query}"

Let's think through this step-by-step:
<reasoning>
- What tables do I need?
- What columns should I select?
- What conditions (WHERE clause) are needed?
- Do I need GROUP BY, ORDER BY, or HAVING?
- What JOIN conditions are required?
</reasoning>

Now, generate the SQL query:
<sql>
-- Your SQL here
</sql>"""
```
This structured approach forces Claude to decompose the problem, leading to more accurate SQL generation for complex queries.
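Because this prompt asks for the answer inside `<sql>` tags, the response needs a small parser before execution. Here is one possible sketch of an `extract_sql_from_response` helper (used again later in this guide); it assumes the tags appear at most once and falls back to the raw text if they are missing.

```python
import re

def extract_sql_from_response(response_text: str) -> str:
    """Pull the SQL out of a <sql>...</sql> block; fall back to the raw text."""
    match = re.search(r"<sql>(.*?)</sql>", response_text, re.DOTALL)
    sql = match.group(1) if match else response_text
    # Drop SQL comment lines (e.g. "-- Your SQL here") and surrounding whitespace
    lines = [ln for ln in sql.splitlines() if not ln.strip().startswith("--")]
    return "\n".join(lines).strip()
```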
Step 4: Implementing RAG for Large Schemas
Real-world databases often have hundreds of tables. Including the entire schema in every prompt is impractical and wastes tokens. Retrieval Augmented Generation (RAG) solves this by dynamically retrieving only the relevant schema information.
Here's how to implement it with VoyageAI embeddings:
```python
import voyageai
from typing import List, Dict
import numpy as np

class SchemaRetriever:
    def __init__(self, api_key: str):
        self.client = voyageai.Client(api_key=api_key)
        self.embeddings = []
        self.schema_chunks = []

    def index_schema(self, tables: Dict[str, List[str]]):
        """Index all tables and their columns."""
        for table_name, columns in tables.items():
            chunk = f"Table: {table_name}\nColumns: {', '.join(columns)}"
            self.schema_chunks.append(chunk)

        # Generate embeddings for all chunks
        response = self.client.embed(
            self.schema_chunks,
            model="voyage-2"
        )
        self.embeddings = response.embeddings

    def retrieve_relevant_schema(self, query: str, top_k: int = 3) -> str:
        """Retrieve the most relevant schema chunks for a query."""
        # Embed the query
        query_embedding = self.client.embed([query], model="voyage-2").embeddings[0]

        # Compute cosine similarity
        similarities = [
            np.dot(query_embedding, emb) /
            (np.linalg.norm(query_embedding) * np.linalg.norm(emb))
            for emb in self.embeddings
        ]

        # Get top-k most relevant chunks
        top_indices = np.argsort(similarities)[-top_k:][::-1]
        return "\n\n".join(self.schema_chunks[i] for i in top_indices)
```
Now your prompt only includes the most relevant tables and columns, keeping token usage efficient while maintaining accuracy.
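The ranking step itself needs no API to understand. The toy sketch below reproduces the same cosine-similarity top-k logic with hand-made vectors standing in for real embeddings (illustration only; in the retriever above the vectors come from the VoyageAI `embed` calls).

```python
import math

def top_k_by_cosine(query_vec, chunk_vecs, k=2):
    """Return the indices of the k chunk vectors most similar to the query vector."""
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))
    sims = [cosine(query_vec, v) for v in chunk_vecs]
    return sorted(range(len(chunk_vecs)), key=lambda i: sims[i], reverse=True)[:k]

# Toy vectors: chunk 0 points the same way as the query, chunk 2 the opposite way
chunks = [[1.0, 0.1], [0.0, 1.0], [-1.0, 0.0]]
print(top_k_by_cosine([1.0, 0.0], chunks, k=2))
```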
Step 5: Building a Self-Improvement Loop
One of Claude's most powerful capabilities is self-correction. You can implement a loop where Claude:
- Generates SQL
- Executes it against the database
- Analyzes the results or errors
- Improves the query if needed
```python
def text_to_sql_with_self_improvement(query, conn, max_iterations=3):
    schema = get_schema(conn)
    prompt = generate_cot_prompt(query, schema)

    for iteration in range(max_iterations):
        # Generate (or regenerate) SQL from the current prompt
        response = claude_api.complete(prompt)  # Your Claude API call here
        sql = extract_sql_from_response(response)

        # Try to execute
        try:
            cursor = conn.cursor()
            cursor.execute(sql)
            results = cursor.fetchall()

            # Check if results make sense
            if len(results) == 0:
                # No results -- the query may be wrong; ask Claude to revise it
                # on the next pass through the loop
                prompt = f"""
The SQL query:
{sql}

Returned 0 results. This might be incorrect.

User's original query: "{query}"

Please analyze and fix the query. Consider:
- Are the JOIN conditions correct?
- Are the WHERE conditions too restrictive?
- Are you using the right columns?

Generate an improved SQL query:"""
                continue

            return sql, results

        except Exception as e:
            # Feed the error back to Claude on the next pass through the loop
            prompt = f"""
The SQL query:
{sql}

Produced this error:
{e}

User's original query: "{query}"

Please fix the SQL query to resolve this error.
Corrected SQL:"""

    return None, "Failed after max iterations"
```
This self-improvement loop is invaluable in production because:
- It catches syntax errors automatically
- It handles edge cases like empty results
- It adapts to schema changes without manual intervention
Evaluation: Measuring Your System's Performance
To ensure your Text-to-SQL system is reliable, implement these evaluation metrics:
- Execution Accuracy: Does the generated SQL execute without errors?
- Result Correctness: Do the results match expected outputs?
- Schema Relevance: For RAG systems, are the correct tables being retrieved?
- Latency: How fast is the end-to-end pipeline?
```python
test_queries = [
    {
        "natural": "Show all employees with salary above 80000",
        "expected_sql": "SELECT * FROM employees WHERE salary > 80000",
        "expected_count": 3
    },
    {
        "natural": "Which department has the highest average salary?",
        "expected_sql": "SELECT d.name, AVG(e.salary) as avg_salary FROM employees e JOIN departments d ON e.department_id = d.id GROUP BY d.name ORDER BY avg_salary DESC LIMIT 1",
        "expected_count": 1
    },
    # Add more test cases...
]
```
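A minimal harness can turn those cases into pass/fail checks. The sketch below rebuilds the sample database from the setup section and verifies that each case's `expected_sql` executes and returns the expected row count; it validates the reference queries offline, and the same loop could compare a generated query's results against them.

```python
import sqlite3

def build_sample_db():
    """Recreate the in-memory sample database from the setup section."""
    conn = sqlite3.connect(':memory:')
    conn.executescript("""
        CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT NOT NULL, location TEXT);
        CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT NOT NULL,
            department_id INTEGER, salary REAL, hire_date DATE,
            FOREIGN KEY (department_id) REFERENCES departments(id));
        INSERT INTO departments VALUES
            (1,'Engineering','New York'),(2,'Marketing','San Francisco'),(3,'Sales','Chicago');
        INSERT INTO employees VALUES
            (1,'Alice Johnson',1,95000,'2020-03-15'),
            (2,'Bob Smith',1,85000,'2021-06-01'),
            (3,'Charlie Brown',2,72000,'2019-11-20'),
            (4,'Diana Prince',3,88000,'2022-01-10'),
            (5,'Eve Davis',2,65000,'2023-04-05');
    """)
    return conn

def run_eval(test_queries):
    """Execute each case's expected_sql and compare row counts; return pass/fail flags."""
    conn = build_sample_db()
    results = []
    for case in test_queries:
        try:
            rows = conn.execute(case["expected_sql"]).fetchall()
            results.append(len(rows) == case["expected_count"])
        except sqlite3.Error:
            results.append(False)
    return results

cases = [{"natural": "Show all employees with salary above 80000",
          "expected_sql": "SELECT * FROM employees WHERE salary > 80000",
          "expected_count": 3}]
print(run_eval(cases))
```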
Next Steps and Production Considerations
Once your basic system is working, consider these enhancements:
- Security: Implement query validation to prevent SQL injection or destructive operations (DROP, DELETE, UPDATE)
- Caching: Cache frequent queries to reduce API costs
- User feedback loop: Allow users to flag incorrect results for continuous improvement
- Multi-dialect support: Extend your system to support PostgreSQL, MySQL, or BigQuery syntax
- Streaming results: For large datasets, implement pagination or streaming responses
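For the security point in particular, a first line of defense is a keyword check before execution. The sketch below rejects stacked statements and destructive keywords; keyword matching alone is not a complete defense, so pair it with read-only database credentials.

```python
import re

FORBIDDEN = {"DROP", "DELETE", "UPDATE", "INSERT", "ALTER", "TRUNCATE", "CREATE", "REPLACE"}

def is_safe_select(sql: str) -> bool:
    """Allow only single SELECT statements containing no destructive keywords."""
    statements = [s for s in sql.split(";") if s.strip()]
    if len(statements) != 1:
        return False  # reject stacked statements like "SELECT 1; DROP TABLE ..."
    tokens = set(re.findall(r"[A-Za-z_]+", statements[0].upper()))
    if tokens & FORBIDDEN:
        return False
    return statements[0].lstrip().upper().startswith("SELECT")

print(is_safe_select("SELECT * FROM employees"))          # True
print(is_safe_select("DROP TABLE employees"))             # False
print(is_safe_select("SELECT 1; DELETE FROM employees"))  # False
```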
Key Takeaways
- Start simple, iterate fast: Begin with basic prompting, then add few-shot examples, chain-of-thought reasoning, and RAG as complexity grows.
- RAG is essential for real-world schemas: When databases have hundreds of tables, dynamic schema retrieval keeps prompts efficient and accurate.
- Self-improvement loops catch errors automatically: Claude can analyze its own SQL output, identify issues, and fix them without human intervention.
- Include data samples in prompts: A few rows of real data help Claude understand data distribution and improve query accuracy.
- Evaluate rigorously: Build a test suite with diverse queries to measure execution accuracy, result correctness, and latency before deploying to production.