Building a Robust Text-to-SQL System with Claude: A Practical Guide
Learn how to build a production-ready Text-to-SQL system using Claude. Covers prompting, RAG for complex schemas, chain-of-thought reasoning, and self-improvement loops.
This guide teaches you to build a Text-to-SQL system using Claude, covering basic prompting, few-shot learning, chain-of-thought reasoning, RAG for large schemas, and self-improvement loops that let Claude fix its own SQL errors.
Building a Robust Text-to-SQL System with Claude: A Practical Guide
Text-to-SQL is one of the most impactful applications of large language models in enterprise settings. It bridges the gap between non-technical stakeholders and the data they need, allowing anyone to query databases using natural language instead of writing complex SQL statements.
Claude excels at this task because of its strong reasoning capabilities, large context window, and ability to understand nuanced instructions. In this guide, you'll learn how to build a production-ready Text-to-SQL system step by step—from basic prompting to advanced techniques like RAG and self-improvement loops.
Why Text-to-SQL Matters
Before diving into the implementation, let's understand why this capability is so valuable:
- Democratizing Data Access: Business analysts, product managers, and executives can query databases without waiting for engineering teams or learning SQL syntax.
- Accelerating Analytics: Data professionals can prototype complex queries in seconds rather than minutes or hours.
- Enabling Conversational Interfaces: Chatbots and internal tools can provide natural language database access.
- Handling Complexity: LLMs can generate queries with multiple joins, subqueries, window functions, and aggregations that would take humans significant time to write correctly.
Setting Up Your Environment
First, let's create a test SQLite database with two tables: employees and departments. This will serve as our playground throughout the guide.
import sqlite3
Create an in-memory SQLite database
conn = sqlite3.connect(':memory:')
cursor = conn.cursor()
Create departments table
cursor.execute('''
CREATE TABLE departments (
id INTEGER PRIMARY KEY,
name TEXT NOT NULL,
location TEXT NOT NULL
)
''')
Create employees table
cursor.execute('''
CREATE TABLE employees (
id INTEGER PRIMARY KEY,
name TEXT NOT NULL,
department_id INTEGER,
salary REAL,
hire_date TEXT,
FOREIGN KEY (department_id) REFERENCES departments(id)
)
''')
Insert sample data
cursor.executemany('INSERT INTO departments VALUES (?, ?, ?)', [
(1, 'Engineering', 'New York'),
(2, 'Marketing', 'San Francisco'),
(3, 'Sales', 'Chicago')
])
cursor.executemany('INSERT INTO employees VALUES (?, ?, ?, ?, ?)', [
(1, 'Alice Johnson', 1, 95000, '2020-03-15'),
(2, 'Bob Smith', 1, 85000, '2021-06-01'),
(3, 'Charlie Brown', 2, 72000, '2022-01-10'),
(4, 'Diana Prince', 3, 88000, '2019-11-20'),
(5, 'Eve Davis', 2, 65000, '2023-04-05')
])
conn.commit()
Step 1: Creating a Basic Text-to-SQL Prompt
The foundation of any Text-to-SQL system is a well-structured prompt. A good prompt should include:
- Clear instructions about the task
- The user's natural language query
- The database schema so Claude can map concepts to tables and columns
def get_schema_string(cursor):
"""Extract schema information from the database."""
cursor.execute("SELECT name FROM sqlite_master WHERE type='table'")
tables = cursor.fetchall()
schema_parts = []
for (table_name,) in tables:
cursor.execute(f"PRAGMA table_info({table_name})")
columns = cursor.fetchall()
col_strs = [f" - {col[1]} ({col[2]})" for col in columns]
schema_parts.append(f"Table: {table_name}\n" + "\n".join(col_strs))
return "\n\n".join(schema_parts)
def basic_prompt(query, schema):
return f"""You are a SQL expert. Given the following database schema and a user query, generate the correct SQL query.
Database Schema:
{schema}
User Query: {query}
Generate only the SQL query, no explanation."""
schema = get_schema_string(cursor)
prompt = basic_prompt("Show me all employees in Engineering", schema)
print(prompt)
This basic approach works, but it has limitations. Let's improve it.
Step 2: Improving with Few-Shot Examples
Including examples in your prompt—a technique called few-shot learning—dramatically improves accuracy. It shows Claude exactly what format and level of detail you expect.
def few_shot_prompt(query, schema):
examples = """
Example 1:
User Query: List all departments in New York
SQL: SELECT * FROM departments WHERE location = 'New York';
Example 2:
User Query: What is the average salary by department?
SQL: SELECT d.name, AVG(e.salary) as avg_salary
FROM employees e
JOIN departments d ON e.department_id = d.id
GROUP BY d.name;
Example 3:
User Query: Find employees hired after 2021 with salary over 80000
SQL: SELECT name, salary, hire_date FROM employees
WHERE hire_date > '2021-01-01' AND salary > 80000;
"""
return f"""You are a SQL expert. Given the database schema and user query, generate the correct SQL.
Database Schema:
{schema}
{examples}
User Query: {query}
SQL:"""
Pro Tip: Consider including a few rows of actual data in your prompt. This gives Claude context about data formats and values, which helps with queries involving specific strings or numbers.
Step 3: Chain-of-Thought Prompting for Complex Queries
For complex queries involving multiple joins, subqueries, or aggregations, chain-of-thought (CoT) prompting helps Claude reason step by step. Using XML tags to structure the reasoning process is particularly effective with Claude.
def cot_prompt(query, schema):
return f"""You are a SQL expert. Given the database schema and user query, generate the correct SQL.
Database Schema:
{schema}
First, reason through the query step by step inside <reasoning> tags. Then output the final SQL inside <sql> tags.
User Query: {query}
<reasoning>
"""
When you call Claude with this prompt, it will:
- Identify which tables are needed
- Determine the join conditions
- Figure out filtering criteria
- Decide on aggregations and groupings
- Output the final SQL
Step 4: Implementing RAG for Large Database Schemas
Real-world databases often have hundreds of tables. Including the entire schema in every prompt is wasteful and can confuse Claude. Retrieval Augmented Generation (RAG) solves this by dynamically fetching only the relevant schema information.
First, create a vector database to store schema information:
import voyageai
import numpy as np
class SchemaVectorDB:
def __init__(self, api_key):
self.client = voyageai.Client(api_key=api_key)
self.embeddings = []
self.metadata = []
def add_table(self, table_name, columns, description=""):
text = f"Table: {table_name}. Columns: {', '.join(columns)}. {description}"
embedding = self.client.embed([text], model="voyage-2").embeddings[0]
self.embeddings.append(embedding)
self.metadata.append({
"table_name": table_name,
"columns": columns,
"description": description
})
def query(self, user_query, top_k=3):
query_emb = self.client.embed([user_query], model="voyage-2").embeddings[0]
similarities = [
np.dot(query_emb, emb) / (np.linalg.norm(query_emb) * np.linalg.norm(emb))
for emb in self.embeddings
]
top_indices = np.argsort(similarities)[-top_k:][::-1]
return [self.metadata[i] for i in top_indices]
Now update your prompt generation to use RAG:
def rag_prompt(query, vector_db):
# Retrieve relevant schema
relevant_tables = vector_db.query(query, top_k=2)
schema_str = "\n\n".join([
f"Table: {t['table_name']}\nColumns: {', '.join(t['columns'])}"
for t in relevant_tables
])
return f"""You are a SQL expert. Here are the relevant database tables:
{schema_str}
User Query: {query}
Generate the SQL query."""
Step 5: Building a Self-Improvement Loop
One of the most powerful techniques is giving Claude the ability to execute its own SQL, analyze the results, and fix errors. This creates a self-improvement loop that dramatically increases reliability.
def self_improving_query(claude_client, user_query, schema, conn, max_attempts=3):
"""
Generate SQL, execute it, and let Claude fix errors.
"""
prompt = f"""Generate SQL for: {user_query}
Schema:
{schema}
Output only the SQL query."""
for attempt in range(max_attempts):
# Get SQL from Claude
response = claude_client.messages.create(
model="claude-3-5-sonnet-20241022",
max_tokens=500,
messages=[{"role": "user", "content": prompt}]
)
sql_query = response.content[0].text.strip()
try:
cursor = conn.cursor()
cursor.execute(sql_query)
results = cursor.fetchall()
# If successful, return results
if results:
return {"success": True, "sql": sql_query, "results": results}
else:
# Empty results might be valid, but let Claude verify
verification_prompt = f"""The query '{sql_query}' returned no results for '{user_query}'. Is this expected? If not, fix the query."""
prompt = verification_prompt
except Exception as e:
error_msg = str(e)
# Let Claude fix the error
prompt = f"""The SQL query '{sql_query}' failed with error: {error_msg}
User query: {user_query}
Schema:
{schema}
Please fix the SQL query."""
return {"success": False, "error": "Max attempts reached"}
This approach handles:
- Syntax errors: Claude can fix typos and syntax issues
- Logic errors: If results don't match expectations, Claude can revise
- Schema misunderstandings: Claude can correct wrong table/column references
Evaluation: Measuring Your System's Performance
To build a reliable system, you need to measure its performance. Create a test set of query-SQL pairs and track:
- Execution Accuracy: Does the generated SQL execute without errors?
- Semantic Accuracy: Does the SQL produce the correct results?
- Exact Match: Does the SQL match the expected query exactly?
def evaluate_system(test_cases, claude_client, schema, conn):
results = {
"total": len(test_cases),
"execution_success": 0,
"semantic_match": 0
}
for user_query, expected_sql in test_cases:
response = self_improving_query(claude_client, user_query, schema, conn)
if response["success"]:
results["execution_success"] += 1
# Compare results (not SQL text) for semantic accuracy
expected_cursor = conn.cursor()
expected_cursor.execute(expected_sql)
expected_results = set(expected_cursor.fetchall())
actual_results = set(response["results"])
if expected_results == actual_results:
results["semantic_match"] += 1
return results
Key Takeaways
- Start simple, iterate fast: Begin with basic prompting, then layer in few-shot examples, chain-of-thought reasoning, and RAG as complexity grows.
- Include schema context: Always provide Claude with the database schema. For large databases, use RAG to dynamically retrieve only relevant schema information.
- Leverage chain-of-thought: For complex queries, have Claude reason step-by-step inside XML tags before generating SQL. This reduces errors significantly.
- Build self-improvement loops: Let Claude execute its SQL, analyze errors, and fix them. This dramatically improves reliability in production.
- Measure what matters: Track execution accuracy and semantic accuracy separately. A query that runs but returns wrong results is worse than one that fails.