Building a Robust Text-to-SQL System with Claude: From Basic Prompts to Self-Improving Queries
Learn how to use Claude to convert natural language into SQL queries. This guide covers prompting, chain-of-thought, RAG for complex schemas, and self-improvement loops.
This guide teaches you to build a Text-to-SQL system with Claude, covering basic prompts, few-shot learning, chain-of-thought reasoning, RAG for large schemas, and a self-improvement loop that lets Claude fix its own SQL errors.
Building a Robust Text-to-SQL System with Claude
Text-to-SQL is one of the most practical applications of large language models. It allows users to query databases using natural language, making data accessible to non-technical team members and speeding up workflows for analysts. Claude excels at this task thanks to its strong reasoning capabilities and ability to handle complex instructions.
In this guide, you'll learn how to build a production-ready Text-to-SQL system using Claude. We'll start with a basic prompt, then progressively improve it with few-shot examples, chain-of-thought reasoning, Retrieval Augmented Generation (RAG) for large schemas, and finally a self-improvement loop that lets Claude fix its own mistakes.
Why Text-to-SQL Matters
Before diving into the code, let's understand the value:
- Accessibility: Business users can ask questions like "Show me sales by region last quarter" without knowing SQL.
- Efficiency: Data analysts can prototype queries faster by describing what they want.
- Integration: Chatbots and internal tools can offer natural language database interfaces.
- Complexity: Claude can generate multi-join queries, subqueries, and aggregations that would take humans significant time to write.
Setting Up Your Environment
First, install the required packages and set up a test SQLite database. We'll use two tables: employees and departments.
import sqlite3
import os
from anthropic import Anthropic
client = Anthropic(api_key="YOUR_API_KEY")
Create test database
conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
cursor.execute("""
CREATE TABLE departments (
id INTEGER PRIMARY KEY,
name TEXT NOT NULL,
location TEXT
)
""")
cursor.execute("""
CREATE TABLE employees (
id INTEGER PRIMARY KEY,
name TEXT NOT NULL,
department_id INTEGER,
salary REAL,
hire_date TEXT,
FOREIGN KEY (department_id) REFERENCES departments(id)
)
""")
Insert sample data
cursor.executemany("INSERT INTO departments VALUES (?, ?, ?)", [
(1, 'Engineering', 'New York'),
(2, 'Sales', 'San Francisco'),
(3, 'Marketing', 'Chicago')
])
cursor.executemany("INSERT INTO employees VALUES (?, ?, ?, ?, ?)", [
(1, 'Alice', 1, 120000, '2020-01-15'),
(2, 'Bob', 2, 90000, '2021-03-20'),
(3, 'Charlie', 1, 110000, '2019-11-01'),
(4, 'Diana', 3, 85000, '2022-06-10')
])
conn.commit()
Step 1: The Basic Prompt
A good Text-to-SQL prompt needs three things: clear instructions, the user's query, and the database schema. Here's a minimal implementation:
def get_schema(cursor):
"""Extract schema from SQLite database."""
schema = []
cursor.execute("SELECT name FROM sqlite_master WHERE type='table'")
tables = cursor.fetchall()
for table in tables:
table_name = table[0]
cursor.execute(f"PRAGMA table_info({table_name})")
columns = cursor.fetchall()
schema.append(f"Table: {table_name}")
for col in columns:
schema.append(f" - {col[1]} ({col[2]})")
return "\n".join(schema)
def basic_prompt(user_query, schema):
return f"""You are a SQL expert. Convert the following natural language query into a SQL query for the given database schema.
Schema:
{schema}
User Query: {user_query}
SQL Query:"""
Test it
schema = get_schema(cursor)
query = "Show me all employees in the Engineering department"
response = client.messages.create(
model="claude-3-5-sonnet-20241022",
max_tokens=200,
messages=[{"role": "user", "content": basic_prompt(query, schema)}]
)
print(response.content[0].text)
This works, but it's fragile. Complex queries or ambiguous wording can lead to errors.
Step 2: Improving with Few-Shot Examples
Adding examples (few-shot learning) dramatically improves accuracy. The model learns the expected output format and reasoning pattern.
def few_shot_prompt(user_query, schema):
examples = """
Example 1:
User Query: List all departments in Chicago
SQL: SELECT * FROM departments WHERE location = 'Chicago';
Example 2:
User Query: What is the average salary by department?
SQL: SELECT d.name, AVG(e.salary) as avg_salary
FROM employees e
JOIN departments d ON e.department_id = d.id
GROUP BY d.name;
Example 3:
User Query: Who was hired most recently?
SQL: SELECT name, hire_date FROM employees ORDER BY hire_date DESC LIMIT 1;
"""
return f"""You are a SQL expert. Convert natural language queries to SQL.
Schema:
{schema}
{examples}
User Query: {user_query}
SQL:"""
Pro tip: Include a few rows of actual data in your prompt. This gives Claude context about values and formats, reducing hallucination.
Step 3: Chain-of-Thought for Complex Queries
For multi-step queries, chain-of-thought prompting helps Claude reason before generating SQL. Use XML tags to structure the thinking process:
def cot_prompt(user_query, schema):
return f"""You are a SQL expert. Convert the user's query into SQL.
Schema:
{schema}
User Query: {user_query}
First, think step-by-step inside <thinking> tags:
<thinking>
- Identify the tables needed
- Identify the columns needed
- Determine joins, filters, and aggregations
- Write the SQL query
</thinking>
Then output the SQL query inside <sql> tags:
<sql>
-- Your SQL here
</sql>"""
This approach reduces errors on queries involving multiple joins, subqueries, or complex WHERE clauses.
Step 4: RAG for Large Schemas
When your database has dozens or hundreds of tables, you can't fit the entire schema in a prompt. Retrieval Augmented Generation (RAG) solves this by dynamically fetching only the relevant schema parts.
Here's a simplified implementation using VoyageAI embeddings:
import voyageai
vo = voyageai.Client(api_key="YOUR_VOYAGE_API_KEY")
def get_relevant_schema(user_query, all_tables, top_k=3):
"""Retrieve only the most relevant table schemas."""
# Create embeddings for table descriptions
table_descriptions = [f"{t['name']}: {', '.join(t['columns'])}" for t in all_tables]
query_embedding = vo.embed([user_query], model="voyage-2").embeddings[0]
# Simple cosine similarity (use a proper vector DB in production)
similarities = []
for i, desc in enumerate(table_descriptions):
desc_embedding = vo.embed([desc], model="voyage-2").embeddings[0]
sim = cosine_similarity(query_embedding, desc_embedding)
similarities.append((sim, all_tables[i]))
# Return top-k most relevant tables
similarities.sort(reverse=True)
return [t for _, t in similarities[:top_k]]
Then modify your prompt to include only the retrieved schema:
def rag_prompt(user_query, relevant_tables):
schema_str = "\n".join([
f"Table: {t['name']}\n" + "\n".join([f" - {c}" for c in t['columns']])
for t in relevant_tables
])
return f"""Schema (relevant tables only):
{schema_str}
User Query: {user_query}
SQL:"""
Step 5: Self-Improvement Loop
The most powerful technique: let Claude execute its SQL, check for errors, and fix them automatically.
def self_improving_query(user_query, schema, cursor, max_iterations=3):
"""Generate SQL, execute it, and improve if errors occur."""
prompt = f"""Generate SQL for: {user_query}
Schema:
{schema}
SQL:"""
for i in range(max_iterations):
response = client.messages.create(
model="claude-3-5-sonnet-20241022",
max_tokens=300,
messages=[{"role": "user", "content": prompt}]
)
sql = response.content[0].text.strip()
try:
cursor.execute(sql)
results = cursor.fetchall()
return {"success": True, "sql": sql, "results": results}
except Exception as e:
error_msg = str(e)
prompt = f"""The previous SQL query failed with error: {error_msg}
Original query: {user_query}
Schema:
{schema}
Please fix the SQL query:"""
return {"success": False, "error": "Max iterations reached"}
This loop handles:
- Syntax errors (missing commas, wrong keywords)
- Schema mismatches (wrong table/column names)
- Logic errors (wrong joins, missing GROUP BY)
Evaluation Framework
To ensure your system works reliably, build a test suite:
test_cases = [
{"query": "List all employees", "expected_tables": ["employees"]},
{"query": "Average salary by department", "expected_columns": ["avg", "salary", "department"]},
{"query": "Employees hired in 2021", "expected_condition": "2021"},
]
def evaluate_sql(sql, test_case):
"""Check if generated SQL meets expected criteria."""
sql_lower = sql.lower()
checks = []
if "expected_tables" in test_case:
for table in test_case["expected_tables"]:
checks.append(table in sql_lower)
if "expected_columns" in test_case:
for col in test_case["expected_columns"]:
checks.append(col in sql_lower)
return all(checks)
Key Takeaways
- Start simple, iterate fast: Begin with a basic prompt, then add few-shot examples, chain-of-thought, and finally RAG as needed.
- Include schema and sample data: Claude performs better when it understands both structure and content.
- Use self-improvement loops: Let Claude execute and fix its own SQL. This catches most errors automatically.
- RAG is essential for production: For databases with more than 10-15 tables, use retrieval to keep prompts focused and within token limits.
- Always evaluate: Build a test suite of common queries to catch regressions when you change prompts or models.