Building a Robust Text-to-SQL System with Claude: A Practical Guide
Learn how to convert natural language into SQL queries using Claude. Covers prompting, RAG for complex schemas, chain-of-thought, and self-improvement loops.
This guide teaches you to build a Text-to-SQL system with Claude, covering basic prompting, few-shot learning, chain-of-thought reasoning, RAG for large schemas, and a self-improvement loop that lets Claude fix its own SQL errors.
Building a Robust Text-to-SQL System with Claude: A Practical Guide
Text-to-SQL is one of the most practical applications of large language models in the enterprise. It allows non-technical users to query databases using plain English, and it helps data professionals prototype queries faster. Claude excels at this task because it understands context, interprets complex requests, and generates accurate SQL.
In this guide, you'll learn how to build a production-ready Text-to-SQL system using Claude. We'll start with a basic prompt, then layer in few-shot examples, chain-of-thought reasoning, Retrieval Augmented Generation (RAG) for large schemas, and finally a self-improvement loop that lets Claude fix its own mistakes.
Why Text-to-SQL Matters
Before diving into the code, let's look at why this capability is so valuable:
- Accessibility: Business analysts, product managers, and executives can query databases without knowing SQL syntax.
- Efficiency: Data scientists can quickly prototype complex queries involving joins, subqueries, and aggregations.
- Integration: Chatbots and internal tools can offer natural language interfaces to your data warehouse.
- Accuracy: With proper prompting and feedback loops, Claude can generate SQL that is often more reliable than human-written queries for complex joins.
Setting Up Your Environment
First, let's create a test SQLite database with two tables: employees and departments. This will be our sandbox throughout the guide.
import sqlite3
Create an in-memory database
conn = sqlite3.connect(':memory:')
cursor = conn.cursor()
Create departments table
cursor.execute('''
CREATE TABLE departments (
id INTEGER PRIMARY KEY,
name TEXT NOT NULL,
location TEXT
)
''')
Create employees table
cursor.execute('''
CREATE TABLE employees (
id INTEGER PRIMARY KEY,
name TEXT NOT NULL,
department_id INTEGER,
salary REAL,
hire_date DATE,
FOREIGN KEY (department_id) REFERENCES departments(id)
)
''')
Insert sample data
cursor.executemany('INSERT INTO departments VALUES (?, ?, ?)', [
(1, 'Engineering', 'New York'),
(2, 'Marketing', 'San Francisco'),
(3, 'Sales', 'Chicago')
])
cursor.executemany('INSERT INTO employees VALUES (?, ?, ?, ?, ?)', [
(1, 'Alice', 1, 120000, '2020-01-15'),
(2, 'Bob', 1, 95000, '2021-03-20'),
(3, 'Charlie', 2, 80000, '2019-11-01'),
(4, 'Diana', 3, 110000, '2022-06-10'),
(5, 'Eve', 2, 75000, '2023-02-28')
])
conn.commit()
Creating a Basic Text-to-SQL Prompt
A good prompt needs three things: clear instructions, the user's query, and the database schema. Let's build a simple function that generates this prompt.
def get_schema(conn):
cursor = conn.cursor()
cursor.execute("SELECT sql FROM sqlite_master WHERE type='table'")
schemas = cursor.fetchall()
return '\n'.join([s[0] for s in schemas])
def basic_prompt(user_query, schema):
return f"""You are a SQL expert. Given the following database schema, convert the user's natural language query into a valid SQL query.
Database Schema:
{schema}
User Query: {user_query}
SQL Query:"""
Example usage
schema = get_schema(conn)
prompt = basic_prompt("Show me all employees in Engineering", schema)
print(prompt)
Now let's call the Claude API with this prompt:
import anthropic
client = anthropic.Anthropic()
response = client.messages.create(
model="claude-3-5-sonnet-20241022",
max_tokens=200,
messages=[{"role": "user", "content": prompt}]
)
sql_query = response.content[0].text
print("Generated SQL:", sql_query)
Execute and verify
cursor.execute(sql_query)
results = cursor.fetchall()
print("Results:", results)
This basic approach works, but it can be inconsistent with complex queries. Let's improve it.
Improving the Prompt with Examples (Few-Shot Learning)
Including a few examples of natural language queries paired with their SQL equivalents helps Claude understand the expected output format and reasoning style.
def few_shot_prompt(user_query, schema):
examples = """
Examples:
User: List all departments
SQL: SELECT * FROM departments;
User: Show employees hired after 2021
SQL: SELECT * FROM employees WHERE hire_date > '2021-01-01';
User: What is the average salary per department?
SQL: SELECT d.name, AVG(e.salary) FROM employees e JOIN departments d ON e.department_id = d.id GROUP BY d.name;
"""
return f"""You are a SQL expert. Given the database schema and examples, convert the user's query into SQL.
Database Schema:
{schema}
{examples}
User Query: {user_query}
SQL Query:"""
Pro tip: You can also include a few rows of actual data in the prompt. This gives Claude more context about the data types and values, which can improve accuracy for queries involving specific strings or numbers.
Using Chain-of-Thought Prompting
For complex queries involving multiple joins, subqueries, or aggregations, chain-of-thought (CoT) prompting helps Claude reason step by step. We'll use XML tags to structure the reasoning.
def cot_prompt(user_query, schema):
return f"""You are a SQL expert. Convert the user's query into SQL by reasoning step by step.
Database Schema:
{schema}
User Query: {user_query}
<thinking>
Let me break down the query:
- Identify the tables needed
- Determine the columns to select
- Identify any filters (WHERE clauses)
- Determine if joins are needed
- Check for aggregations (GROUP BY, HAVING)
- Determine the sort order
</thinking>
SQL Query:"""
This structured reasoning reduces errors in complex queries because Claude explicitly considers each component before generating the final SQL.
Implementing RAG for Complex Database Schemas
Real-world databases often have hundreds of tables. Including the entire schema in every prompt wastes tokens and confuses the model. Retrieval Augmented Generation (RAG) solves this by dynamically retrieving only the relevant schema information.
First, let's build a simple vector database using VoyageAI embeddings:
import voyageai
import numpy as np
class SimpleVectorDB:
def __init__(self, api_key):
self.client = voyageai.Client(api_key=api_key)
self.embeddings = []
self.texts = []
def add_documents(self, texts):
self.texts = texts
response = self.client.embed(texts, model="voyage-2")
self.embeddings = np.array(response.embeddings)
def query(self, query_text, top_k=3):
query_embedding = self.client.embed([query_text], model="voyage-2").embeddings[0]
similarities = np.dot(self.embeddings, query_embedding)
top_indices = np.argsort(similarities)[-top_k:][::-1]
return [self.texts[i] for i in top_indices]
Index each table's schema separately
db = SimpleVectorDB(api_key="your-voyage-api-key")
db.add_documents([
"Table: employees (id, name, department_id, salary, hire_date)",
"Table: departments (id, name, location)"
])
Retrieve relevant schema for a query
user_query = "Show me the highest paid employee in each department"
relevant_schemas = db.query(user_query)
print("Relevant schema:", relevant_schemas)
Now update your prompt function to use only the retrieved schema:
def rag_prompt(user_query, vector_db):
relevant_schema = '\n'.join(vector_db.query(user_query))
return f"""You are a SQL expert. Use only the provided schema to answer the query.
Relevant Schema:
{relevant_schema}
User Query: {user_query}
SQL Query:"""
This approach scales to databases with hundreds of tables and keeps token usage low.
Implementing Query Self-Improvement
One of Claude's most powerful features is the ability to critique and improve its own outputs. We can build a self-improvement loop that:
- Generates an initial SQL query
- Executes it against the database
- If there's an error, feeds the error message back to Claude for correction
- Repeats until the query succeeds or a max retry limit is reached
def self_improving_query(user_query, schema, conn, max_retries=3):
prompt = f"""Convert this query to SQL:
Schema: {schema}
Query: {user_query}
SQL:"""
for attempt in range(max_retries):
response = client.messages.create(
model="claude-3-5-sonnet-20241022",
max_tokens=200,
messages=[{"role": "user", "content": prompt}]
)
sql = response.content[0].text.strip()
try:
cursor = conn.cursor()
cursor.execute(sql)
return cursor.fetchall()
except Exception as e:
error_msg = str(e)
print(f"Attempt {attempt + 1} failed: {error_msg}")
# Feed the error back to Claude
prompt = f"""The previous SQL query failed with this error: {error_msg}
Original query: {user_query}
Schema: {schema}
Please fix the SQL:"""
raise Exception("Failed to generate valid SQL after max retries")
Example usage
results = self_improving_query("Show me employees and their department names", schema, conn)
print(results)
This loop is especially useful when dealing with complex schemas or ambiguous user queries. Claude can learn from its mistakes in real time.
Evaluating Your Text-to-SQL System
To ensure quality, you need a systematic evaluation framework. Here's a simple approach:
- Create a test set of 20-50 natural language queries with expected SQL outputs.
- Measure exact match between generated and expected SQL.
- Measure execution accuracy – does the generated SQL run without errors?
- Measure result accuracy – does the generated SQL return the correct data?
def evaluate(test_cases, conn, generate_fn):
correct = 0
for case in test_cases:
generated_sql = generate_fn(case["query"], get_schema(conn))
try:
cursor = conn.cursor()
cursor.execute(generated_sql)
generated_results = cursor.fetchall()
cursor.execute(case["expected_sql"])
expected_results = cursor.fetchall()
if generated_results == expected_results:
correct += 1
except:
pass
return correct / len(test_cases)
Key Takeaways
- Start simple, then layer complexity: Begin with a basic prompt, then add few-shot examples, chain-of-thought reasoning, and RAG as needed.
- Use RAG for large schemas: Dynamically retrieving relevant schema information keeps prompts focused and token-efficient.
- Implement a self-improvement loop: Let Claude execute its own SQL, catch errors, and fix them automatically. This dramatically improves reliability.
- Include data samples in prompts: A few rows of real data help Claude understand data types and values, leading to more accurate queries.
- Evaluate systematically: Build a test set and measure execution accuracy and result accuracy, not just SQL syntax matching.