Building a Robust Text-to-SQL System with Claude: From Basic Prompts to RAG
Learn how to build a production-ready Text-to-SQL system using Claude. Covers basic prompting, chain-of-thought, RAG for large schemas, and self-improvement loops.
This guide walks you through building a Text-to-SQL system with Claude, covering basic prompting, few-shot learning, chain-of-thought reasoning, RAG for complex schemas, and self-improvement loops that let Claude fix its own SQL errors.
Building a Robust Text-to-SQL System with Claude: From Basic Prompts to RAG
Text-to-SQL is one of the most transformative applications of large language models. It bridges the gap between natural language and structured data, allowing anyone in an organization to query databases without knowing SQL syntax. Claude excels at this task because of its strong reasoning capabilities, context understanding, and ability to generate syntactically correct SQL.
In this guide, you'll learn how to build a production-ready Text-to-SQL system using Claude. We'll start with a basic prompt, then progressively improve it with few-shot examples, chain-of-thought reasoning, Retrieval Augmented Generation (RAG) for large schemas, and finally a self-improvement loop that lets Claude fix its own mistakes.
Why Text-to-SQL Matters
Before diving into the implementation, let's understand why Text-to-SQL is so valuable:
- Accessibility: Non-technical team members can query databases without learning SQL
- Efficiency: Data analysts can prototype queries in seconds instead of minutes
- Integration: Enables natural language interfaces for dashboards, chatbots, and internal tools
- Complexity: LLMs can generate multi-join queries, subqueries, and aggregations that would take humans significant time to write
Setting Up Your Environment
First, let's create a test SQLite database with two tables: employees and departments. This will be our sandbox throughout the guide.
import sqlite3
Create an in-memory database
conn = sqlite3.connect(':memory:')
cursor = conn.cursor()
Create departments table
cursor.execute('''
CREATE TABLE departments (
id INTEGER PRIMARY KEY,
name TEXT NOT NULL,
location TEXT
)
''')
Create employees table
cursor.execute('''
CREATE TABLE employees (
id INTEGER PRIMARY KEY,
name TEXT NOT NULL,
department_id INTEGER,
salary REAL,
hire_date DATE,
FOREIGN KEY (department_id) REFERENCES departments(id)
)
''')
Insert sample data
cursor.executemany('INSERT INTO departments VALUES (?, ?, ?)', [
(1, 'Engineering', 'New York'),
(2, 'Marketing', 'San Francisco'),
(3, 'Sales', 'Chicago')
])
cursor.executemany('INSERT INTO employees VALUES (?, ?, ?, ?, ?)', [
(1, 'Alice Johnson', 1, 95000, '2020-03-15'),
(2, 'Bob Smith', 1, 85000, '2021-06-01'),
(3, 'Charlie Brown', 2, 72000, '2019-11-20'),
(4, 'Diana Prince', 3, 68000, '2022-01-10'),
(5, 'Eve Wilson', 2, 78000, '2020-09-05')
])
conn.commit()
Step 1: Creating a Basic Text-to-SQL Prompt
The foundation of any Text-to-SQL system is a well-structured prompt. A good prompt must include:
- Clear instructions for the task
- The user's natural language query
- The database schema so Claude knows what tables and columns are available
def get_schema_string():
"""Extract schema from the database as a string."""
cursor.execute("SELECT sql FROM sqlite_master WHERE type='table'")
schemas = cursor.fetchall()
return '\n'.join([schema[0] for schema in schemas])
def basic_prompt(user_query: str) -> str:
schema = get_schema_string()
return f"""You are a SQL expert. Convert the following natural language query into a SQL query.
Database Schema:
{schema}
User Query: {user_query}
Generate only the SQL query, no explanation."""
Test it
query = "Show me all employees in the Engineering department"
prompt = basic_prompt(query)
print(prompt)
Now let's call the Claude API to generate SQL:
import anthropic
client = anthropic.Anthropic()
def generate_sql(prompt: str) -> str:
response = client.messages.create(
model="claude-3-5-sonnet-20241022",
max_tokens=500,
messages=[{"role": "user", "content": prompt}]
)
return response.content[0].text
sql = generate_sql(prompt)
print(f"Generated SQL: {sql}")
Execute the SQL to verify
cursor.execute(sql)
results = cursor.fetchall()
print(f"Results: {results}")
Step 2: Improving with Few-Shot Examples
A basic prompt works, but you'll get better results by including examples. This technique, called few-shot learning, gives Claude concrete input-output pairs to learn from.
def few_shot_prompt(user_query: str) -> str:
schema = get_schema_string()
examples = """
Example 1:
User Query: List all departments
SQL: SELECT * FROM departments;
Example 2:
User Query: What is the average salary by department?
SQL: SELECT d.name, AVG(e.salary) as avg_salary
FROM employees e
JOIN departments d ON e.department_id = d.id
GROUP BY d.name;
Example 3:
User Query: Find employees hired after 2020
SQL: SELECT * FROM employees WHERE hire_date > '2020-12-31';
"""
return f"""You are a SQL expert. Convert natural language queries into SQL.
Database Schema:
{schema}
{examples}
User Query: {user_query}
Generate only the SQL query:"""
Pro tip: Including a few rows of actual data in the prompt can further improve accuracy by giving Claude context about the data's structure and content.
Step 3: Chain-of-Thought Prompting
For complex queries, chain-of-thought (CoT) prompting helps Claude reason step-by-step before generating SQL. This is especially useful for queries involving multiple joins, subqueries, or aggregations.
def cot_prompt(user_query: str) -> str:
schema = get_schema_string()
return f"""You are a SQL expert. Convert natural language queries into SQL.
Database Schema:
{schema}
First, think step-by-step about how to answer the query:
- Identify the tables needed
- Determine the columns to select
- Identify join conditions
- Apply filters and aggregations
- Order and limit if needed
Then, generate the SQL query.
User Query: {user_query}
<thinking>
[Your step-by-step reasoning]
</thinking>
<sql>
[Your SQL query]
</sql>"""
The XML tags (<thinking> and <sql>) help structure Claude's output, making it easier to parse the SQL from the reasoning.
Step 4: Implementing RAG for Complex Schemas
As databases grow to hundreds of tables, including the entire schema in every prompt becomes impractical and expensive. Retrieval Augmented Generation (RAG) solves this by dynamically retrieving only the relevant schema information.
First, let's build a simple vector database using VoyageAI embeddings:
import voyageai
import numpy as np
from typing import List, Dict
class SimpleVectorDB:
def __init__(self):
self.vo = voyageai.Client()
self.embeddings = []
self.metadata = []
def add_documents(self, documents: List[str], metadata: List[Dict]):
batch_embeddings = self.vo.embed(
documents,
model="voyage-2"
).embeddings
self.embeddings.extend(batch_embeddings)
self.metadata.extend(metadata)
def query(self, query: str, top_k: int = 5) -> List[Dict]:
query_embedding = self.vo.embed([query], model="voyage-2").embeddings[0]
scores = [
np.dot(query_embedding, emb)
for emb in self.embeddings
]
top_indices = np.argsort(scores)[-top_k:][::-1]
return [self.metadata[i] for i in top_indices]
Index each table's schema
vectordb = SimpleVectorDB()
Get all table schemas
cursor.execute("SELECT name, sql FROM sqlite_master WHERE type='table'")
tables = cursor.fetchall()
for name, schema in tables:
vectordb.add_documents(
documents=[schema],
metadata=[{"table_name": name, "schema": schema}]
)
Now update your prompt generation to use RAG:
def rag_prompt(user_query: str) -> str:
# Retrieve relevant schema
relevant_tables = vectordb.query(user_query, top_k=2)
schema_context = '\n'.join([t['schema'] for t in relevant_tables])
return f"""You are a SQL expert. Convert natural language queries into SQL.
Relevant Database Schema:
{schema_context}
User Query: {user_query}
Generate only the SQL query:"""
This approach scales to hundreds of tables while keeping token usage low and relevance high.
Step 5: Implementing Query Self-Improvement
One of Claude's most powerful capabilities is self-correction. We can build a loop where Claude:
- Generates SQL
- Executes it
- Analyzes the results or errors
- Improves the query if needed
def self_improving_sql(user_query: str, max_iterations: int = 3) -> str:
prompt = cot_prompt(user_query)
for i in range(max_iterations):
sql = generate_sql(prompt)
try:
cursor.execute(sql)
results = cursor.fetchall()
# Check if results make sense
validation_prompt = f"""
Original Query: {user_query}
Generated SQL: {sql}
Results: {results}
Does this SQL correctly answer the original query?
If yes, say 'CORRECT'. If no, explain what's wrong.
"""
validation = client.messages.create(
model="claude-3-5-sonnet-20241022",
max_tokens=200,
messages=[{"role": "user", "content": validation_prompt}]
).content[0].text
if 'CORRECT' in validation:
return sql
# If incorrect, feed the error back
prompt = f"""
Previous attempt was incorrect.
Original Query: {user_query}
Previous SQL: {sql}
Feedback: {validation}
Please generate a corrected SQL query:
"""
except Exception as e:
# Handle SQL errors
prompt = f"""
Previous SQL had an error.
Original Query: {user_query}
Previous SQL: {sql}
Error: {str(e)}
Please generate a corrected SQL query:
"""
return sql # Return last attempt
This self-improvement loop dramatically increases reliability, especially for complex queries or edge cases.
Evaluation Framework
To ensure your Text-to-SQL system is production-ready, implement a robust evaluation framework:
def evaluate_queries(test_cases: List[Dict]):
"""
test_cases: list of dicts with 'query' and 'expected_sql'
"""
results = []
for case in test_cases:
generated_sql = self_improving_sql(case['query'])
# Compare results, not just SQL text
cursor.execute(generated_sql)
generated_results = set(cursor.fetchall())
cursor.execute(case['expected_sql'])
expected_results = set(cursor.fetchall())
is_correct = generated_results == expected_results
results.append({
'query': case['query'],
'correct': is_correct,
'generated_sql': generated_sql
})
accuracy = sum(r['correct'] for r in results) / len(results)
return accuracy, results
Key Takeaways
- Start simple, then iterate: Begin with a basic prompt and schema, then add few-shot examples, chain-of-thought, and RAG as complexity grows
- RAG is essential for scale: For databases with dozens or hundreds of tables, dynamic schema retrieval keeps prompts focused and cost-effective
- Self-improvement loops boost reliability: Letting Claude execute, validate, and fix its own SQL dramatically reduces errors
- Always include the schema: Claude needs to know table names, columns, and relationships to generate accurate SQL
- Evaluate on results, not SQL text: Two different SQL queries can produce the same correct results; compare output sets for accuracy