Build a Text-to-SQL System with Claude: From Basic Prompts to Advanced RAG
Learn how to transform natural language questions into accurate SQL queries using Claude AI. This practical guide covers prompting techniques, RAG implementation, and self-improvement loops for production-ready systems.
This guide teaches you to build a robust Text-to-SQL system with Claude. You'll learn effective prompting strategies, implement RAG for complex schemas, and create self-improving query loops that handle errors and refine outputs automatically.
Text-to-SQL is a transformative capability that lets users query databases using natural language instead of SQL syntax. With Claude's advanced reasoning abilities, you can build production-ready systems that accurately translate human questions into structured database queries. This guide walks through practical implementation techniques, from basic prompts to sophisticated Retrieval Augmented Generation (RAG) approaches.
Why Text-to-SQL Matters for Your Organization
Implementing Text-to-SQL with Claude delivers immediate business value:
- Democratize Data Access: Enable non-technical team members to extract insights without SQL knowledge
- Accelerate Development: Data analysts can prototype complex queries in seconds instead of minutes
- Create Intuitive Interfaces: Build natural language interfaces for internal tools and customer-facing applications
- Handle Complexity: Claude can generate sophisticated queries involving multiple joins, subqueries, and aggregations that might challenge junior developers
Setting Up Your Development Environment
Before diving into prompting strategies, let's establish a test environment. We'll use SQLite for simplicity, but these techniques apply to any SQL database.
import sqlite3
import pandas as pd
# Create a test database with sample tables
def create_test_database():
conn = sqlite3.connect('company.db')
cursor = conn.cursor()
# Create departments table
cursor.execute('''CREATE TABLE departments (
id INTEGER PRIMARY KEY,
name TEXT NOT NULL,
budget REAL
)''')
# Create employees table
cursor.execute('''CREATE TABLE employees (
id INTEGER PRIMARY KEY,
name TEXT NOT NULL,
department_id INTEGER,
salary REAL,
hire_date TEXT,
FOREIGN KEY (department_id) REFERENCES departments(id)
)''')
# Insert sample data
cursor.execute("INSERT INTO departments (name, budget) VALUES ('Engineering', 1000000)")
cursor.execute("INSERT INTO departments (name, budget) VALUES ('Sales', 800000)")
cursor.execute("INSERT INTO departments (name, budget) VALUES ('Marketing', 600000)")
cursor.execute("INSERT INTO employees (name, department_id, salary, hire_date) VALUES ('Alice Smith', 1, 95000, '2022-03-15')")
cursor.execute("INSERT INTO employees (name, department_id, salary, hire_date) VALUES ('Bob Johnson', 1, 110000, '2021-07-22')")
cursor.execute("INSERT INTO employees (name, department_id, salary, hire_date) VALUES ('Carol Davis', 2, 85000, '2023-01-10')")
conn.commit()
conn.close()
return 'company.db'
# Create the database
db_path = create_test_database()
Creating Your First Text-to-SQL Prompt
A basic Text-to-SQL prompt needs three key elements: clear instructions, the user's question, and the database schema. Here's a foundational approach:
def get_schema_info(db_path):
"""Extract schema information from the database"""
conn = sqlite3.connect(db_path)
cursor = conn.cursor()
# Get table information
cursor.execute("SELECT name FROM sqlite_master WHERE type='table'")
tables = cursor.fetchall()
schema_info = ""
for table in tables:
table_name = table[0]
cursor.execute(f"PRAGMA table_info({table_name})")
columns = cursor.fetchall()
schema_info += f"Table: {table_name}\n"
for col in columns:
schema_info += f" - {col[1]} ({col[2]})\n"
schema_info += "\n"
conn.close()
return schema_info
def generate_basic_prompt(user_query, schema_info):
"""Create a basic Text-to-SQL prompt"""
prompt = f"""You are an expert SQL developer. Convert the following natural language question into a SQL query.
Database Schema:
{schema_info}
Question: {user_query}
Provide only the SQL query without any additional explanation."""
return prompt
# Example usage
schema = get_schema_info(db_path)
user_question = "What is the average salary in the Engineering department?"
basic_prompt = generate_basic_prompt(user_question, schema)
When you send this prompt to Claude via the API, you'll receive a SQL query like:
SELECT AVG(salary) FROM employees WHERE department_id = (SELECT id FROM departments WHERE name = 'Engineering');
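If you haven't made the API call yet, here is a minimal sketch using the official anthropic Python SDK. The helper name call_claude and the model alias are our choices, not fixed requirements; adjust them for your setup.
import anthropic

def call_claude(prompt):
    """Send a prompt to Claude and return the text of the first response block."""
    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
    message = client.messages.create(
        model="claude-3-5-sonnet-latest",  # assumed model alias; use whichever Claude model you have access to
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )
    return message.content[0].text.strip()

generated_sql = call_claude(basic_prompt)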
Enhancing Accuracy with Few-Shot Examples
Few-shot learning dramatically improves Claude's performance by providing concrete examples. Here's how to enhance your prompt:
def generate_few_shot_prompt(user_query, schema_info):
"""Create a prompt with examples for better accuracy"""
examples = """
Examples:
- Question: How many employees are in each department?
SQL: SELECT d.name, COUNT(e.id) FROM departments d LEFT JOIN employees e ON d.id = e.department_id GROUP BY d.name
- Question: Who is the highest paid employee in Engineering?
SQL: SELECT name, salary FROM employees WHERE department_id = (SELECT id FROM departments WHERE name = 'Engineering') ORDER BY salary DESC LIMIT 1
- Question: What is the total budget for all departments?
SQL: SELECT SUM(budget) FROM departments
"""
prompt = f"""You are an expert SQL developer. Convert the following natural language question into a SQL query.
Database Schema:
{schema_info}
{examples}
Question: {user_query}
Provide only the SQL query without any additional explanation."""
return prompt
Pro Tip: Include actual data samples in your prompt when possible. Adding a few rows of real data helps Claude understand data types and content patterns:
# Add this alongside your schema extraction function
def get_sample_data(db_path, table_name, limit=3):
conn = sqlite3.connect(db_path)
query = f"SELECT * FROM {table_name} LIMIT {limit}"
df = pd.read_sql_query(query, conn)
conn.close()
return df.to_string()
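One way to fold the samples into the prompt, reusing the schema and user_question variables from the earlier example (the exact formatting is just a sketch):
# Append a few sample rows per table to the schema description
schema_with_samples = schema
for table_name in ("departments", "employees"):
    schema_with_samples += f"\nSample rows from {table_name}:\n"
    schema_with_samples += get_sample_data(db_path, table_name) + "\n"

prompt_with_samples = generate_few_shot_prompt(user_question, schema_with_samples)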
Implementing Chain-of-Thought Reasoning
For complex queries, chain-of-thought prompting helps Claude break down the problem systematically. Using XML tags structures the reasoning process:
def generate_cot_prompt(user_query, schema_info):
"""Create a chain-of-thought prompt with XML tags"""
prompt = f"""<task>
Convert the following natural language question into a SQL query.
</task>
<database_schema>
{schema_info}
</database_schema>
<question>
{user_query}
</question>
<thinking>
Let me break this down step by step:
- First, I need to understand what information is being requested...
- Next, I'll identify which tables and columns are needed...
- Then, I'll determine any joins, filters, or aggregations required...
- Finally, I'll construct the SQL query...
</thinking>
<sql_query>
"""
return prompt
When Claude processes this prompt, it continues from the scaffolded <thinking> section, working through the reasoning steps before emitting the query inside the <sql_query> tags (the snippet after the list below shows one way to capture just the SQL). This approach is particularly valuable for:
- Multi-table joins with complex relationships
- Nested subqueries and CTEs (Common Table Expressions)
- Aggregations with grouping and filtering
- Date calculations and window functions
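Because the prompt ends with an open <sql_query> tag, you can ask the API to stop at the closing tag and keep only the query. A sketch reusing the anthropic import and SDK pattern from earlier (the helper name is ours; stop_sequences is a standard Messages API parameter):
def call_claude_cot(prompt):
    """Run a chain-of-thought prompt and return only the SQL between the tags."""
    client = anthropic.Anthropic()
    message = client.messages.create(
        model="claude-3-5-sonnet-latest",  # assumed model alias
        max_tokens=1024,
        stop_sequences=["</sql_query>"],  # stop generating once the query tag closes
        messages=[{"role": "user", "content": prompt}],
    )
    text = message.content[0].text
    # If Claude echoes the scaffold, keep only what follows the opening tag
    return text.split("<sql_query>")[-1].strip()

cot_prompt = generate_cot_prompt(user_question, schema)
sql_from_cot = call_claude_cot(cot_prompt)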
Scaling with RAG for Complex Database Schemas
For enterprise databases with hundreds of tables, including the entire schema in every prompt is impractical. Retrieval Augmented Generation (RAG) dynamically retrieves only relevant schema information. Here's a simplified implementation:
import voyageai
from typing import List, Dict
class SchemaVectorDB:
"""Simple vector database for schema components"""
def __init__(self, api_key):
self.client = voyageai.Client(api_key=api_key)
self.schema_chunks = []
self.embeddings = []
def add_schema_component(self, component: str, metadata: Dict):
"""Add a schema component (table, view, etc.) to the vector DB"""
self.schema_chunks.append({
'text': component,
'metadata': metadata
})
def build_embeddings(self):
"""Generate embeddings for all schema components"""
texts = [chunk['text'] for chunk in self.schema_chunks]
results = self.client.embed(texts, model="voyage-2")
self.embeddings = results.embeddings
def retrieve_relevant_schema(self, query: str, top_k: int = 5) -> str:
"""Retrieve the most relevant schema components for a query"""
query_embedding = self.client.embed([query], model="voyage-2").embeddings[0]
# Calculate cosine similarity (simplified)
similarities = []
for i, embedding in enumerate(self.embeddings):
# Simple dot product for demonstration
similarity = sum(a*b for a,b in zip(query_embedding, embedding))
similarities.append((i, similarity))
# Get top-k most similar
similarities.sort(key=lambda x: x[1], reverse=True)
top_indices = [idx for idx, _ in similarities[:top_k]]
# Combine relevant schema information
relevant_schema = ""
for idx in top_indices:
relevant_schema += self.schema_chunks[idx]['text'] + "\n\n"
return relevant_schema
# Usage example
def generate_rag_prompt(user_query, vector_db):
"""Create a prompt using RAG-retrieved schema information"""
relevant_schema = vector_db.retrieve_relevant_schema(user_query)
prompt = f"""You are an expert SQL developer. Convert the following natural language question into a SQL query.
Relevant Database Schema:
{relevant_schema}
Question: {user_query}
Provide only the SQL query without any additional explanation."""
return prompt
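Before generate_rag_prompt can retrieve anything, the index has to be populated. One sketch, indexing one chunk per table using the same PRAGMA calls as the schema extraction earlier (build_schema_index and the key placeholder are our own names):
def build_schema_index(db_path, api_key):
    """Index each table's schema as one retrievable chunk."""
    vector_db = SchemaVectorDB(api_key)
    conn = sqlite3.connect(db_path)
    cursor = conn.cursor()
    cursor.execute("SELECT name FROM sqlite_master WHERE type='table'")
    for (table_name,) in cursor.fetchall():
        cursor.execute(f"PRAGMA table_info({table_name})")
        columns = [f"  - {col[1]} ({col[2]})" for col in cursor.fetchall()]
        chunk = f"Table: {table_name}\n" + "\n".join(columns)
        vector_db.add_schema_component(chunk, {"table": table_name})
    conn.close()
    vector_db.build_embeddings()
    return vector_db

vector_db = build_schema_index(db_path, "YOUR_VOYAGE_API_KEY")  # placeholder key
rag_prompt = generate_rag_prompt(user_question, vector_db)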
This RAG approach ensures Claude receives only the schema components most relevant to the current query, reducing token usage and improving focus.
Building Self-Improving Query Loops
A production Text-to-SQL system should validate and improve its own outputs. Here's a self-improvement loop that executes queries, analyzes results, and refines them:
def execute_and_improve_query(db_path, initial_sql, user_query, max_attempts=3):
"""Execute SQL and improve it if there are errors"""
conn = sqlite3.connect(db_path)
cursor = conn.cursor()
attempts = 0
current_sql = initial_sql
while attempts < max_attempts:
try:
cursor.execute(current_sql)
results = cursor.fetchall()
# Check if results make sense for the query
if validate_results(results, user_query):
conn.close()
return {
'success': True,
'sql': current_sql,
'results': results,
'attempts': attempts + 1
}
else:
# Results don't match expected pattern, try to improve
improvement_prompt = create_improvement_prompt(
user_query, current_sql, results, "Results don't match expected pattern"
)
current_sql = call_claude_for_improvement(improvement_prompt)
except sqlite3.Error as e:
# SQL error occurred, try to fix it
error_msg = str(e)
improvement_prompt = create_improvement_prompt(
user_query, current_sql, None, error_msg
)
current_sql = call_claude_for_improvement(improvement_prompt)
attempts += 1
conn.close()
return {
'success': False,
'sql': current_sql,
'error': 'Max improvement attempts reached',
'attempts': attempts
}
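The loop delegates the semantic check to validate_results, which is left for you to define. A deliberately conservative placeholder might just reject empty or all-NULL result sets and let Claude take another pass; real validation should be domain-specific:
def validate_results(results, user_query):
    """Rough plausibility check; replace with domain-specific validation."""
    if not results:
        return False  # no rows usually means the filters or joins missed
    # Reject result sets where every value is NULL
    if all(value is None for row in results for value in row):
        return False
    return True  # the user_query could drive richer checks (expected columns, value ranges, etc.)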
def create_improvement_prompt(user_query, failed_sql, results, error_info):
    """Create a prompt for Claude to improve a failed query"""
    # Only include a <results> block when the query actually returned rows
    results_section = f"<results>\n{results}\n</results>" if results else ""
prompt = f"""<task>
Improve the following SQL query that failed or produced unexpected results.
</task>
<original_question>
{user_query}
</original_question>
<failed_query>
{failed_sql}
</failed_query>
<issue>
{error_info}
</issue>
{results_section}
<thinking>
Let me analyze what went wrong:
- The error/issue indicates...
- Looking at the original question, I should have...
- The corrected approach would be...
</thinking>
<improved_sql_query>
"""
return prompt
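The other helper the loop relies on, call_claude_for_improvement, just sends this prompt back to Claude and extracts the repaired query. A sketch mirroring the earlier API calls (the tag handling assumes Claude follows the template):
def call_claude_for_improvement(improvement_prompt):
    """Ask Claude to repair the query and return only the improved SQL."""
    client = anthropic.Anthropic()
    message = client.messages.create(
        model="claude-3-5-sonnet-latest",  # assumed model alias
        max_tokens=1024,
        stop_sequences=["</improved_sql_query>"],
        messages=[{"role": "user", "content": improvement_prompt}],
    )
    text = message.content[0].text
    return text.split("<improved_sql_query>")[-1].strip()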
This self-improvement loop handles:
- Syntax errors: Invalid SQL syntax
- Semantic errors: Queries that run but don't answer the question correctly
- Performance issues: Queries that run but are inefficient, provided your validation step flags them
- Edge cases: Null handling, data type conversions, and boundary conditions
Evaluation and Testing Strategies
Before deploying your Text-to-SQL system, establish evaluation metrics:
def evaluate_sql_generation(test_cases, db_path):
"""Evaluate Claude's SQL generation accuracy"""
results = {
'syntax_correct': 0,
'semantically_correct': 0,
'total': len(test_cases)
}
for question, expected_sql in test_cases.items():
# Generate SQL with Claude
generated_sql = generate_sql_with_claude(question, db_path)
# Check syntax
if validate_sql_syntax(generated_sql, db_path):
results['syntax_correct'] += 1
# Check semantic correctness (simplified)
if compare_query_results(generated_sql, expected_sql, db_path):
results['semantically_correct'] += 1
# Calculate accuracy percentages
results['syntax_accuracy'] = results['syntax_correct'] / results['total'] * 100
results['semantic_accuracy'] = results['semantically_correct'] / results['total'] * 100
return results
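Here, compare_query_results is the semantic check: execute both the generated and the reference query and compare their result sets. An order-insensitive version might look like this sketch, which treats any execution error as a failure:
def compare_query_results(generated_sql, expected_sql, db_path):
    """Return True when both queries produce the same rows, ignoring order."""
    conn = sqlite3.connect(db_path)
    try:
        generated_rows = conn.execute(generated_sql).fetchall()
        expected_rows = conn.execute(expected_sql).fetchall()
    except sqlite3.Error:
        return False
    finally:
        conn.close()
    # Sort by repr so rows containing NULLs remain comparable
    return sorted(generated_rows, key=repr) == sorted(expected_rows, key=repr)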
Create test cases covering:
- Basic queries: Simple SELECT statements
- Joins: Multiple table relationships
- Aggregations: GROUP BY, HAVING clauses
- Subqueries: Nested and correlated subqueries
- Complex filters: Date ranges, pattern matching, conditional logic
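For example, a small hand-written test set for the sample database might look like the dictionary below; the reference queries reuse the few-shot examples from earlier, and the run assumes you have wired up generate_sql_with_claude and validate_sql_syntax:
test_cases = {
    "How many employees are in each department?":
        "SELECT d.name, COUNT(e.id) FROM departments d "
        "LEFT JOIN employees e ON d.id = e.department_id GROUP BY d.name",
    "What is the total budget for all departments?":
        "SELECT SUM(budget) FROM departments",
    "Who is the highest paid employee in Engineering?":
        "SELECT name, salary FROM employees WHERE department_id = "
        "(SELECT id FROM departments WHERE name = 'Engineering') ORDER BY salary DESC LIMIT 1",
}

evaluation = evaluate_sql_generation(test_cases, db_path)
print(evaluation)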
Key Takeaways
- Start with structured prompts: Include clear instructions, the database schema, and the user's question in your initial implementation. Few-shot examples dramatically improve accuracy.
- Implement RAG for scalability: For databases with many tables, use vector embeddings to retrieve only relevant schema information rather than including everything in each prompt.
- Build self-improvement loops: Create systems that execute generated SQL, catch errors, and automatically refine queries. This reduces manual intervention and improves reliability.
- Validate with comprehensive testing: Establish evaluation metrics and test cases covering various query types before deploying to production.
- Consider data sampling: Including sample rows from tables in your prompts helps Claude understand data patterns and content, leading to more accurate query generation.