BeClaude
GuideBeginnerBest Practices2026-05-12

Building a Robust Text-to-SQL System with Claude: A Practical Guide

Learn how to build a production-ready Text-to-SQL system using Claude. Covers prompting, RAG for complex schemas, chain-of-thought reasoning, and query self-improvement.

Quick Answer

This guide walks you through building a Text-to-SQL system with Claude, covering basic prompting, few-shot learning, chain-of-thought reasoning, RAG for large schemas, and a self-improvement loop that lets Claude fix its own SQL errors.

Text-to-SQLPrompt EngineeringRAGClaude APIDatabase

Building a Robust Text-to-SQL System with Claude: A Practical Guide

Text-to-SQL is one of the most powerful applications of large language models in enterprise settings. It bridges the gap between non-technical stakeholders and complex databases, allowing anyone to ask questions in plain English and get structured data back.

Claude excels at this task because of its strong reasoning capabilities, large context window, and ability to understand nuanced natural language. In this guide, you'll learn how to build a production-ready Text-to-SQL system step by step.

Why Text-to-SQL Matters

Before diving into the implementation, let's understand why this capability is so valuable:

  • Democratizes data access: Business analysts, product managers, and executives can query databases without learning SQL syntax.
  • Boosts productivity: Data engineers and analysts can prototype complex queries in seconds rather than minutes.
  • Enables conversational interfaces: Chatbots and internal tools can offer natural language database interactions.
  • Handles complexity: Claude can generate queries with multiple JOINs, subqueries, aggregations, and window functions that would take humans significant time to write.

Setting Up Your Environment

First, let's create a test SQLite database with two tables: employees and departments. This will serve as our sandbox throughout the guide.

import sqlite3

Create in-memory database

conn = sqlite3.connect(':memory:') cursor = conn.cursor()

Create departments table

cursor.execute(''' CREATE TABLE departments ( id INTEGER PRIMARY KEY, name TEXT NOT NULL, location TEXT ) ''')

Create employees table

cursor.execute(''' CREATE TABLE employees ( id INTEGER PRIMARY KEY, name TEXT NOT NULL, department_id INTEGER, salary REAL, hire_date DATE, FOREIGN KEY (department_id) REFERENCES departments(id) ) ''')

Insert sample data

cursor.executemany('INSERT INTO departments VALUES (?, ?, ?)', [ (1, 'Engineering', 'New York'), (2, 'Sales', 'San Francisco'), (3, 'Marketing', 'Chicago') ])

cursor.executemany('INSERT INTO employees VALUES (?, ?, ?, ?, ?)', [ (1, 'Alice Johnson', 1, 95000, '2020-03-15'), (2, 'Bob Smith', 1, 85000, '2021-06-01'), (3, 'Charlie Brown', 2, 72000, '2019-11-20'), (4, 'Diana Prince', 3, 88000, '2022-01-10') ])

conn.commit()

Step 1: Building a Basic Text-to-SQL Prompt

The foundation of any Text-to-SQL system is a well-structured prompt. A good prompt must include:

  • Clear instructions for Claude
  • The user's natural language query
  • The database schema so Claude understands the available tables and columns
import anthropic

client = anthropic.Anthropic(api_key="your-api-key")

def get_schema(): cursor.execute("SELECT sql FROM sqlite_master WHERE type='table'") schemas = cursor.fetchall() return "\n".join([s[0] for s in schemas])

def generate_basic_prompt(user_query, schema): return f"""You are a SQL expert. Convert the following natural language query into a SQL query.

Database Schema: {schema}

User Query: {user_query}

Generate only the SQL query, no explanation."""

Test it

schema = get_schema() prompt = generate_basic_prompt("Show me all employees in Engineering", schema)

response = client.messages.create( model="claude-3-5-sonnet-20241022", max_tokens=200, messages=[{"role": "user", "content": prompt}] )

print(response.content[0].text)

Output: SELECT * FROM employees WHERE department_id = 1;

This works, but it's basic. Let's make it smarter.

Step 2: Improving with Few-Shot Examples

Including examples in your prompt (few-shot learning) dramatically improves accuracy. Claude learns the pattern you expect from concrete input-output pairs.

def generate_few_shot_prompt(user_query, schema):
    examples = """
Example 1:
User Query: List all departments
SQL: SELECT * FROM departments;

Example 2: User Query: What is the average salary by department? SQL: SELECT d.name, AVG(e.salary) as avg_salary FROM employees e JOIN departments d ON e.department_id = d.id GROUP BY d.name;

Example 3: User Query: Find employees hired after 2021 with salary over 80000 SQL: SELECT name, salary, hire_date FROM employees WHERE hire_date > '2021-01-01' AND salary > 80000; """ return f"""You are a SQL expert. Convert natural language to SQL.

Database Schema: {schema}

Here are some examples: {examples}

User Query: {user_query}

Generate only the SQL query:"""

Pro tip: Include a few rows of actual data in your prompt. This gives Claude context about data types and content patterns, leading to more accurate queries.

Step 3: Chain-of-Thought Prompting for Complex Queries

For complex queries involving multiple joins, subqueries, or aggregations, chain-of-thought (CoT) prompting helps Claude reason step by step before generating SQL.

def generate_cot_prompt(user_query, schema):
    return f"""You are a SQL expert. Convert the following query to SQL.

Database Schema: {schema}

User Query: {user_query}

Let's think through this step by step:

<thinking>

  • What tables do I need?
  • What columns should I select?
  • What conditions (WHERE) are needed?
  • Do I need GROUP BY, ORDER BY, or JOINs?
  • What's the final query structure?
</thinking>

SQL:"""

This structured reasoning helps Claude avoid common mistakes like missing JOIN conditions or incorrect GROUP BY clauses.

Step 4: RAG for Large Database Schemas

When your database has dozens or hundreds of tables, you can't fit the entire schema in every prompt. Retrieval Augmented Generation (RAG) solves this by dynamically fetching only the relevant schema information.

import voyageai
from typing import List, Dict

class VectorDB: def __init__(self, api_key: str): self.client = voyageai.Client(api_key=api_key) self.embeddings = [] self.metadata = [] def add_documents(self, documents: List[Dict]): """Add documents with 'text' and 'metadata' keys""" texts = [doc['text'] for doc in documents] embeddings = self.client.embed(texts, model="voyage-2").embeddings self.embeddings.extend(embeddings) self.metadata.extend([doc['metadata'] for doc in documents]) def search(self, query: str, k: int = 5) -> List[Dict]: query_embedding = self.client.embed([query], model="voyage-2").embeddings[0] # Simple cosine similarity (use proper vector DB in production) similarities = [ self._cosine_similarity(query_embedding, emb) for emb in self.embeddings ] top_indices = sorted(range(len(similarities)), key=lambda i: similarities[i], reverse=True)[:k] return [self.metadata[i] for i in top_indices] def _cosine_similarity(self, a, b): dot_product = sum(x*y for x, y in zip(a, b)) norm_a = sum(xx for x in a) * 0.5 norm_b = sum(xx for x in b) * 0.5 return dot_product / (norm_a * norm_b)

Index your schema

vector_db = VectorDB(api_key="your-voyage-api-key")

table_descriptions = [ {"text": "employees table: id, name, department_id, salary, hire_date. Contains employee personal and compensation data.", "metadata": {"table": "employees", "columns": ["id", "name", "department_id", "salary", "hire_date"]}}, {"text": "departments table: id, name, location. Contains department information and office locations.", "metadata": {"table": "departments", "columns": ["id", "name", "location"]}} ]

vector_db.add_documents(table_descriptions)

def generate_rag_prompt(user_query): # Retrieve relevant schema relevant_tables = vector_db.search(user_query, k=2) # Build schema string from retrieved metadata schema_str = "\n".join([ f"Table: {t['table']}, Columns: {', '.join(t['columns'])}" for t in relevant_tables ]) return f"""You are a SQL expert. Convert the query to SQL.

Relevant Schema: {schema_str}

User Query: {user_query}

SQL:"""

Step 5: Self-Improvement Loop

One of Claude's most powerful capabilities is self-correction. You can build a loop where Claude:

  • Generates SQL
  • Executes it
  • Checks for errors or unexpected results
  • Fixes and regenerates if needed
def text_to_sql_with_self_improvement(user_query, max_iterations=3):
    schema = get_schema()
    
    for i in range(max_iterations):
        prompt = generate_few_shot_prompt(user_query, schema)
        response = client.messages.create(
            model="claude-3-5-sonnet-20241022",
            max_tokens=300,
            messages=[{"role": "user", "content": prompt}]
        )
        sql_query = response.content[0].text.strip()
        
        # Try to execute
        try:
            cursor.execute(sql_query)
            results = cursor.fetchall()
            
            # Check if results make sense
            validation_prompt = f"""
Original Query: {user_query}
Generated SQL: {sql_query}
Results: {results}

Does this SQL correctly answer the user's query? If not, provide the corrected SQL. If correct, say "CORRECT". """ validation = client.messages.create( model="claude-3-5-sonnet-20241022", max_tokens=300, messages=[{"role": "user", "content": validation_prompt}] ) if "CORRECT" in validation.content[0].text: return sql_query, results else: # Use Claude's correction sql_query = validation.content[0].text except Exception as e: # Claude fixes its own error error_prompt = f""" The SQL query had an error: {e}

Original Query: {user_query} Failed SQL: {sql_query}

Provide the corrected SQL: """ correction = client.messages.create( model="claude-3-5-sonnet-20241022", max_tokens=300, messages=[{"role": "user", "content": error_prompt}] ) sql_query = correction.content[0].text return None, "Failed after max iterations"

Evaluation: Measuring Success

To build a reliable system, you need to evaluate Claude's SQL generation. Here's a simple evaluation framework:

def evaluate_queries(test_cases):
    """
    test_cases: list of dicts with 'query' and 'expected_sql'
    """
    correct = 0
    for case in test_cases:
        generated_sql, _ = text_to_sql_with_self_improvement(case['query'])
        
        # Compare results, not exact SQL strings
        cursor.execute(generated_sql)
        generated_results = set(cursor.fetchall())
        
        cursor.execute(case['expected_sql'])
        expected_results = set(cursor.fetchall())
        
        if generated_results == expected_results:
            correct += 1
    
    return correct / len(test_cases)

Key Takeaways

  • Start simple, iterate fast: Begin with a basic prompt and schema, then layer in few-shot examples, chain-of-thought, and RAG as complexity grows.
  • Include schema and sample data: Claude performs significantly better when it understands both the structure and content of your database.
  • Use RAG for large schemas: Don't overwhelm Claude with irrelevant tables. Retrieve only what's needed for each query.
  • Implement self-improvement loops: Claude can catch and fix its own SQL errors, dramatically improving reliability in production.
  • Evaluate by results, not syntax: Two different SQL queries can produce the same correct output. Compare result sets, not exact SQL strings.
Text-to-SQL with Claude is not just about generating correct SQL—it's about building a system that can understand intent, reason about data structures, and recover gracefully from mistakes. Start with the patterns in this guide, and you'll have a production-ready system in no time.