BeClaude
Guide2026-04-26

Building a Robust Text-to-SQL System with Claude: A Practical Guide

Learn how to build a production-ready Text-to-SQL system using Claude. Covers prompting, RAG for complex schemas, chain-of-thought reasoning, and self-improvement loops.

Quick Answer

This guide teaches you to build a Text-to-SQL system with Claude, covering basic prompting, few-shot learning, chain-of-thought reasoning, RAG for large schemas, and self-improvement loops for error handling.

Text-to-SQLClaude APIRAGPrompt EngineeringDatabase

Building a Robust Text-to-SQL System with Claude: A Practical Guide

Text-to-SQL is one of the most impactful applications of large language models (LLMs) in enterprise settings. It bridges the gap between non-technical users and complex databases, allowing anyone to ask questions in plain English and get structured data back. Claude excels at this task because of its strong reasoning capabilities, context understanding, and ability to generate syntactically correct SQL.

This guide walks you through building a production-ready Text-to-SQL system using Claude. You'll learn everything from basic prompting to advanced techniques like Retrieval Augmented Generation (RAG) and self-improvement loops.

Why Text-to-SQL Matters

Before diving into the implementation, it's worth understanding why Text-to-SQL is so valuable:

  • Democratizing Data Access: Business analysts, product managers, and executives can query databases without knowing SQL syntax.
  • Accelerating Development: Data scientists can prototype complex queries in seconds rather than minutes or hours.
  • Enabling Conversational Interfaces: Chatbots and virtual assistants can answer data-driven questions naturally.
  • Handling Complexity: LLMs can generate queries with multiple joins, subqueries, window functions, and aggregations that would take humans significant time to write correctly.

Setting Up Your Environment

Let's start by creating a test SQLite database with two tables: employees and departments. This will serve as our sandbox throughout the guide.

import sqlite3

Create an in-memory SQLite database

conn = sqlite3.connect(':memory:') cursor = conn.cursor()

Create departments table

cursor.execute(''' CREATE TABLE departments ( id INTEGER PRIMARY KEY, name TEXT NOT NULL, location TEXT ) ''')

Create employees table

cursor.execute(''' CREATE TABLE employees ( id INTEGER PRIMARY KEY, name TEXT NOT NULL, department_id INTEGER, salary REAL, hire_date DATE, FOREIGN KEY (department_id) REFERENCES departments (id) ) ''')

Insert sample data

cursor.executemany('INSERT INTO departments VALUES (?, ?, ?)', [ (1, 'Engineering', 'New York'), (2, 'Marketing', 'San Francisco'), (3, 'Sales', 'Chicago') ])

cursor.executemany('INSERT INTO employees VALUES (?, ?, ?, ?, ?)', [ (1, 'Alice Smith', 1, 95000, '2020-03-15'), (2, 'Bob Johnson', 1, 85000, '2021-06-01'), (3, 'Charlie Brown', 2, 72000, '2019-11-20'), (4, 'Diana Prince', 3, 88000, '2022-01-10'), (5, 'Eve Wilson', 2, 65000, '2023-04-05') ])

conn.commit()

Creating a Basic Text-to-SQL Prompt

The foundation of any Text-to-SQL system is a well-structured prompt. A good prompt should include:

  • Clear instructions about the task
  • The user's natural language query
  • The database schema so Claude understands the available tables and columns
import anthropic

client = anthropic.Anthropic(api_key="your-api-key")

def get_schema(conn): """Extract schema information from the database.""" cursor = conn.cursor() # Get all tables cursor.execute("SELECT name FROM sqlite_master WHERE type='table'") tables = cursor.fetchall() schema = "" for table in tables: table_name = table[0] cursor.execute(f"PRAGMA table_info({table_name})") columns = cursor.fetchall() schema += f"Table: {table_name}\n" for col in columns: schema += f" - {col[1]} ({col[2]})\n" schema += "\n" return schema

def generate_basic_prompt(user_query, schema): return f"""You are a SQL expert. Convert the following natural language query into a valid SQL query.

Database Schema: {schema}

User Query: {user_query}

Generate only the SQL query, no explanation."""

Test it

schema = get_schema(conn) user_query = "Show me all employees in the Engineering department with their salaries" prompt = generate_basic_prompt(user_query, schema)

response = client.messages.create( model="claude-3-5-sonnet-20241022", max_tokens=200, messages=[{"role": "user", "content": prompt}] )

sql_query = response.content[0].text print(f"Generated SQL: {sql_query}")

Execute and display results

cursor = conn.cursor() cursor.execute(sql_query) results = cursor.fetchall() print(f"Results: {results}")

Improving with Few-Shot Examples

A basic prompt works, but you'll get better results by including examples. This technique, called few-shot learning, helps Claude understand the expected output format and reasoning pattern.

def generate_few_shot_prompt(user_query, schema):
    examples = """
Examples:

User Query: "List all departments" SQL: SELECT * FROM departments;

User Query: "Find employees hired after 2021" SQL: SELECT * FROM employees WHERE hire_date > '2021-01-01';

User Query: "What is the average salary by department?" SQL: SELECT d.name, AVG(e.salary) as avg_salary FROM employees e JOIN departments d ON e.department_id = d.id GROUP BY d.name; """ return f"""You are a SQL expert. Convert natural language queries into valid SQL.

Database Schema: {schema}

{examples}

User Query: {user_query} SQL:"""

Pro Tip: Consider including a few rows of actual data alongside the schema. This gives Claude context about data formats, NULL values, and typical values in each column, leading to more accurate queries.

Chain-of-Thought Prompting for Complex Queries

For complex queries involving multiple joins, subqueries, or aggregations, chain-of-thought (CoT) prompting helps Claude reason step-by-step before generating the final SQL.

def generate_cot_prompt(user_query, schema):
    return f"""You are a SQL expert. Convert the following natural language query into a valid SQL query.

Database Schema: {schema}

User Query: {user_query}

Let's think through this step-by-step: 1) What tables do I need to query? 2) What columns do I need to select? 3) What conditions (WHERE clauses) are needed? 4) Do I need any aggregations (GROUP BY, HAVING)? 5) What is the final SQL query?

<thinking> [Your step-by-step reasoning] </thinking>

<sql> [Your final SQL query] </sql>"""

The XML tags (<thinking> and <sql>) help parse Claude's output programmatically, separating the reasoning from the final answer.

Implementing RAG for Large Database Schemas

As databases grow to dozens or hundreds of tables, including the entire schema in every prompt becomes impractical and expensive. Retrieval Augmented Generation (RAG) solves this by dynamically retrieving only the relevant schema information.

Here's how to implement RAG for Text-to-SQL:

import voyageai
from typing import List, Dict
import numpy as np

class SchemaVectorDB: def __init__(self, api_key: str): self.client = voyageai.Client(api_key=api_key) self.embeddings = [] self.metadata = [] def add_table(self, table_name: str, columns: List[Dict], sample_data: List[str] = None): """Add a table's schema information to the vector database.""" # Create a rich text description of the table description = f"Table: {table_name}\n" for col in columns: description += f" - {col['name']} ({col['type']}): {col.get('description', '')}\n" if sample_data: description += "\nSample data:\n" for row in sample_data[:3]: # Include up to 3 sample rows description += f" {row}\n" # Generate embedding embedding = self.client.embed([description], model="voyage-2").embeddings[0] self.embeddings.append(embedding) self.metadata.append({ "table_name": table_name, "description": description, "columns": columns }) def query(self, user_query: str, top_k: int = 3) -> List[Dict]: """Retrieve the most relevant schema information for a user query.""" query_embedding = self.client.embed([user_query], model="voyage-2").embeddings[0] # Calculate cosine similarity similarities = [] for emb in self.embeddings: similarity = np.dot(query_embedding, emb) / ( np.linalg.norm(query_embedding) * np.linalg.norm(emb) ) similarities.append(similarity) # Get top-k most relevant tables top_indices = np.argsort(similarities)[-top_k:][::-1] return [self.metadata[i] for i in top_indices]

Usage example

vector_db = SchemaVectorDB(api_key="your-voyage-api-key")

Add tables to the vector database

vector_db.add_table( table_name="employees", columns=[ {"name": "id", "type": "INTEGER", "description": "Primary key"}, {"name": "name", "type": "TEXT", "description": "Employee full name"}, {"name": "department_id", "type": "INTEGER", "description": "Foreign key to departments"}, {"name": "salary", "type": "REAL", "description": "Annual salary in USD"}, {"name": "hire_date", "type": "DATE", "description": "Date employee was hired"} ] )

vector_db.add_table( table_name="departments", columns=[ {"name": "id", "type": "INTEGER", "description": "Primary key"}, {"name": "name", "type": "TEXT", "description": "Department name"}, {"name": "location", "type": "TEXT", "description": "Office location"} ] )

Retrieve relevant schema for a query

user_query = "Show me the highest paid employee in each department" relevant_schema = vector_db.query(user_query, top_k=2)

Build prompt with only relevant schema

schema_context = "\n".join([s["description"] for s in relevant_schema]) prompt = f"""You are a SQL expert. Convert the following query into SQL.

Relevant Database Schema: {schema_context}

User Query: {user_query}

SQL:"""

Implementing Query Self-Improvement

One of Claude's most powerful capabilities is self-correction. You can implement a loop where Claude:

  • Generates a SQL query
  • Executes it against the database
  • Analyzes the results or catches errors
  • Improves the query if needed
def text_to_sql_with_self_improvement(user_query, schema, conn, max_iterations=3):
    """
    Generate SQL, execute it, and self-improve if there are errors or unexpected results.
    """
    prompt = f"""You are a SQL expert. Convert this query to SQL.

Schema: {schema}

Query: {user_query}

SQL:""" for iteration in range(max_iterations): # Generate SQL response = client.messages.create( model="claude-3-5-sonnet-20241022", max_tokens=500, messages=[{"role": "user", "content": prompt}] ) sql_query = response.content[0].text # Try to execute try: cursor = conn.cursor() cursor.execute(sql_query) results = cursor.fetchall() # Check if results make sense if len(results) == 0: # No results - might be an issue with the query feedback = f"The query returned 0 rows. Here's the query I tried: {sql_query}\n\nCan you fix this?" prompt = f"{prompt}\n\nPrevious attempt: {sql_query}\nFeedback: {feedback}\n\nImproved SQL:" continue return sql_query, results except Exception as e: # SQL error - ask Claude to fix it feedback = f"The query failed with error: {str(e)}\n\nHere's the query I tried: {sql_query}" prompt = f"{prompt}\n\nPrevious attempt: {sql_query}\nError: {feedback}\n\nFixed SQL:" return None, "Failed after maximum iterations"

Test it

sql, results = text_to_sql_with_self_improvement( "Find departments with average salary above 80000", schema, conn ) print(f"Final SQL: {sql}") print(f"Results: {results}")

Evaluating Your Text-to-SQL System

To ensure your system is reliable, you need a robust evaluation framework. Here's a simple approach:

def evaluate_text_to_sql(test_cases, text_to_sql_function, conn):
    """
    Evaluate a Text-to-SQL system against a set of test cases.
    
    test_cases: List of dicts with 'query' and 'expected_sql' or 'expected_results'
    """
    results = []
    for test in test_cases:
        generated_sql, actual_results = text_to_sql_function(test['query'], conn)
        
        # Compare with expected results
        expected_results = test.get('expected_results')
        match = actual_results == expected_results if expected_results else None
        
        results.append({
            'query': test['query'],
            'generated_sql': generated_sql,
            'actual_results': actual_results,
            'expected_results': expected_results,
            'match': match
        })
    
    accuracy = sum(1 for r in results if r['match']) / len(results)
    return results, accuracy

Key Takeaways

  • Start simple, iterate fast: Begin with a basic prompt and schema, then add few-shot examples, chain-of-thought reasoning, and RAG as complexity grows.
  • RAG is essential for production systems: For databases with more than 10-15 tables, dynamic schema retrieval reduces token usage and improves accuracy.
  • Self-improvement loops catch errors: Let Claude execute its own SQL, analyze results, and fix mistakes automatically—this dramatically improves reliability.
  • Include sample data in prompts: A few rows of real data alongside the schema helps Claude understand data formats and avoid common mistakes.
  • Evaluate rigorously: Build a test suite with known queries and expected results to measure accuracy and catch regressions when you modify prompts or schemas.