BeClaude
GuideBeginnerBest Practices2026-05-22

Building a Robust Text-to-SQL System with Claude: From Basic Prompts to Self-Improving Queries

Learn to build a production-ready Text-to-SQL system using Claude. Covers prompting, RAG for complex schemas, chain-of-thought reasoning, and self-improving query loops.

Quick Answer

This guide walks you through building a Text-to-SQL system with Claude, covering basic prompting, few-shot learning, chain-of-thought reasoning, RAG for large schemas, and a self-improvement loop that lets Claude execute, analyze, and fix its own SQL queries.

Text-to-SQLPrompt EngineeringRAGDatabaseClaude API

Building a Robust Text-to-SQL System with Claude: From Basic Prompts to Self-Improving Queries

Introduction

Text-to-SQL is one of the most powerful applications of large language models in the enterprise. It bridges the gap between natural language and structured data, allowing non-technical users to query databases without knowing SQL syntax. Claude excels at this task because of its strong contextual understanding, ability to handle complex reasoning, and capacity for self-correction.

In this guide, you'll learn how to build a production-ready Text-to-SQL system using Claude. We'll start with a basic prompt and progressively add sophistication: few-shot examples, chain-of-thought reasoning, Retrieval Augmented Generation (RAG) for large schemas, and a self-improvement loop where Claude executes, analyzes, and fixes its own SQL.

Why Text-to-SQL Matters

Before diving into the implementation, let's understand the value:

  • Democratize data access: Business analysts, product managers, and executives can query databases without waiting for engineering.
  • Accelerate prototyping: Data scientists can quickly iterate on queries using natural language.
  • Build intuitive interfaces: Integrate natural language querying into chatbots, dashboards, and internal tools.
  • Handle complexity: Claude can generate queries with multiple JOINs, subqueries, aggregations, and window functions that would take humans significant time to write.

Setting Up Your Environment

First, let's create a test SQLite database with two tables: employees and departments. This will be our sandbox throughout the guide.

import sqlite3

Create an in-memory SQLite database

conn = sqlite3.connect(':memory:') cursor = conn.cursor()

Create departments table

cursor.execute(''' CREATE TABLE departments ( id INTEGER PRIMARY KEY, name TEXT NOT NULL, location TEXT ) ''')

Create employees table

cursor.execute(''' CREATE TABLE employees ( id INTEGER PRIMARY KEY, name TEXT NOT NULL, department_id INTEGER, salary REAL, hire_date DATE, FOREIGN KEY (department_id) REFERENCES departments(id) ) ''')

Insert sample data

cursor.executemany('INSERT INTO departments VALUES (?, ?, ?)', [ (1, 'Engineering', 'New York'), (2, 'Sales', 'San Francisco'), (3, 'Marketing', 'Chicago') ])

cursor.executemany('INSERT INTO employees VALUES (?, ?, ?, ?, ?)', [ (1, 'Alice', 1, 120000, '2020-01-15'), (2, 'Bob', 1, 95000, '2021-03-20'), (3, 'Charlie', 2, 80000, '2019-11-01'), (4, 'Diana', 3, 75000, '2022-06-10'), (5, 'Eve', 2, 90000, '2020-09-05') ])

conn.commit()

Step 1: The Basic Text-to-SQL Prompt

A good prompt needs three things: clear instructions, the user's query, and the database schema. Here's a minimal implementation:

import anthropic

client = anthropic.Anthropic(api_key="your-api-key")

def get_schema(): cursor.execute("SELECT sql FROM sqlite_master WHERE type='table'") return "\n".join(row[0] for row in cursor.fetchall())

def basic_prompt(user_query, schema): return f"""You are a SQL expert. Convert the following natural language query into a SQL query.

Database Schema: {schema}

User Query: {user_query}

Generate only the SQL query, no explanation."""

response = client.messages.create( model="claude-3-5-sonnet-20241022", max_tokens=200, messages=[{"role": "user", "content": basic_prompt("Show me all employees in Engineering", get_schema())}] )

print(response.content[0].text)

Output: SELECT * FROM employees WHERE department_id = 1;

This works for simple queries, but it has limitations. Let's improve it.

Step 2: Few-Shot Learning with Examples

Adding examples (few-shot learning) dramatically improves accuracy, especially for complex queries. Here's an enhanced prompt generator:

def few_shot_prompt(user_query, schema):
    examples = """
Examples:

User: List all departments SQL: SELECT * FROM departments;

User: Show employee names and their department names SQL: SELECT e.name, d.name FROM employees e JOIN departments d ON e.department_id = d.id;

User: What is the average salary by department? SQL: SELECT d.name, AVG(e.salary) FROM employees e JOIN departments d ON e.department_id = d.id GROUP BY d.name; """ return f"""You are a SQL expert. Convert natural language to SQL.

{schema}

{examples}

User: {user_query} SQL:"""

Pro tip: Include a few rows of real data alongside the schema. This gives Claude concrete context about data formats, which reduces hallucination.

Step 3: Chain-of-Thought Prompting for Complex Queries

For multi-step queries, chain-of-thought (CoT) prompting helps Claude reason through the problem. Use XML tags to structure the reasoning:

def cot_prompt(user_query, schema):
    return f"""You are a SQL expert. Convert the user's query into SQL.

Schema: {schema}

User Query: {user_query}

First, think step-by-step inside <reasoning> tags:

  • Identify the tables needed
  • Determine JOIN conditions
  • Identify filters, aggregations, and ordering
  • Build the query incrementally
Then output the final SQL inside <sql> tags.

Example: <reasoning>

  • The query asks for average salary by department.
  • We need employees table for salary, departments table for names.
  • JOIN on department_id.
  • GROUP BY department name.
  • SELECT department name and AVG(salary).
</reasoning> <sql> SELECT d.name, AVG(e.salary) FROM employees e JOIN departments d ON e.department_id = d.id GROUP BY d.name; </sql>

Now process the user's query:"""

CoT is particularly effective for queries involving:

  • Multiple JOINs across 3+ tables
  • Subqueries or CTEs
  • Window functions (RANK, ROW_NUMBER, etc.)
  • Complex WHERE conditions with nested logic

Step 4: RAG for Large Database Schemas

When your database has dozens or hundreds of tables, including the entire schema in every prompt is impractical and expensive. RAG solves this by retrieving only the relevant schema information.

Here's a simplified implementation using VoyageAI embeddings:

import voyageai

vo = voyageai.Client(api_key="your-voyage-api-key")

def build_schema_embeddings(): """Create embeddings for each table and column.""" schema_items = [] cursor.execute("SELECT name FROM sqlite_master WHERE type='table'") for (table_name,) in cursor.fetchall(): cursor.execute(f"PRAGMA table_info({table_name})") columns = [row[1] for row in cursor.fetchall()] schema_items.append(f"Table: {table_name}, Columns: {', '.join(columns)}") embeddings = vo.embed(schema_items, model="voyage-2").embeddings return schema_items, embeddings

def retrieve_relevant_schema(query, schema_items, embeddings): query_embedding = vo.embed([query], model="voyage-2").embeddings[0] # Calculate cosine similarity and return top-k items similarities = [cosine_similarity(query_embedding, emb) for emb in embeddings] top_indices = sorted(range(len(similarities)), key=lambda i: similarities[i], reverse=True)[:3] return "\n".join([schema_items[i] for i in top_indices])

Then integrate it into your prompt:

def rag_prompt(user_query):
    schema_items, embeddings = build_schema_embeddings()
    relevant_schema = retrieve_relevant_schema(user_query, schema_items, embeddings)
    return f"""Relevant Schema:
{relevant_schema}

User Query: {user_query}

Generate SQL:"""

Step 5: Self-Improving Query Loop

The most advanced technique is a self-improvement loop where Claude executes its SQL, inspects the results, and fixes errors. This dramatically increases reliability.

def self_improving_query(user_query, max_attempts=3):
    schema = get_schema()
    
    for attempt in range(max_attempts):
        # Generate SQL
        prompt = f"""Schema:
{schema}

Query: {user_query}

Generate SQL:""" response = client.messages.create( model="claude-3-5-sonnet-20241022", max_tokens=300, messages=[{"role": "user", "content": prompt}] ) sql = response.content[0].text.strip() # Try to execute try: cursor.execute(sql) results = cursor.fetchall() columns = [desc[0] for desc in cursor.description] # Ask Claude to validate the results validation_prompt = f"""Original query: {user_query} Generated SQL: {sql} Results: {results} Columns: {columns}

Are these results correct? If yes, say 'CORRECT'. If no, explain the issue.""" validation = client.messages.create( model="claude-3-5-sonnet-20241022", max_tokens=200, messages=[{"role": "user", "content": validation_prompt}] ) if "CORRECT" in validation.content[0].text: return sql, results else: # Feed the error back to Claude for improvement user_query = f"{user_query}. Previous attempt had issue: {validation.content[0].text}" except Exception as e: # SQL execution error - feed back to Claude user_query = f"{user_query}. Previous SQL '{sql}' failed with error: {str(e)}. Fix it." return None, None

This loop handles:

  • Syntax errors: Claude sees the error message and corrects the SQL
  • Logical errors: Claude sees unexpected results and adjusts the query
  • Schema mismatches: Claude learns from failed column references

Evaluations: Measuring Success

To ensure your system is reliable, build an evaluation framework:

test_cases = [
    {"query": "List all employees hired in 2020", "expected_sql": "SELECT * FROM employees WHERE strftime('%Y', hire_date) = '2020'"},
    {"query": "Show department with highest average salary", "expected_sql": "SELECT d.name, AVG(e.salary) as avg_salary FROM employees e JOIN departments d ON e.department_id = d.id GROUP BY d.name ORDER BY avg_salary DESC LIMIT 1"},
]

def evaluate_system(test_cases): correct = 0 for case in test_cases: sql, results = self_improving_query(case["query"]) # Compare generated SQL structure (simplified) if sql and "SELECT" in sql.upper(): correct += 1 print(f"Accuracy: {correct}/{len(test_cases)}")

For production, consider:

  • Execution accuracy: Does the SQL run without errors?
  • Result correctness: Do the results match expected output?
  • Latency: How fast is the full pipeline?
  • Cost: Track token usage per query

Key Takeaways

  • Start simple, iterate fast: Begin with a basic prompt and schema, then layer on examples, chain-of-thought, RAG, and self-improvement as needed.
  • Few-shot learning is essential: 3-5 well-chosen examples dramatically improve accuracy for complex queries.
  • Chain-of-thought reasoning handles complexity: For multi-table JOINs, subqueries, and aggregations, CoT prompting reduces errors significantly.
  • RAG makes large schemas manageable: Dynamically retrieve only the relevant schema information instead of overwhelming Claude with every table.
  • Self-improvement loops catch errors: Let Claude execute, analyze, and fix its own SQL. This is the single most impactful technique for production reliability.

Next Steps

  • Add a feedback mechanism where users can flag incorrect results
  • Implement query caching for frequently asked questions
  • Explore fine-tuning Claude on your specific database patterns
  • Add security guardrails to prevent destructive SQL (DROP, DELETE, UPDATE)
With these techniques, you can build a Text-to-SQL system that handles real-world complexity and delivers reliable, accurate results for your organization.