Claude Guide
2026-04-28

Your Complete Guide to Building with the Claude API: From First Call to Production

Learn how to integrate Claude into your applications using the Messages API, SDKs, and managed agents. Covers setup, tool use, streaming, and best practices for production.

Quick Answer

This guide walks you through the Claude API ecosystem: getting an API key, making your first call with Python/TypeScript, using Messages API features like tool use and streaming, and choosing between direct API access and managed agents for production.

Claude API · Messages API · SDK · Tool Use · Streaming

Introduction

Claude isn't just a chat interface—it's a powerful API that you can integrate into your own applications. Whether you're building a customer support bot, a code assistant, or an autonomous agent, the Claude API gives you direct access to the same models that power claude.ai. This guide covers everything you need to go from your first API call to a production-ready integration.

Getting Started: Your First API Call

Before you can build anything, you need an API key and a basic understanding of the Messages API. The Messages API is the primary way to interact with Claude programmatically. You send a list of messages (with roles like user and assistant), and Claude returns a response.

Step 1: Get an API Key

  • Go to the Claude Console.
  • Log in or create an account.
  • Navigate to API Keys and generate a new key.
  • Copy the key and store it securely (you'll need it for authentication).
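A common way to store the key securely is an environment variable: both the Python and TypeScript SDKs automatically read ANTHROPIC_API_KEY when you construct a client with no explicit key. A minimal sketch:

```shell
# Keep the key out of source code; the SDKs pick this variable up automatically,
# so anthropic.Anthropic() / new Anthropic() work with no api_key argument.
export ANTHROPIC_API_KEY="your-api-key-here"
```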

Step 2: Install an SDK

Anthropic provides official SDKs for Python, TypeScript, Go, Java, Ruby, PHP, C#, and more. For most developers, Python or TypeScript is the best starting point.

Python:
pip install anthropic
TypeScript:
npm install @anthropic-ai/sdk

Step 3: Make Your First Request

Here's a minimal example in Python that sends a simple message and prints the response:

import anthropic

client = anthropic.Anthropic(
    api_key="your-api-key-here"  # Replace with your actual key
)

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude!"}
    ]
)

print(message.content[0].text)

And the same request in TypeScript:

import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic({
  apiKey: 'your-api-key-here',
});

async function main() {
  const message = await client.messages.create({
    model: 'claude-opus-4-7',
    max_tokens: 1024,
    messages: [
      { role: 'user', content: 'Hello, Claude!' }
    ],
  });
  console.log(message.content[0].text);
}

main();

That's it! You've made your first API call. Now let's explore what else you can do.

Understanding the Messages API

The Messages API is the core of Claude's programmatic interface. Here are the key concepts:

Messages Structure

Each request contains an array of message objects. Each message has:

  • role: Either "user" (you) or "assistant" (Claude).
  • content: The text of the message. Can also be an array of content blocks for multimodal inputs (images, documents).
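For example, a single user message whose content is an array of blocks (an image plus a question) can be sketched as a plain dictionary. The block shapes follow the structure described above; the base64 payload is a placeholder standing in for real image bytes:

```python
import base64

# Placeholder payload standing in for real image bytes
image_data = base64.b64encode(b"\x89PNG...").decode("utf-8")

# A user message whose content is an array of content blocks
multimodal_message = {
    "role": "user",
    "content": [
        {
            "type": "image",
            "source": {"type": "base64", "media_type": "image/png", "data": image_data},
        },
        {"type": "text", "text": "What is in this image?"},
    ],
}
```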

Key Parameters

  • model: The Claude model you want to use. Options include claude-opus-4-7 (most capable), claude-sonnet-4-6 (best balance), and claude-haiku-4-5 (fastest).
  • max_tokens: The maximum number of tokens Claude can generate in the response.
  • system: An optional system prompt to set Claude's behavior.
  • temperature: Controls randomness (0.0 to 1.0). Lower values make output more deterministic.
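Putting these parameters together, a request that pins down a persona with a system prompt and lowers temperature for more deterministic output might look like the sketch below; the kwargs are collected in a dict so the shape is easy to see before passing them to create:

```python
# Parameters for a deterministic-leaning, persona-constrained request
request_kwargs = {
    "model": "claude-sonnet-4-6",
    "max_tokens": 512,
    "system": "You are a terse SQL expert. Answer with SQL only.",
    "temperature": 0.0,  # lower values reduce randomness
    "messages": [{"role": "user", "content": "Count rows in the orders table."}],
}

# Then: message = client.messages.create(**request_kwargs)
```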

Handling Stop Reasons

When Claude finishes generating, the response includes a stop_reason field. Common values:

  • "end_turn": Claude naturally finished its response.
  • "max_tokens": The response was cut off because it hit the max_tokens limit.
  • "tool_use": Claude wants to call a tool (more on this below).
You should always check stop_reason to handle the response appropriately.

message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=100,
    messages=[{"role": "user", "content": "Tell me a long story"}]
)

if message.stop_reason == "max_tokens":
    print("Response was truncated. Consider increasing max_tokens.")

Advanced Features

Tool Use (Function Calling)

One of Claude's most powerful features is the ability to use tools. You define tools (functions) that Claude can call, and Claude decides when to invoke them. This is essential for building agents that interact with external systems.

Define a tool:

def get_weather(location: str) -> str:
    """Get the current weather for a location."""
    # In a real app, you'd call a weather API
    return f"The weather in {location} is sunny and 72°F."

tools = [
    {
        "name": "get_weather",
        "description": "Get the current weather for a given location",
        "input_schema": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "City and state, e.g., San Francisco, CA"
                }
            },
            "required": ["location"]
        }
    }
]

Send a request with tools:
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    tools=tools,
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}]
)

Check if Claude wants to use a tool, execute it, and send the result back:

if response.stop_reason == "tool_use":
    for content in response.content:
        if content.type == "tool_use":
            tool_name = content.name
            tool_input = content.input
            # Execute the tool
            if tool_name == "get_weather":
                result = get_weather(**tool_input)
                # Send the result back to Claude
                follow_up = client.messages.create(
                    model="claude-sonnet-4-6",
                    max_tokens=1024,
                    tools=tools,
                    messages=[
                        {"role": "user", "content": "What's the weather in Tokyo?"},
                        {"role": "assistant", "content": response.content},
                        {"role": "user", "content": [
                            {
                                "type": "tool_result",
                                "tool_use_id": content.id,
                                "content": result
                            }
                        ]}
                    ]
                )
                print(follow_up.content[0].text)

Streaming Responses

For a better user experience, you can stream Claude's responses token by token. This is especially useful for chat interfaces where you want to show text as it's generated.

Python streaming example:
stream = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Write a short poem about AI"}],
    stream=True
)

for event in stream:
    if event.type == "content_block_delta":
        print(event.delta.text, end="", flush=True)

Extended Thinking

For complex reasoning tasks, you can enable Claude's extended thinking mode. This allows Claude to "think" before responding, producing better results on math, logic, and analysis tasks.

response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=2048,
    thinking={
        "type": "enabled",
        "budget_tokens": 1024  # How many tokens Claude can use for thinking
    },
    messages=[{"role": "user", "content": "Solve this: 27 * 43 + 15 / 3"}]
)

The response includes both the thinking and the final answer:

for content in response.content:
    if content.type == "thinking":
        print("Thinking:", content.thinking)
    elif content.type == "text":
        print("Answer:", content.text)

Vision and PDF Support

Claude can process images and PDFs. You send them as base64-encoded data in the content array.

import base64

with open("invoice.pdf", "rb") as f:
    pdf_data = base64.b64encode(f.read()).decode("utf-8")

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "document",
                    "source": {
                        "type": "base64",
                        "media_type": "application/pdf",
                        "data": pdf_data
                    }
                },
                {"type": "text", "text": "Summarize this invoice."}
            ]
        }
    ]
)

print(response.content[0].text)

Choosing Your Development Path

Anthropic offers two primary ways to build with Claude:

1. Direct API Access (Messages)

Best for: Custom applications, fine-grained control, unique workflows.

You manage everything: conversation state, tool loops, error handling. This gives you maximum flexibility but requires more code.
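Conversation state, for instance, is just a list you own: each user turn and each assistant reply is appended to the same messages array before the next create call. A minimal sketch, with the API call stubbed out as a hypothetical call_model function:

```python
def call_model(messages):
    # Stand-in for client.messages.create(...); returns canned text here
    return f"(reply to: {messages[-1]['content']})"

def send_turn(history, user_text):
    """Append the user turn, get a reply, and append it so context accumulates."""
    history.append({"role": "user", "content": user_text})
    reply = call_model(history)
    history.append({"role": "assistant", "content": reply})
    return reply

history = []
send_turn(history, "Hello!")
send_turn(history, "What did I just say?")
# history now holds all four turns, ready to include in the next request
```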

2. Claude Managed Agents

Best for: Rapid prototyping, autonomous agents, stateful sessions.

Claude handles the agent loop, session management, and event history. You define the agent's tools and instructions, and Claude takes care of the rest.

Managed Agent quickstart (Python):
import anthropic

client = anthropic.Anthropic()

# Create an agent
agent = client.agents.create(
    name="customer-support",
    model="claude-sonnet-4-6",
    instructions="You are a helpful customer support agent. Be polite and concise.",
    tools=[...]  # Your custom tools
)

# Start a session
session = client.agents.sessions.create(
    agent_id=agent.id
)

# Send a message
response = client.agents.sessions.message(
    session_id=session.id,
    content="I need help with my order"
)

print(response.content[0].text)

Production Best Practices

1. Handle Rate Limits

The Claude API enforces rate limits. Implement exponential backoff in your error handling:

import time
from anthropic import RateLimitError

def make_request_with_retry(client, **kwargs):
    max_retries = 5
    for attempt in range(max_retries):
        try:
            return client.messages.create(**kwargs)
        except RateLimitError:
            if attempt == max_retries - 1:
                raise
            wait_time = 2 ** attempt  # Exponential backoff
            time.sleep(wait_time)

2. Use Prompt Caching

For repeated system prompts or large context, enable prompt caching to reduce costs and latency:

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "You are a legal assistant...",
            "cache_control": {"type": "ephemeral"}
        }
    ],
    messages=[...]
)

3. Evaluate and Test

Use the Evaluation Tool in the Claude Console to test your prompts and measure performance before deploying.
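Alongside the Console tool, it can help to keep a small local harness that runs your prompt against fixed test cases and checks expectations. A minimal sketch, with the model call stubbed out as a hypothetical ask_claude function:

```python
def ask_claude(prompt):
    # Stand-in for a real client.messages.create call; returns canned answers
    canned = {"What is 2 + 2?": "4", "Capital of France?": "Paris"}
    return canned.get(prompt, "")

# Each case pairs a prompt with a substring the answer must contain
test_cases = [
    ("What is 2 + 2?", "4"),
    ("Capital of France?", "Paris"),
]

def run_evals(cases):
    """Return the fraction of cases whose response contains the expected text."""
    passed = sum(1 for prompt, expected in cases if expected in ask_claude(prompt))
    return passed / len(cases)

print(run_evals(test_cases))  # 1.0 with the canned responses above
```

Swapping the stub for a real API call turns this into a quick regression check you can run after every prompt change.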

4. Monitor Costs

Track token usage and set budget alerts in the Claude Console. Use claude-haiku-4-5 for simple tasks and claude-opus-4-7 only when you need deep reasoning.
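Each response carries a usage field with input and output token counts, which you can feed into a simple cost estimator. The per-million-token rates below are placeholders for illustration, not real prices; check current pricing before relying on them:

```python
# Placeholder prices in USD per million tokens (illustrative only, not real rates)
PRICES = {
    "claude-haiku-4-5": {"input": 1.0, "output": 5.0},
    "claude-sonnet-4-6": {"input": 3.0, "output": 15.0},
}

def estimate_cost(model, input_tokens, output_tokens):
    """Estimate request cost from token counts using the placeholder rates above."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# In practice, feed this from message.usage.input_tokens / output_tokens
cost = estimate_cost("claude-sonnet-4-6", 2_000, 500)
print(f"${cost:.4f}")  # $0.0135
```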

Key Takeaways

  • Start with the Messages API: It's the foundation for all Claude integrations. Get comfortable with sending messages, handling responses, and checking stop reasons.
  • Leverage tool use for real-world applications: Tools allow Claude to interact with external systems, making your agents truly useful.
  • Use streaming for better UX: Streaming responses token-by-token creates a more natural, responsive experience for users.
  • Choose the right model for the job: Use Haiku for speed, Sonnet for balance, and Opus for complex reasoning. Don't overpay for capability you don't need.
  • Build with production in mind: Implement retry logic, prompt caching, and cost monitoring from day one to avoid surprises later.