Guide | Beginner | Best Practices | 2026-05-12

How to Build a Custom Partner Integration with the Claude API

A practical guide to creating partner integrations with the Claude API, covering authentication, message streaming, error handling, and best practices for production deployments.

Quick Answer

Learn how to build a production-ready partner integration with the Claude API, including authentication, message streaming, error handling, and rate limiting strategies.

Tags: Claude API, partner integration, API best practices, streaming, error handling

Introduction

Building a partner integration with the Claude API allows you to embed powerful conversational AI capabilities directly into your own application, platform, or service. Whether you're creating a customer support chatbot, a content generation tool, or an AI-powered assistant, a well-designed integration ensures reliability, scalability, and a great user experience.

This guide walks you through the essential steps to build a robust partner integration using the Claude API. We'll cover authentication, message streaming, error handling, rate limiting, and production best practices.

Prerequisites

Before you begin, make sure you have:

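A Claude API key from the Anthropic Console
Python 3.8+ or Node.js 18+ installed (the TypeScript examples rely on the built-in fetch, which is available globally from Node.js 18)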
Basic familiarity with REST APIs and JSON
An understanding of your application's authentication model (OAuth, API keys, etc.)

Step 1: Set Up Authentication

Every request to the Claude API requires an API key passed via the x-api-key header. For partner integrations, you should never expose your API key client-side. Instead, proxy all requests through your backend.

Python Example

import os
import requests

# Read the key from the environment; never hardcode it in source
API_KEY = os.environ["CLAUDE_API_KEY"]
BASE_URL = "https://api.anthropic.com/v1"

def get_headers():
    return {
        "x-api-key": API_KEY,
        "anthropic-version": "2023-06-01",
        "Content-Type": "application/json"
    }

TypeScript Example

const API_KEY = process.env.CLAUDE_API_KEY;
const BASE_URL = "https://api.anthropic.com/v1";
function getHeaders(): Record<string, string> {
  return {
    "x-api-key": API_KEY!,
    "anthropic-version": "2023-06-01",
    "Content-Type": "application/json"
  };
}

Security Tip: Never hardcode your API key. Use environment variables or a secrets manager. Rotate keys regularly.
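To make the proxy idea concrete, here is a minimal sketch of the step where your backend turns an untrusted client payload into the upstream request. The function name `build_upstream_request`, the `user_input` field, and the `ALLOWED_MODEL` and `MAX_TOKENS_CAP` constants are illustrative assumptions, not part of the Claude API; the point is only that the key, model, and token budget are chosen server-side.

```python
import os
from typing import Any, Dict, Mapping

# Hypothetical server-side policy; pick values that suit your product.
ALLOWED_MODEL = "claude-sonnet-4-20250514"
MAX_TOKENS_CAP = 1024


def build_upstream_request(client_body: Mapping[str, Any],
                           env: Mapping[str, str] = os.environ) -> Dict[str, Any]:
    """Turn an untrusted client payload into the request the backend sends upstream."""
    api_key = env.get("CLAUDE_API_KEY")
    if not api_key:
        raise RuntimeError("CLAUDE_API_KEY is not set")

    user_input = client_body.get("user_input")
    if not isinstance(user_input, str) or not user_input.strip():
        raise ValueError("user_input must be a non-empty string")

    # Anything else the client sent (model, max_tokens, headers) is ignored.
    return {
        "headers": {
            "x-api-key": api_key,
            "anthropic-version": "2023-06-01",
            "Content-Type": "application/json",
        },
        "json": {
            "model": ALLOWED_MODEL,
            "max_tokens": MAX_TOKENS_CAP,
            "messages": [{"role": "user", "content": user_input}],
        },
    }
```

Because the key is attached only inside this function, it never appears in anything the browser or mobile client can inspect.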

Step 2: Send Your First Message

The Messages endpoint is the primary way to interact with Claude. A basic request includes a model, max_tokens, and an array of messages.

Python Example

import requests

def send_message(user_input: str):
    response = requests.post(
        f"{BASE_URL}/messages",
        headers=get_headers(),
        json={
            "model": "claude-sonnet-4-20250514",
            "max_tokens": 1024,
            "messages": [
                {"role": "user", "content": user_input}
            ]
        }
    )
    response.raise_for_status()
    return response.json()

# Usage
result = send_message("Explain quantum computing in simple terms.")
print(result["content"][0]["text"])

TypeScript Example

async function sendMessage(userInput: string) {
  const response = await fetch(`${BASE_URL}/messages`, {
    method: "POST",
    headers: getHeaders(),
    body: JSON.stringify({
      model: "claude-sonnet-4-20250514",
      max_tokens: 1024,
      messages: [
        { role: "user", content: userInput }
      ]
    })
  });

  if (!response.ok) {
    throw new Error(`API error: ${response.status}`);
  }

  const data = await response.json();
  return data.content[0].text;
}
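The examples above send a single user turn. The Messages endpoint is stateless, so a multi-turn chat has to resend the earlier turns in the messages array on every request. The sketch below shows one way to keep that history; the `Conversation` class and its method names are hypothetical helpers, not part of any SDK.

```python
from typing import Dict, List

Message = Dict[str, str]


class Conversation:
    """Keeps the running message history for one chat session."""

    def __init__(self) -> None:
        self.messages: List[Message] = []

    def add_user(self, text: str) -> None:
        self._append("user", text)

    def add_assistant(self, text: str) -> None:
        self._append("assistant", text)

    def _append(self, role: str, text: str) -> None:
        # Merge back-to-back turns from the same role so the roles alternate.
        if self.messages and self.messages[-1]["role"] == role:
            self.messages[-1]["content"] += "\n" + text
        else:
            self.messages.append({"role": role, "content": text})

    def to_payload(self, model: str, max_tokens: int) -> Dict[str, object]:
        if not self.messages:
            raise ValueError("add at least one message before sending")
        return {
            "model": model,
            "max_tokens": max_tokens,
            "messages": list(self.messages),
        }
```

After each reply, call `add_assistant` with the returned text so that the next request carries the full exchange.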

Step 3: Implement Streaming for Real-Time Responses

For a better user experience, stream responses token by token. This reduces perceived latency and allows you to display partial results as Claude generates them.

Python with Server-Sent Events (SSE)

import json
import requests

def stream_message(user_input: str):
    with requests.post(
        f"{BASE_URL}/messages",
        headers=get_headers(),
        json={
            "model": "claude-sonnet-4-20250514",
            "max_tokens": 1024,
            "stream": True,
            "messages": [
                {"role": "user", "content": user_input}
            ]
        },
        stream=True
    ) as response:
        response.raise_for_status()
        for line in response.iter_lines():
            if line:
                line = line.decode("utf-8")
                if line.startswith("data: "):
                    data = json.loads(line[6:])
                    # Only text deltas carry a "text" field
                    if data["type"] == "content_block_delta" and "text" in data["delta"]:
                        yield data["delta"]["text"]

# Usage
for token in stream_message("Tell me a short story."):
    print(token, end="", flush=True)

TypeScript with Fetch API

async function* streamMessage(userInput: string): AsyncGenerator<string> {
  const response = await fetch(`${BASE_URL}/messages`, {
    method: "POST",
    headers: getHeaders(),
    body: JSON.stringify({
      model: "claude-sonnet-4-20250514",
      max_tokens: 1024,
      stream: true,
      messages: [
        { role: "user", content: userInput }
      ]
    })
  });

  if (!response.ok) {
    throw new Error(`API error: ${response.status}`);
  }

  const reader = response.body!.getReader();
  const decoder = new TextDecoder();
  let buffer = "";

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    // Chunks can end mid-line, so keep the incomplete tail for the next read
    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split("\n");
    buffer = lines.pop() ?? "";

    for (const line of lines) {
      if (line.startsWith("data: ")) {
        const data = JSON.parse(line.slice(6));
        // Only text deltas carry a "text" field
        if (data.type === "content_block_delta" && typeof data.delta.text === "string") {
          yield data.delta.text;
        }
      }
    }
  }
}

// Usage
for await (const token of streamMessage("Tell me a short story.")) {
  process.stdout.write(token);
}
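Network chunks do not line up with SSE lines, so a parser that works on raw bytes needs to buffer incomplete lines between reads. The sketch below isolates that parsing step as a pure function so it can be tested without a network call. The function name `iter_text_deltas` is hypothetical, and the event shape (a `content_block_delta` event whose delta has type `text_delta`) is assumed from the streaming examples above.

```python
import json
from typing import Iterable, Iterator


def iter_text_deltas(chunks: Iterable[bytes]) -> Iterator[str]:
    """Yield text deltas from raw SSE byte chunks, whatever the chunk boundaries."""
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        # Parse only complete lines; the trailing partial line stays buffered.
        while b"\n" in buffer:
            raw_line, buffer = buffer.split(b"\n", 1)
            # Decoding whole lines avoids splitting multi-byte UTF-8 characters.
            line = raw_line.decode("utf-8").rstrip("\r")
            if not line.startswith("data: "):
                continue  # skip "event:" lines and blank separators
            event = json.loads(line[len("data: "):])
            if event.get("type") != "content_block_delta":
                continue
            delta = event.get("delta", {})
            if delta.get("type") == "text_delta":
                yield delta["text"]
```

With `requests`, the chunks could come from `response.iter_content(chunk_size=None)` on a response opened with `stream=True`.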

Step 4: Handle Errors Gracefully

Production integrations must handle API errors, network failures, and rate limits. Implement retry logic with exponential backoff.

Python Retry Example

import time
import requests
from requests.exceptions import RequestException

def send_message_with_retry(user_input: str, max_retries: int = 3):
    for attempt in range(max_retries):
        wait_time = 2 ** attempt  # exponential backoff: 1s, 2s, 4s, ...
        try:
            response = requests.post(
                f"{BASE_URL}/messages",
                headers=get_headers(),
                json={
                    "model": "claude-sonnet-4-20250514",
                    "max_tokens": 1024,
                    "messages": [
                        {"role": "user", "content": user_input}
                    ]
                }
            )
        except RequestException as e:
            # Connection errors and timeouts: retry unless out of attempts
            if attempt == max_retries - 1:
                raise
            print(f"Error: {e}. Retrying in {wait_time}s...")
            time.sleep(wait_time)
            continue

        # Rate limits and server errors are transient; other 4xx errors are not
        if response.status_code == 429 or response.status_code >= 500:
            if attempt == max_retries - 1:
                raise RuntimeError(
                    f"Giving up after {max_retries} attempts (HTTP {response.status_code})"
                )
            print(f"HTTP {response.status_code}. Retrying in {wait_time}s...")
            time.sleep(wait_time)
            continue

        response.raise_for_status()  # 400, 401, etc. fail immediately
        return response.json()
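When many clients back off on the same schedule, their retries tend to arrive together. A common refinement is to add random jitter and to honor a `Retry-After` header when the server sends one. The sketch below shows one such delay calculation; `backoff_delay`, `next_delay`, and the default `base` and `cap` values are illustrative choices rather than anything the API prescribes.

```python
import random
from typing import Callable, Optional


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0,
                  rand: Callable[[], float] = random.random) -> float:
    """Full-jitter backoff: a random delay below min(cap, base * 2**attempt)."""
    ceiling = min(cap, base * (2 ** attempt))
    return rand() * ceiling


def next_delay(attempt: int, retry_after: Optional[str] = None,
               rand: Callable[[], float] = random.random) -> float:
    """Prefer the server's Retry-After value (in seconds) over our own schedule."""
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # the HTTP-date form of Retry-After is not handled in this sketch
    return backoff_delay(attempt, rand=rand)
```

In a retry loop, `time.sleep(next_delay(attempt, response.headers.get("retry-after")))` would replace the fixed `2 ** attempt` wait.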

Common Error Codes

Status Code	Meaning	Action
400	Bad Request	Check your request payload
401	Unauthorized	Verify your API key
429	Rate Limited	Implement backoff
500	Server Error	Retry with backoff
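The table can be carried into code as a small lookup, so that retry decisions live in one place instead of being scattered across call sites. This is a sketch only: `ErrorPolicy` and `policy_for` are hypothetical names, and the fallback rules for status codes that the table does not list are an assumption.

```python
from typing import Dict, NamedTuple


class ErrorPolicy(NamedTuple):
    meaning: str
    retryable: bool
    action: str


# Mirrors the table above.
ERROR_POLICIES: Dict[int, ErrorPolicy] = {
    400: ErrorPolicy("Bad Request", False, "Check your request payload"),
    401: ErrorPolicy("Unauthorized", False, "Verify your API key"),
    429: ErrorPolicy("Rate Limited", True, "Implement backoff"),
    500: ErrorPolicy("Server Error", True, "Retry with backoff"),
}


def policy_for(status_code: int) -> ErrorPolicy:
    """Return the handling policy for an HTTP error status."""
    if status_code < 400:
        raise ValueError(f"{status_code} is not an error status")
    if status_code in ERROR_POLICIES:
        return ERROR_POLICIES[status_code]
    # Assumed fallback: other 5xx codes are transient, other 4xx codes are not.
    if status_code >= 500:
        return ErrorPolicy("Server Error", True, "Retry with backoff")
    return ErrorPolicy("Client Error", False, "Fix the request before resending")
```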

Step 5: Manage Rate Limits and Quotas

The Claude API enforces rate limits based on your usage tier. Track your remaining quota via the response headers:

anthropic-ratelimit-requests-remaining
anthropic-ratelimit-tokens-remaining
anthropic-ratelimit-requests-reset

Python Rate Limit Monitor

def send_message_with_monitoring(user_input: str):
    response = requests.post(
        f"{BASE_URL}/messages",
        headers=get_headers(),
        json={
            "model": "claude-sonnet-4-20250514",
            "max_tokens": 1024,
            "messages": [
                {"role": "user", "content": user_input}
            ]
        }
    )

    # Log rate limit info (headers.get returns None if a header is absent)
    remaining = response.headers.get("anthropic-ratelimit-requests-remaining")
    tokens_remaining = response.headers.get("anthropic-ratelimit-tokens-remaining")
    print(f"Requests remaining: {remaining}, Tokens remaining: {tokens_remaining}")

    response.raise_for_status()
    return response.json()
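Logging the remaining quota is useful, but an integration can also act on it, for example by pausing new requests when the budget runs low. The sketch below parses the remaining-quota headers into integers and applies a threshold. The header prefix is passed in as a parameter so that it can be checked against the current API reference, and the `min_requests` and `min_tokens` floors are hypothetical values.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class RateLimitStatus:
    requests_remaining: Optional[int]
    tokens_remaining: Optional[int]


def read_rate_limits(headers: Mapping[str, str], prefix: str) -> RateLimitStatus:
    """Parse '<prefix>-requests-remaining' and '<prefix>-tokens-remaining'."""
    lowered = {name.lower(): value for name, value in headers.items()}

    def as_int(suffix: str) -> Optional[int]:
        raw = lowered.get(f"{prefix.lower()}-{suffix}")
        try:
            return int(raw) if raw is not None else None
        except ValueError:
            return None  # malformed header: treat the value as unknown

    return RateLimitStatus(as_int("requests-remaining"), as_int("tokens-remaining"))


def should_slow_down(status: RateLimitStatus,
                     min_requests: int = 5, min_tokens: int = 2000) -> bool:
    """True when either known budget has dropped below its floor."""
    low_requests = (status.requests_remaining is not None
                    and status.requests_remaining < min_requests)
    low_tokens = (status.tokens_remaining is not None
                  and status.tokens_remaining < min_tokens)
    return low_requests or low_tokens
```

Missing or malformed headers are treated as unknown rather than as zero, so a proxy that strips headers does not stall the integration.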

Step 6: Production Best Practices

1. Use a Queue for High-Volume Requests

For partner integrations handling many users, implement a message queue (e.g., Redis, RabbitMQ) to manage request flow and avoid overwhelming the API.
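As a rough illustration of the queueing idea, the sketch below uses an in-process bounded queue with a fixed number of worker threads. It is a stand-in for a real broker such as Redis or RabbitMQ, not a replacement; `process_with_workers` and its parameters are hypothetical, and `send` is any callable that performs the API request.

```python
import queue
import threading
from typing import Callable, List


def process_with_workers(prompts: List[str], send: Callable[[str], str],
                         workers: int = 2, max_queue: int = 100) -> List[object]:
    """Run send() over prompts with at most `workers` calls in flight."""
    jobs: queue.Queue = queue.Queue(maxsize=max_queue)
    results: List[object] = [None] * len(prompts)

    def worker() -> None:
        while True:
            item = jobs.get()
            if item is None:  # sentinel: no more work for this worker
                return
            index, prompt = item
            try:
                results[index] = send(prompt)
            except Exception as exc:
                results[index] = exc  # keep the failure next to its prompt

    threads = [threading.Thread(target=worker, daemon=True) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for index, prompt in enumerate(prompts):
        jobs.put((index, prompt))  # blocks while the queue is full (backpressure)
    for _ in threads:
        jobs.put(None)
    for thread in threads:
        thread.join()
    return results
```

The worker count caps concurrency against the API, and the bounded queue makes producers wait instead of letting the backlog grow without limit.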

2. Cache Common Responses

Cache responses for frequently asked questions or deterministic prompts to reduce API costs and latency.
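One way to cache is to key each entry on a hash of the full request payload and expire it after a fixed time. The sketch below keeps entries in memory; the `ResponseCache` class, its five-minute default TTL, and the injectable `clock` are illustrative assumptions, and a shared store would be needed once you run more than one backend process.

```python
import hashlib
import json
import time
from typing import Any, Callable, Dict, Optional, Tuple


class ResponseCache:
    """In-memory cache keyed by a hash of the request payload, with a TTL."""

    def __init__(self, ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}

    @staticmethod
    def key_for(payload: Dict[str, Any]) -> str:
        # Sorting keys makes logically equal payloads hash to the same key.
        canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def get(self, payload: Dict[str, Any]) -> Optional[Any]:
        key = self.key_for(payload)
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: drop it and report a miss
            return None
        return value

    def put(self, payload: Dict[str, Any], value: Any) -> None:
        self._entries[self.key_for(payload)] = (self._clock() + self._ttl, value)
```

Hashing the whole payload means a change to the model, the token budget, or any message produces a different key, so stale answers are not served for a modified prompt.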

3. Implement User-Level Rate Limiting

Protect your API key by rate-limiting per user in your application. This prevents one abusive user from exhausting your quota.
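A token bucket is one common way to implement per-user limits: each user gets a small burst allowance that refills at a steady rate. The sketch below is an in-memory version under that assumption; `UserRateLimiter`, its default capacity and refill rate, and the injectable `clock` are hypothetical.

```python
import time
from typing import Callable, Dict, Tuple


class UserRateLimiter:
    """Per-user token bucket: `capacity` burst, refilled at `refill_per_second`."""

    def __init__(self, capacity: int = 5, refill_per_second: float = 1.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._capacity = float(capacity)
        self._refill = refill_per_second
        self._clock = clock
        # user_id -> (tokens left, time of last update)
        self._buckets: Dict[str, Tuple[float, float]] = {}

    def allow(self, user_id: str) -> bool:
        """Return True and spend one token if the user is within their limit."""
        now = self._clock()
        tokens, last = self._buckets.get(user_id, (self._capacity, now))
        # Refill for the elapsed time, never exceeding the bucket capacity.
        tokens = min(self._capacity, tokens + (now - last) * self._refill)
        if tokens >= 1.0:
            self._buckets[user_id] = (tokens - 1.0, now)
            return True
        self._buckets[user_id] = (tokens, now)
        return False
```

Your backend would call `allow(user_id)` before forwarding a request and return its own 429 to the client when the call returns False.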

4. Log and Monitor

Log all API requests and responses (excluding sensitive content) for debugging and monitoring. Use tools like Datadog, Grafana, or CloudWatch.
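To log requests without recording what users typed, one option is to log metadata only. The sketch below builds such a record from a request payload; the field names, the `claude_integration` logger name, and the functions `summarize_request` and `log_request` are assumptions chosen for illustration.

```python
import json
import logging
from typing import Any, Dict

logger = logging.getLogger("claude_integration")


def summarize_request(payload: Dict[str, Any], status_code: int,
                      latency_ms: float) -> Dict[str, Any]:
    """Build a log record that carries metadata but no message text."""
    messages = payload.get("messages", [])
    return {
        "model": payload.get("model"),
        "max_tokens": payload.get("max_tokens"),
        "message_count": len(messages),
        # Record only the size of the text, never the text itself.
        "input_chars": sum(
            len(m["content"]) for m in messages if isinstance(m.get("content"), str)
        ),
        "status_code": status_code,
        "latency_ms": round(latency_ms, 1),
    }


def log_request(payload: Dict[str, Any], status_code: int, latency_ms: float) -> None:
    # One JSON object per line is easy for log pipelines to parse.
    logger.info(json.dumps(summarize_request(payload, status_code, latency_ms)))
```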

5. Handle Long-Running Requests

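Set appropriate timeouts on every request so that a stalled connection cannot hang a worker indefinitely. The Claude API streams over SSE; if your own clients need a persistent session, consider relaying the stream to them over a WebSocket connection.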

Step 7: Testing Your Integration

Before going live, test your integration thoroughly:

Unit tests: Mock API responses for each endpoint
Integration tests: Use a test API key against the real API
Load tests: Simulate multiple concurrent users
Error tests: Verify behavior under network failures and rate limits

Python Test Example

import pytest
from unittest.mock import patch

def test_send_message_success():
    with patch("requests.post") as mock_post:
        mock_post.return_value.status_code = 200
        mock_post.return_value.json.return_value = {
            "content": [{"type": "text", "text": "Hello!"}]
        }
        result = send_message("Hi")
        assert result["content"][0]["text"] == "Hello!"

def test_send_message_rate_limited():
    # Expects send_message_with_retry to raise once its retries are exhausted.
    # time.sleep is patched so the backoff does not slow the test down.
    with patch("requests.post") as mock_post, patch("time.sleep"):
        mock_post.return_value.status_code = 429
        with pytest.raises(Exception):
            send_message_with_retry("Hi", max_retries=1)

Conclusion

Building a partner integration with the Claude API is straightforward when you follow these best practices. Start with simple synchronous requests, add streaming for better UX, implement robust error handling, and monitor your usage closely.

Remember that the Claude API is constantly evolving. Check the official changelog regularly for updates to models, endpoints, and features.

Key Takeaways

Always proxy API requests through your backend to keep your API key secure and enforce user-level rate limiting.
Implement streaming for real-time token-by-token responses to improve perceived performance.
Use exponential backoff retry logic to handle rate limits (429) and transient server errors gracefully.
Monitor rate limit headers (anthropic-ratelimit-*) to stay within your quota and avoid unexpected throttling.
Test thoroughly with unit, integration, and load tests before deploying to production.