Guide | Beginner | Best Practices | 2026-05-22

How to Build a Custom Partner Integration with the Claude API

A practical guide to integrating Claude AI into your own platform or service using the Anthropic API, covering authentication, messaging, streaming, and best practices.

Quick Answer

This guide walks you through building a custom partner integration with Claude, from API key setup and authentication to sending messages, handling streaming responses, and following Anthropic's partner best practices.

Claude API, integration, partner ecosystem, streaming, authentication

Introduction

As Claude AI continues to reshape how businesses interact with language models, many organizations are looking to build their own custom integrations, becoming what Anthropic calls "Partners." Whether you're embedding Claude into a SaaS product, building an internal assistant, or creating a new customer-facing chatbot, understanding the official API integration patterns is essential.

This guide covers the practical steps to build a robust partner integration using the Claude API. You'll learn how to authenticate, send messages, handle streaming, and follow best practices that Anthropic recommends for partners.

Prerequisites

Before you begin, make sure you have:

An Anthropic Console account
An API key (generated in the console under API Keys)
Basic familiarity with Python or TypeScript
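A development environment with curl, Python 3.8+, or Node.js 18+ (the TypeScript examples rely on the built-in fetch, which is available without a flag from Node.js 18)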

Step 1: Authentication and API Key Management

Every API call to Claude requires an x-api-key header, and the examples below also send an anthropic-version header to pin the API version. Treat your API key like a password: never hardcode it in source code or expose it in client-side applications.

Best Practices for API Keys

Store keys in environment variables or a secrets manager
Rotate keys regularly
Use separate keys for development and production
Implement rate limiting on your side to avoid hitting Anthropic's limits
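As a minimal sketch of the first practice above, the helper below reads the key from an environment variable and fails fast when it is missing, so a misconfigured deployment does not surface later as a 401. The names load_api_key and mask_key are hypothetical helpers for this guide, not part of any Anthropic SDK.

```python
import os


def load_api_key(env_var: str = "ANTHROPIC_API_KEY") -> str:
    """Read the API key from the environment and fail fast if it is missing."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"{env_var} is not set; export it or load it from your secrets manager"
        )
    return key


def mask_key(key: str, visible: int = 4) -> str:
    """Return a log-safe form of the key that shows only the last few characters."""
    if len(key) <= visible:
        return "*" * len(key)
    return "*" * (len(key) - visible) + key[-visible:]
```

Logging mask_key(key) instead of the key itself keeps the secret out of log files while still letting you tell development and production keys apart.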

Example: Setting Up Authentication

Python (using requests):

import os
import requests
API_KEY = os.environ.get("ANTHROPIC_API_KEY")
BASE_URL = "https://api.anthropic.com/v1"
headers = {
    "x-api-key": API_KEY,
    "anthropic-version": "2023-06-01",
    "content-type": "application/json"
}

TypeScript (using fetch):

const API_KEY = process.env.ANTHROPIC_API_KEY;
const BASE_URL = "https://api.anthropic.com/v1";
const headers = {
  "x-api-key": API_KEY!,
  "anthropic-version": "2023-06-01",
  "content-type": "application/json"
};

Step 2: Sending Your First Message

The core endpoint for generating text is POST /v1/messages. You send a list of messages (with roles user or assistant) and receive a completion.

Basic Request

Python:

def send_message(user_message: str) -> dict:
    payload = {
        "model": "claude-3-5-sonnet-20241022",
        "max_tokens": 1024,
        "messages": [
            {"role": "user", "content": user_message}
        ]
    }
    response = requests.post(
        f"{BASE_URL}/messages",
        headers=headers,
        json=payload
    )
    response.raise_for_status()
    return response.json()
result = send_message("Explain quantum computing in simple terms.")
print(result["content"][0]["text"])

TypeScript:

async function sendMessage(userMessage: string) {
  const payload = {
    model: "claude-3-5-sonnet-20241022",
    max_tokens: 1024,
    messages: [
      { role: "user", content: userMessage }
    ]
  };
  const response = await fetch(`${BASE_URL}/messages`, {
    method: "POST",
    headers,
    body: JSON.stringify(payload)
  });
  if (!response.ok) throw new Error(`API error: ${response.status}`);
  return response.json();
}
sendMessage("Explain quantum computing in simple terms.")
  .then(data => console.log(data.content[0].text));
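The requests above carry a single user turn. For a multi-turn conversation, the same endpoint takes the whole history on each call, with user and assistant turns in order. The sketch below shows one way to keep that history and to pull the text out of a response body. Conversation and extract_text are hypothetical helper names, and the response dictionary is hand-written for illustration rather than captured from the API.

```python
from typing import Dict, List


class Conversation:
    """Hypothetical helper that keeps the message list sent with each request."""

    def __init__(self) -> None:
        self.messages: List[Dict[str, str]] = []

    def add(self, role: str, text: str) -> None:
        if role not in ("user", "assistant"):
            raise ValueError(f"unsupported role: {role!r}")
        self.messages.append({"role": role, "content": text})

    def to_payload(self, model: str, max_tokens: int = 1024) -> dict:
        # The API is stateless, so the full history goes into every request.
        return {"model": model, "max_tokens": max_tokens, "messages": list(self.messages)}


def extract_text(response: dict) -> str:
    """Join the text blocks of a Messages API response body."""
    return "".join(
        block.get("text", "")
        for block in response.get("content", [])
        if block.get("type") == "text"
    )


# Usage with a hand-written response body (no network call is made here).
chat = Conversation()
chat.add("user", "What is the capital of France?")
fake_response = {"content": [{"type": "text", "text": "Paris."}]}
chat.add("assistant", extract_text(fake_response))
chat.add("user", "And of Italy?")
payload = chat.to_payload("claude-3-5-sonnet-20241022")
print(len(payload["messages"]))  # 3
```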

Step 3: Handling Streaming Responses

For a better user experience, especially in chat interfaces, use streaming. Claude supports server-sent events (SSE) that let you display tokens as they're generated.

Streaming in Python

import json
def stream_message(user_message: str):
    payload = {
        "model": "claude-3-5-sonnet-20241022",
        "max_tokens": 1024,
        "stream": True,
        "messages": [
            {"role": "user", "content": user_message}
        ]
    }
    
    with requests.post(
        f"{BASE_URL}/messages",
        headers=headers,
        json=payload,
        stream=True
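    ) as response:
        # Surface HTTP errors before trying to parse the body as an event stream.
        response.raise_for_status()
        for line in response.iter_lines():
            if line:
                decoded = line.decode("utf-8")
                if decoded.startswith("data: "):
                    data = json.loads(decoded[6:])
                    # Only text deltas carry a "text" field; other delta types are skipped.
                    if data["type"] == "content_block_delta" and data["delta"].get("type") == "text_delta":
                        print(data["delta"]["text"], end="", flush=True)

stream_message("Write a short poem about AI.")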
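It can help to separate the event parsing from the HTTP call so that the parsing can be tested without a network connection. The sketch below is one way to do that: iter_text_deltas is a hypothetical helper that takes any iterable of raw lines (such as response.iter_lines()) and yields only the text fragments. The sample lines are hand-written and abbreviated, not captured API output.

```python
import json
from typing import Iterable, Iterator


def iter_text_deltas(lines: Iterable[bytes]) -> Iterator[str]:
    """Yield text fragments from the raw lines of a Messages API event stream."""
    for raw in lines:
        if not raw:
            continue  # blank lines separate events
        decoded = raw.decode("utf-8")
        if not decoded.startswith("data: "):
            continue  # skip "event: ..." lines; the type is repeated in the data
        event = json.loads(decoded[len("data: "):])
        kind = event.get("type")
        if kind == "error":
            raise RuntimeError(f"stream error: {event.get('error')}")
        if kind == "content_block_delta":
            delta = event.get("delta", {})
            if delta.get("type") == "text_delta":
                yield delta.get("text", "")
        elif kind == "message_stop":
            return


# Hand-written, abbreviated sample of a stream.
sample = [
    b"event: message_start",
    b'data: {"type": "message_start", "message": {}}',
    b"",
    b"event: content_block_delta",
    b'data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "Hel"}}',
    b"",
    b"event: content_block_delta",
    b'data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "lo"}}',
    b"",
    b"event: message_stop",
    b'data: {"type": "message_stop"}',
]
print("".join(iter_text_deltas(sample)))  # Hello
```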

Streaming in TypeScript

async function streamMessage(userMessage: string) {
  const payload = {
    model: "claude-3-5-sonnet-20241022",
    max_tokens: 1024,
    stream: true,
    messages: [
      { role: "user", content: userMessage }
    ]
  };
  const response = await fetch(`${BASE_URL}/messages`, {
    method: "POST",
    headers,
    body: JSON.stringify(payload)
  });
  if (!response.ok) throw new Error(`API error: ${response.status}`);
  const reader = response.body!.getReader();
  const decoder = new TextDecoder();
  let buffer = "";
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    
    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split("\n");
    buffer = lines.pop() || "";
    
    for (const line of lines) {
      if (line.startsWith("data: ")) {
        const data = JSON.parse(line.slice(6));
        if (data.type === "content_block_delta" && data.delta.type === "text_delta") {
          process.stdout.write(data.delta.text);
        }
      }
    }
  }
}
streamMessage("Write a short poem about AI.");

Step 4: Adding System Prompts and Context

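For partner integrations, you often need to control Claude's behavior. Use the system parameter to set instructions that apply to the whole conversation. The API is stateless, so include the same system value in every request that belongs to that conversation.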

def send_with_system(system_prompt: str, user_message: str) -> str:
    payload = {
        "model": "claude-3-5-sonnet-20241022",
        "max_tokens": 1024,
        "system": system_prompt,
        "messages": [
            {"role": "user", "content": user_message}
        ]
    }
    response = requests.post(
        f"{BASE_URL}/messages",
        headers=headers,
        json=payload
    )
    response.raise_for_status()
    return response.json()["content"][0]["text"]

# Example: Customer support bot
system = "You are a helpful customer support agent for Acme Corp. " \
         "Be polite, concise, and only answer based on the provided knowledge base."
reply = send_with_system(system, "How do I reset my password?")
print(reply)
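The support-bot prompt above tells Claude to answer only from a knowledge base, so the knowledge base text has to be part of the request. A minimal sketch of one way to do that is shown below: it embeds the articles in the system prompt inside simple tags. The function names, the tag names, and the sample article are all illustrative assumptions, not a format the API requires.

```python
from typing import List


def build_system_prompt(company: str, kb_articles: List[str]) -> str:
    """Compose a system prompt that embeds the knowledge base the bot may use."""
    kb = "\n\n".join(f"<article>\n{a.strip()}\n</article>" for a in kb_articles)
    return (
        f"You are a helpful customer support agent for {company}. "
        "Be polite, concise, and only answer based on the knowledge base below. "
        "If the answer is not there, say you don't know.\n\n"
        f"<knowledge_base>\n{kb}\n</knowledge_base>"
    )


def build_payload(system_prompt: str, user_message: str, model: str, max_tokens: int = 1024) -> dict:
    """Build the request body; the system prompt is a top-level field, not a message."""
    return {
        "model": model,
        "max_tokens": max_tokens,
        "system": system_prompt,
        "messages": [{"role": "user", "content": user_message}],
    }


# Illustrative article text, invented for this example.
articles = ["To reset your password, open Settings > Security and choose Reset password."]
prompt = build_system_prompt("Acme Corp", articles)
payload = build_payload(prompt, "How do I reset my password?", "claude-3-5-sonnet-20241022")
```

Large knowledge bases will not fit in a single prompt, so in practice you would select only the articles relevant to the question before calling build_system_prompt.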

Step 5: Error Handling and Retries

Production integrations must handle errors gracefully. Common HTTP status codes include:

400: Bad request (check your payload)
401: Unauthorized (invalid API key)
429: Rate limited (implement exponential backoff)
500: Server error (retry after a delay)
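One way to turn the status list above into code is a small predicate that separates errors worth retrying from errors that need a fix on your side. This is a sketch under one assumption that goes beyond the list: it treats every 5xx status as retryable, not only 500. The function name is hypothetical.

```python
def should_retry(status_code: int) -> bool:
    """Return True for statuses that are usually transient.

    429 means the client is being rate limited; 5xx means the server failed.
    Other 4xx codes (400, 401, ...) point to a problem with the request
    itself, so sending the same request again will not help.
    """
    return status_code == 429 or 500 <= status_code <= 599


for code in (400, 401, 429, 500):
    print(code, should_retry(code))
```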

Retry Logic Example

import time
from requests.exceptions import RequestException
def send_with_retry(user_message: str, max_retries: int = 3) -> dict:
    for attempt in range(max_retries):
        try:
            response = requests.post(
                f"{BASE_URL}/messages",
                headers=headers,
                json={
                    "model": "claude-3-5-sonnet-20241022",
                    "max_tokens": 1024,
                    "messages": [{"role": "user", "content": user_message}]
                }
            )
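            if response.status_code == 429 or response.status_code >= 500:
                # Rate limits and server errors are usually transient: back off, then retry.
                wait = 2 ** attempt  # exponential backoff
                print(f"Got HTTP {response.status_code}. Waiting {wait}s...")
                time.sleep(wait)
                continue
            # Other 4xx responses (such as 400 or 401) will not succeed on retry.
            response.raise_for_status()
            return response.json()
        except RequestException as e:
            # Re-raise client errors immediately; retry network failures until the last attempt.
            if isinstance(e, requests.HTTPError) or attempt == max_retries - 1:
                raise
            print(f"Attempt {attempt + 1} failed: {e}. Retrying...")
            time.sleep(1)
    # Every attempt was rate limited or hit a server error.
    raise RuntimeError(f"Request still failing after {max_retries} attempts")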
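The retry loop above waits a fixed power of two between attempts. When many clients are rate limited at the same moment, fixed delays make them all retry together. A common refinement is to add random jitter and cap the delay. The sketch below shows the "full jitter" variant as a pure function; backoff_delay and its parameters are illustrative names, and the rng argument exists only so the behavior can be tested deterministically.

```python
import random
from typing import Callable


def backoff_delay(
    attempt: int,
    base: float = 1.0,
    cap: float = 30.0,
    rng: Callable[[], float] = random.random,
) -> float:
    """Full-jitter backoff: a random delay between 0 and min(cap, base * 2**attempt)."""
    ceiling = min(cap, base * (2 ** attempt))
    return rng() * ceiling


# Upper bounds for the first few attempts with the default base and cap.
for attempt in range(6):
    print(attempt, min(30.0, 1.0 * 2 ** attempt))
```

In the retry loop, time.sleep(backoff_delay(attempt)) would take the place of the fixed wait.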

Step 6: Partner-Specific Best Practices

Anthropic encourages partners to follow these guidelines:

1. Respect Rate Limits

Check your plan's rate limits in the Anthropic Console. Implement client-side throttling to avoid hitting limits.
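Client-side throttling can be as simple as a sliding-window counter placed in front of the request function. The class below is a minimal single-process sketch. SlidingWindowLimiter is a hypothetical name, the limit values are placeholders rather than Anthropic's actual limits, and a multi-worker deployment would need shared state (for example in Redis) instead.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Allow at most max_calls within any window of `period` seconds."""

    def __init__(self, max_calls: int, period: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_calls = max_calls
        self.period = period
        self.clock = clock
        self.calls: Deque[float] = deque()  # timestamps of recent calls

    def wait_time(self) -> float:
        """Seconds until the next call is allowed (0.0 if it is allowed now)."""
        now = self.clock()
        # Drop timestamps that have left the window.
        while self.calls and now - self.calls[0] >= self.period:
            self.calls.popleft()
        if len(self.calls) < self.max_calls:
            return 0.0
        return self.period - (now - self.calls[0])

    def try_acquire(self) -> bool:
        """Record a call and return True if allowed; return False otherwise."""
        if self.wait_time() > 0:
            return False
        self.calls.append(self.clock())
        return True


# Placeholder limit: 50 requests per 60 seconds.
limiter = SlidingWindowLimiter(max_calls=50, period=60.0)
if limiter.try_acquire():
    pass  # safe to send the request here
```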

2. Cache Common Responses

For frequently asked questions, cache Claude's responses to reduce API costs and latency. Use a short TTL (time-to-live) to keep responses fresh.
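A minimal in-memory sketch of such a cache is shown below. It keys entries on a hash of the system prompt plus a normalized form of the question, and expires them after a TTL. TTLCache and make_key are hypothetical names, and the normalization (lowercasing and collapsing whitespace) is an assumption that only catches near-identical questions.

```python
import hashlib
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """In-memory cache whose entries expire ttl_seconds after they are stored."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self._clock = clock
        self._store: Dict[str, Tuple[float, str]] = {}  # key -> (expiry, value)

    @staticmethod
    def make_key(system: str, user_message: str) -> str:
        # Normalize case and whitespace so trivially different questions share a key.
        normalized = " ".join(user_message.lower().split())
        return hashlib.sha256(f"{system}\n{normalized}".encode("utf-8")).hexdigest()

    def get(self, key: str) -> Optional[str]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._store[key]  # expired: drop it and report a miss
            return None
        return value

    def set(self, key: str, value: str) -> None:
        self._store[key] = (self._clock() + self.ttl, value)


cache = TTLCache(ttl_seconds=300)
key = TTLCache.make_key("You are a support agent.", "How do I reset my password?")
if cache.get(key) is None:
    cache.set(key, "placeholder reply")  # in practice, the text returned by the API
```

Including the system prompt in the key prevents a reply produced under one set of instructions from being served under another.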

3. Monitor Token Usage

Track both input and output tokens. Use the usage field in API responses to bill customers or optimize prompts.
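As a sketch of what that tracking could look like, the class below accumulates the input_tokens and output_tokens counts from the usage field of each response. UsageTracker is a hypothetical helper, and the prices passed to cost are placeholders supplied by the caller, not Anthropic's actual rates.

```python
from dataclasses import dataclass


@dataclass
class UsageTracker:
    """Running totals of the tokens reported in API responses."""

    input_tokens: int = 0
    output_tokens: int = 0

    def record(self, response: dict) -> None:
        """Add the token counts from one response body to the totals."""
        usage = response.get("usage", {})
        self.input_tokens += usage.get("input_tokens", 0)
        self.output_tokens += usage.get("output_tokens", 0)

    def cost(self, input_price_per_mtok: float, output_price_per_mtok: float) -> float:
        """Estimate cost from caller-supplied prices per million tokens."""
        return (
            self.input_tokens * input_price_per_mtok
            + self.output_tokens * output_price_per_mtok
        ) / 1_000_000


tracker = UsageTracker()
# Hand-written response fragment for illustration.
tracker.record({"usage": {"input_tokens": 1000, "output_tokens": 500}})
tracker.record({"usage": {"input_tokens": 1000, "output_tokens": 500}})
print(tracker.input_tokens, tracker.output_tokens)  # 2000 1000
```

Keeping one tracker per customer or per feature makes it straightforward to attribute spend when billing.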

4. Implement Content Moderation

Use Claude's safety features or a separate moderation layer to filter harmful outputs before displaying to end users.
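A separate moderation layer can be structured as a list of checks that run on the reply before it is displayed. The sketch below shows only that structure. The blocklist check is a deliberately simple placeholder and not an adequate moderation method on its own; in practice each check might call a classifier or a dedicated moderation service. All names here are hypothetical.

```python
from typing import Callable, Iterable, Optional

# A check returns a short reason string when the text should be withheld, else None.
Check = Callable[[str], Optional[str]]


def blocklist_check(terms: Iterable[str]) -> Check:
    """Placeholder check: flags text containing any listed term (case-insensitive)."""
    lowered = [t.lower() for t in terms]

    def check(text: str) -> Optional[str]:
        haystack = text.lower()
        for term in lowered:
            if term in haystack:
                return f"matched blocked term: {term}"
        return None

    return check


def moderate(text: str, checks: Iterable[Check],
             fallback: str = "Sorry, I can't help with that.") -> str:
    """Return the text unchanged if every check passes, otherwise the fallback."""
    for check in checks:
        if check(text) is not None:
            return fallback
    return text


checks = [blocklist_check(["internal-only"])]
print(moderate("Here is how to reset your password.", checks))
```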

5. Provide Clear Attribution

When displaying Claude-generated content, clearly indicate it was generated by AI. Anthropic's brand guidelines recommend phrasing like "Generated with Claude by Anthropic."

Full Integration Example

Here's a minimal but complete Python class that wraps the Claude API for a partner integration:

import os
import requests
import json
from typing import List, Dict, Optional
class ClaudePartnerClient:
    def __init__(self, api_key: Optional[str] = None):
        self.api_key = api_key or os.environ["ANTHROPIC_API_KEY"]
        self.base_url = "https://api.anthropic.com/v1"
        self.headers = {
            "x-api-key": self.api_key,
            "anthropic-version": "2023-06-01",
            "content-type": "application/json"
        }
    
    def chat(self, 
             messages: List[Dict[str, str]], 
             system: Optional[str] = None,
             max_tokens: int = 1024,
             stream: bool = False) -> Dict:
        
        payload = {
            "model": "claude-3-5-sonnet-20241022",
            "max_tokens": max_tokens,
            "messages": messages,
            "stream": stream
        }
        if system:
            payload["system"] = system
        
        response = requests.post(
            f"{self.base_url}/messages",
            headers=self.headers,
            json=payload,
            stream=stream
        )
        response.raise_for_status()
        
        if stream:
            return self._handle_stream(response)
        return response.json()
    
    def _handle_stream(self, response):
        full_text = ""
        for line in response.iter_lines():
            if line:
                decoded = line.decode("utf-8")
                if decoded.startswith("data: "):
                    data = json.loads(decoded[6:])
                    if data["type"] == "content_block_delta":
                        chunk = data["delta"]["text"]
                        full_text += chunk
                        print(chunk, end="", flush=True)
        print()
        return {"content": [{"text": full_text}]}

# Usage
client = ClaudePartnerClient()
response = client.chat(
    messages=[{"role": "user", "content": "Hello, Claude!"}],
    system="You are a helpful assistant."
)
print(response["content"][0]["text"])

Conclusion

Building a partner integration with Claude is straightforward when you follow the official API patterns. Start with authentication, master the messages endpoint, add streaming for real-time experiences, and layer in error handling and best practices. As Anthropic continues to evolve the API, keep an eye on the changelog for new features like tool use, vision, and expanded model availability.

Key Takeaways

Authenticate securely using API keys stored in environment variables, never in client-side code.
Use the /v1/messages endpoint for all text generation, and enable streaming for better user experience.
Implement retry logic with exponential backoff to handle rate limits and transient errors gracefully.
Follow Anthropic's partner best practices: cache responses, monitor token usage, and provide clear AI attribution.
Build a reusable client class to encapsulate API logic, making your integration maintainable and testable.