Guide | Beginner | Best Practices | 2026-05-20

How to Build a Custom Partner Integration with the Claude API

A practical guide to creating custom partner integrations with the Claude API, covering authentication, message streaming, error handling, and best practices for production deployments.

Quick Answer

Learn how to build a production-ready partner integration with the Claude API, including API key setup, message streaming, error handling, and rate-limit management, using Python and TypeScript examples.

Tags: Claude API, partner integration, authentication, streaming, error handling

Building a partner integration with the Claude API lets you embed Claude's capabilities in your own platform, product, or service. Whether you're creating a customer support chatbot, a content generation tool, or an AI-assisted workflow, this guide walks you through the essential steps to build a robust integration that is ready for production.

Understanding the Claude API Partner Model

Anthropic's partner ecosystem enables third-party developers to integrate Claude into their applications. As a partner, you get direct API access to Claude models (including Claude 3.5 Sonnet and Claude 3 Opus) with dedicated support and documentation. The integration process involves:

Obtaining API credentials
Setting up authentication
Making API calls with proper request formatting
Handling responses and errors gracefully
Managing rate limits and scaling

Prerequisites

Before you begin, ensure you have:

A registered account on Anthropic Console
An API key (created in the Console under API Keys)
Basic familiarity with REST APIs and JSON
Python 3.8+ or Node.js 16+ installed locally

Step 1: Obtaining and Securing Your API Key

Your API key is the gateway to Claude. Treat it like a password: never expose it in client-side code or commit it to version control.

Best Practices for API Key Management

Environment variables: Store your key in .env files or your deployment platform's secrets manager.
Separate keys: Create a distinct key for each environment (development, staging, and production) so that one can be revoked without affecting the others.
Rotation: Rotate keys periodically and immediately if compromised.
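
The practices above can be backed by a small startup check. The following is a minimal sketch, not a prescribed pattern: the helper names `load_api_key` and `redact` are hypothetical and chosen for illustration. It fails fast when the key is missing and masks the key before it can reach a log line.

```python
import os
from typing import Mapping, Optional


def load_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the API key from the environment, or raise if it is missing."""
    # Allow a custom mapping to be passed in so the function is easy to test.
    env = os.environ if env is None else env
    key = env.get("ANTHROPIC_API_KEY", "").strip()
    if not key:
        raise RuntimeError("ANTHROPIC_API_KEY is not set")
    return key


def redact(key: str) -> str:
    """Mask a key for log output, keeping only the last four characters."""
    return "****" + key[-4:] if len(key) > 4 else "****"
```

Calling `load_api_key()` once at startup surfaces a missing key immediately, rather than as a 401 on the first request.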

# .env file (never commit this)
ANTHROPIC_API_KEY=sk-ant-api03-your-key-here

Step 2: Making Your First API Call

Claude's Messages API is the primary endpoint for sending prompts and receiving responses. Here's a minimal Python example (it uses the requests and python-dotenv packages):

import os
import requests
from dotenv import load_dotenv
load_dotenv()
API_KEY = os.getenv("ANTHROPIC_API_KEY")
API_URL = "https://api.anthropic.com/v1/messages"
headers = {
    "x-api-key": API_KEY,
    "anthropic-version": "2023-06-01",
    "content-type": "application/json"
}
data = {
    "model": "claude-3-5-sonnet-20241022",
    "max_tokens": 1024,
    "messages": [
        {"role": "user", "content": "Hello, Claude!"}
    ]
}
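response = requests.post(API_URL, headers=headers, json=data)
# Surface HTTP errors here instead of a confusing KeyError on the next line
response.raise_for_status()
print(response.json()["content"][0]["text"])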

TypeScript Equivalent

import Anthropic from '@anthropic-ai/sdk';
const client = new Anthropic({
  apiKey: process.env.ANTHROPIC_API_KEY,
});
async function main() {
  const response = await client.messages.create({
    model: 'claude-3-5-sonnet-20241022',
    max_tokens: 1024,
    messages: [{ role: 'user', content: 'Hello, Claude!' }],
  });
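  // content is a list of typed blocks; narrow to a text block before reading .text
  const block = response.content[0];
  if (block.type === 'text') {
    console.log(block.text);
  }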
}
main();
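
Both examples read only the first content block. A response body holds a list of typed blocks, so an integration may prefer to collect every text block. The sketch below assumes the parsed JSON body is available as a dictionary; the helper name `extract_text` is hypothetical.

```python
from typing import Any, Dict


def extract_text(message: Dict[str, Any]) -> str:
    """Join the text of every text block in a Messages API response body."""
    parts = [
        block.get("text", "")
        for block in message.get("content", [])
        # Skip non-text blocks (for example tool-use blocks), which have no text.
        if block.get("type") == "text"
    ]
    return "".join(parts)
```

With the Python example above, this would be called as `extract_text(response.json())`.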

Step 3: Implementing Streaming for Better UX

For partner integrations, streaming responses dramatically improve user experience by showing tokens as they're generated. Here's how to implement streaming:

import os
import json
import requests
def stream_claude_response(prompt):
    API_KEY = os.getenv("ANTHROPIC_API_KEY")
    
    headers = {
        "x-api-key": API_KEY,
        "anthropic-version": "2023-06-01",
        "content-type": "application/json",
        "accept": "text/event-stream"
    }
    
    data = {
        "model": "claude-3-5-sonnet-20241022",
        "max_tokens": 2048,
        "stream": True,
        "messages": [{"role": "user", "content": prompt}]
    }
    
    with requests.post(
        "https://api.anthropic.com/v1/messages",
        headers=headers,
        json=data,
        stream=True
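    ) as response:
        # Fail fast on HTTP errors instead of parsing an error body as a stream
        response.raise_for_status()
        for line in response.iter_lines():
            if line:
                decoded = line.decode('utf-8')
                if decoded.startswith('data: '):
                    event_data = json.loads(decoded[6:])
                    if event_data['type'] == 'content_block_delta':
                        # Only text deltas carry a 'text' field
                        delta = event_data['delta']
                        if delta.get('type') == 'text_delta':
                            yield delta['text']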
# Usage
for token in stream_claude_response("Write a short poem about AI"):
    print(token, end='', flush=True)
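
The parsing step in the streaming example can also be pulled out into a pure function, which makes it testable without a network call. This is a minimal sketch under stated assumptions: the function name `iter_text_deltas` is hypothetical, the input is any iterable of raw byte lines (such as the output of `iter_lines()`), and a stream-level failure is assumed to arrive as an event whose `type` is `error`.

```python
import json
from typing import Iterable, Iterator


def iter_text_deltas(lines: Iterable[bytes]) -> Iterator[str]:
    """Yield text fragments from raw server-sent-event lines."""
    for raw in lines:
        if not raw:
            continue  # blank lines separate events
        line = raw.decode("utf-8")
        if not line.startswith("data: "):
            continue  # ignore "event:" lines and anything else
        event = json.loads(line[len("data: "):])
        if event.get("type") == "error":
            # Assumption: errors raised mid-stream are delivered as an event.
            detail = event.get("error", {}).get("message", "unknown error")
            raise RuntimeError(f"Stream error: {detail}")
        if event.get("type") != "content_block_delta":
            continue
        delta = event.get("delta", {})
        if delta.get("type") == "text_delta":
            yield delta.get("text", "")
```

Because the function takes plain byte lines, a unit test can feed it a hand-written list instead of a live response.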

Step 4: Handling Errors and Rate Limits

Production integrations must handle API errors gracefully. The Claude API returns standard HTTP status codes:

Status Code	Meaning	Handling Strategy
200	Success	Parse response
400	Bad Request	Validate input
401	Unauthorized	Check API key
429	Rate Limited	Implement backoff
500	Server Error	Retry with delay
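
The table can be condensed into a small decision helper. The sketch below is illustrative only: the function name `classify_status` and its three return labels are hypothetical, and it treats every 5xx code the same way as the 500 row.

```python
def classify_status(status_code: int) -> str:
    """Map an HTTP status code to a handling strategy: 'ok', 'retry', or 'fail'."""
    if 200 <= status_code < 300:
        return "ok"      # parse the response
    if status_code == 429 or status_code >= 500:
        return "retry"   # back off, then try again
    return "fail"        # 400, 401, and other client errors: fix the request
```

Keeping this decision in one place means the retry loop and the logging code cannot disagree about which errors are transient.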

Implementing Exponential Backoff

import time
import random
import requests

# API_URL and headers are the values defined in Step 2
def call_claude_with_retry(data, max_retries=3):
    for attempt in range(max_retries):
        try:
            response = requests.post(API_URL, headers=headers, json=data)
            response.raise_for_status()
            return response.json()
        except requests.exceptions.HTTPError as e:
            status = e.response.status_code
            # Retry only on rate limits (429) and server errors (5xx)
            if status != 429 and status < 500:
                raise
            # Exponential backoff (1s, 2s, 4s, ...) plus up to 1s of jitter
            wait_time = (2 ** attempt) + random.uniform(0, 1)
            reason = "Rate limited" if status == 429 else "Server error"
            print(f"{reason}. Retrying in {wait_time:.2f}s")
            time.sleep(wait_time)
    raise Exception("Max retries exceeded")
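
The delay calculation can be separated from the retry loop so that it is capped and easy to test. This is a sketch under assumptions, not a definitive implementation: the name `backoff_delay` and its parameters are hypothetical, and it assumes a rate-limited response may carry a `retry-after` header giving a wait time in seconds, which takes priority when present and parseable.

```python
import random
from typing import Callable, Optional


def backoff_delay(
    attempt: int,
    retry_after: Optional[str] = None,
    base: float = 1.0,
    cap: float = 30.0,
    jitter: Callable[[], float] = random.random,
) -> float:
    """Return how many seconds to wait before retry number `attempt` (0-based)."""
    if retry_after is not None:
        try:
            # Assumption: the header value is a number of seconds.
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # unparseable header: fall back to exponential backoff
    # Exponential growth, capped so a long outage cannot produce huge sleeps,
    # plus jitter so that many clients do not retry in lockstep.
    return min(cap, base * (2 ** attempt)) + jitter()
```

Inside a retry loop, the value would come from something like `backoff_delay(attempt, response.headers.get("retry-after"))`.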

Step 5: Building a Complete Integration Pattern

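Here's an integration class that combines the patterns from the previous steps. Note that only non-streaming requests go through the retry logic; streamed requests are sent once.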

import os
import json
import time
import random
import requests
from typing import Generator, Optional
class ClaudePartnerIntegration:
    def __init__(self, api_key: Optional[str] = None):
        self.api_key = api_key or os.getenv("ANTHROPIC_API_KEY")
        self.base_url = "https://api.anthropic.com/v1/messages"
        self.headers = {
            "x-api-key": self.api_key,
            "anthropic-version": "2023-06-01",
            "content-type": "application/json"
        }
    
    def send_message(
        self,
        prompt: str,
        model: str = "claude-3-5-sonnet-20241022",
        max_tokens: int = 1024,
        stream: bool = False
    ) -> Generator[str, None, None]:
        data = {
            "model": model,
            "max_tokens": max_tokens,
            "stream": stream,
            "messages": [{"role": "user", "content": prompt}]
        }
        
        if stream:
            yield from self._stream_response(data)
        else:
            yield self._get_response(data)
    
    def _get_response(self, data: dict) -> str:
        response = self._make_request(data)
        return response["content"][0]["text"]
    
    def _stream_response(self, data: dict) -> Generator[str, None, None]:
        headers = {**self.headers, "accept": "text/event-stream"}
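        with requests.post(self.base_url, headers=headers, json=data, stream=True) as r:
            # Fail fast on HTTP errors instead of parsing an error body as a stream
            r.raise_for_status()
            for line in r.iter_lines():
                if line:
                    decoded = line.decode('utf-8')
                    if decoded.startswith('data: '):
                        event = json.loads(decoded[6:])
                        if event['type'] == 'content_block_delta':
                            # Only text deltas carry a 'text' field
                            delta = event['delta']
                            if delta.get('type') == 'text_delta':
                                yield delta['text']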
    
    def _make_request(self, data: dict, max_retries: int = 3) -> dict:
        for attempt in range(max_retries):
            try:
                response = requests.post(self.base_url, headers=self.headers, json=data)
                response.raise_for_status()
                return response.json()
            except requests.exceptions.HTTPError:
                # Retry only on rate limits (429) and server errors (5xx)
                if response.status_code != 429 and response.status_code < 500:
                    raise
                wait = (2 ** attempt) + random.uniform(0, 1)
                time.sleep(wait)
        raise Exception("Max retries exceeded")
# Usage
integration = ClaudePartnerIntegration()
for token in integration.send_message("Explain quantum computing simply", stream=True):
    print(token, end='', flush=True)

Step 6: Testing Your Integration

Always test your integration thoroughly before going live:

Unit tests: Mock API responses to test your logic
Integration tests: Use a test API key with limited quota
Load tests: Simulate concurrent users to verify rate limit handling
Edge cases: Test empty responses, long prompts, and special characters

# Example test using pytest
import pytest
import requests
from unittest.mock import patch
def test_send_message_success():
    integration = ClaudePartnerIntegration(api_key="test-key")
    with patch('requests.post') as mock_post:
        mock_post.return_value.status_code = 200
        mock_post.return_value.json.return_value = {
            "content": [{"text": "Hello!"}]
        }
        result = list(integration.send_message("Hi"))
        assert result == ["Hello!"]
def test_rate_limit_retry():
    integration = ClaudePartnerIntegration(api_key="test-key")
    # Patch time.sleep so the backoff does not slow the test down
    with patch('requests.post') as mock_post, patch('time.sleep'):
        mock_post.return_value.status_code = 429
        # A bare mock never raises, so make raise_for_status fail like a real 429
        mock_post.return_value.raise_for_status.side_effect = (
            requests.exceptions.HTTPError(response=mock_post.return_value)
        )
        with pytest.raises(Exception, match="Max retries exceeded"):
            list(integration.send_message("Hi"))

Best Practices for Partner Integrations

Cache responses for identical prompts to reduce API costs and latency.
Monitor usage with Anthropic's Console dashboards to track token consumption.
Implement user authentication if your integration serves multiple end users.
Use system prompts to set Claude's behavior and tone for your specific use case.
Log errors with context (prompt, model, timestamp) for debugging.
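
The caching practice above could look like the following in-memory sketch. Everything here is an illustrative assumption: the class name `ResponseCache`, the `fetch` callable that stands in for the real API call, and the choice to key on model, system prompt, and user prompt together so that changing the system prompt never returns a stale answer.

```python
import hashlib
import json
from typing import Callable, Dict


class ResponseCache:
    """Cache responses in memory, keyed by (model, system prompt, user prompt)."""

    def __init__(self, fetch: Callable[[str, str, str], str]):
        # `fetch` is a hypothetical stand-in for the function that calls the API.
        self._fetch = fetch
        self._store: Dict[str, str] = {}

    @staticmethod
    def _key(model: str, system: str, prompt: str) -> str:
        # Serialize as a JSON list so field boundaries stay unambiguous.
        raw = json.dumps([model, system, prompt], ensure_ascii=False)
        return hashlib.sha256(raw.encode("utf-8")).hexdigest()

    def get(self, model: str, system: str, prompt: str) -> str:
        key = self._key(model, system, prompt)
        if key not in self._store:
            self._store[key] = self._fetch(model, system, prompt)
        return self._store[key]
```

A production deployment would likely swap the dictionary for a shared store with expiry, and would skip caching for prompts that contain user-specific data.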

Key Takeaways

Secure your API key using environment variables and never expose it client-side.
Implement streaming for real-time token delivery and better user experience.
Handle rate limits with exponential backoff to ensure reliable service.
Build a reusable integration class that encapsulates authentication, retry logic, and streaming.
Test thoroughly with unit, integration, and load tests before production deployment.