How to Build a Custom Partner Integration with the Claude API
A practical guide to building partner integrations with the Claude API, covering authentication, message streaming, error handling, and best practices for production deployments.
Building a partner integration with the Claude API allows you to embed Claude’s powerful language capabilities directly into your own platform, product, or service. Whether you’re creating a customer support assistant, a content generation tool, or an AI-powered analytics dashboard, this guide walks you through the essential steps to build a robust, production-ready integration.
By the end of this guide, you’ll understand how to authenticate, send messages, handle streaming responses, manage errors, and follow best practices for scaling your integration.
Prerequisites
Before you start, make sure you have:
- A Claude API key from Anthropic Console
- Python 3.8+ or Node.js 16+ installed
- Basic familiarity with REST APIs and JSON
Step 1: Authentication and Setup
Every request to the Claude API must include an x-api-key header carrying your key, along with an anthropic-version header and a JSON content-type. Store the key in an environment variable and never hardcode it in your source code.
Python Setup
```python
import os
import requests

API_KEY = os.environ.get("ANTHROPIC_API_KEY")
API_URL = "https://api.anthropic.com/v1/messages"

headers = {
    "x-api-key": API_KEY,
    "anthropic-version": "2023-06-01",
    "content-type": "application/json",
}
```
TypeScript/Node.js Setup
```typescript
import Anthropic from '@anthropic-ai/sdk';

const anthropic = new Anthropic({
  apiKey: process.env.ANTHROPIC_API_KEY,
});
```
Security Tip: Use a secrets manager (like AWS Secrets Manager or HashiCorp Vault) in production environments. Never expose your API key in client-side code.
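As a minimal sketch of the advice above, the key can be validated once at startup so a missing variable fails loudly instead of producing 401 responses later. The helper names `load_api_key` and `redact` are hypothetical and belong to this guide, not to any SDK.

```python
import os
from typing import Mapping, Optional


def load_api_key(env: Optional[Mapping[str, str]] = None,
                 var_name: str = "ANTHROPIC_API_KEY") -> str:
    """Return the API key from the environment, or raise if it is unset.

    Hypothetical helper: `env` defaults to os.environ and exists mainly
    so the function can be exercised in tests without touching real settings.
    """
    source = os.environ if env is None else env
    key = source.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"{var_name} is not set; refusing to start without it")
    return key


def redact(key: str, visible: int = 4) -> str:
    """Mask a key for log output, keeping only the last few characters."""
    if len(key) <= visible:
        return "*" * len(key)
    return "*" * (len(key) - visible) + key[-visible:]
```

Calling `load_api_key()` during application startup surfaces configuration mistakes early, and `redact` keeps the full key out of any diagnostic output.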
Step 2: Sending Your First Message
The Messages API is the primary way to interact with Claude. You send a list of messages (with roles like user and assistant) and receive a generated response.
Python Example
```python
def send_message(user_input: str) -> dict:
    """Send a single user message and return the parsed JSON response."""
    payload = {
        "model": "claude-3-5-sonnet-20241022",
        "max_tokens": 1024,
        "messages": [
            {"role": "user", "content": user_input}
        ],
    }
    # A timeout keeps a stalled connection from hanging the caller indefinitely.
    response = requests.post(API_URL, headers=headers, json=payload, timeout=60)
    response.raise_for_status()
    return response.json()


# Usage
result = send_message("Explain quantum computing in simple terms.")
print(result["content"][0]["text"])
```
TypeScript Example
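```typescript
async function sendMessage(userInput: string): Promise<string> {
  const message = await anthropic.messages.create({
    model: "claude-3-5-sonnet-20241022",
    max_tokens: 1024,
    messages: [{ role: "user", content: userInput }],
  });
  // Content blocks are a union type, so narrow to a text block before reading .text.
  const first = message.content[0];
  return first && first.type === "text" ? first.text : "";
}

sendMessage("Explain quantum computing in simple terms.").then(console.log);
```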
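The examples above send a single user turn, but the same messages list carries multi-turn conversations through alternating user and assistant entries. The following is a rough sketch of how a history could be assembled into a request payload; `append_turn` and `build_payload` are hypothetical helpers, and the check that a conversation opens with a user turn reflects my understanding of the Messages API rather than anything stated in this guide.

```python
from typing import Dict, List

Message = Dict[str, str]


def append_turn(history: List[Message], role: str, content: str) -> List[Message]:
    """Return a new history with one more turn; the input list is not mutated."""
    if role not in ("user", "assistant"):
        raise ValueError(f"unsupported role: {role!r}")
    return history + [{"role": role, "content": content}]


def build_payload(history: List[Message],
                  model: str = "claude-3-5-sonnet-20241022",
                  max_tokens: int = 1024) -> dict:
    """Build a Messages API request body from a conversation history."""
    # Assumption: the conversation is expected to start with a user message.
    if not history or history[0]["role"] != "user":
        raise ValueError("conversation must start with a user message")
    return {"model": model, "max_tokens": max_tokens, "messages": list(history)}


# Usage: replay earlier turns so the model sees the whole conversation.
history = append_turn([], "user", "Hi")
history = append_turn(history, "assistant", "Hello! How can I help?")
history = append_turn(history, "user", "What can you do?")
payload = build_payload(history)
```

The resulting `payload` can be posted exactly like the single-turn payload in `send_message`.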
Step 3: Streaming Responses for Better UX
For a partner integration, streaming is critical. It reduces perceived latency and allows you to display tokens as they are generated, creating a more interactive experience.
Python Streaming
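```python
import json


def stream_message(user_input: str):
    """Yield text fragments from a streamed response as they arrive."""
    payload = {
        "model": "claude-3-5-sonnet-20241022",
        "max_tokens": 1024,
        "stream": True,
        "messages": [
            {"role": "user", "content": user_input}
        ],
    }
    with requests.post(API_URL, headers=headers, json=payload, stream=True, timeout=60) as response:
        response.raise_for_status()
        for line in response.iter_lines():
            if not line:
                continue
            # Server-sent events carry their JSON payload on "data: " lines.
            decoded = line.decode("utf-8")
            if decoded.startswith("data: "):
                data = json.loads(decoded[6:])
                # Only text deltas carry a "text" field; other delta types are skipped.
                if data["type"] == "content_block_delta" and data["delta"].get("type") == "text_delta":
                    yield data["delta"]["text"]


# Usage
for token in stream_message("Tell me a story"):
    print(token, end="", flush=True)
```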
TypeScript Streaming
```typescript
async function streamMessage(userInput: string) {
  const stream = await anthropic.messages.create({
    model: "claude-3-5-sonnet-20241022",
    max_tokens: 1024,
    stream: true,
    messages: [{ role: "user", content: userInput }],
  });
  for await (const event of stream) {
    // Narrow to text deltas; other delta types do not have a .text field.
    if (event.type === "content_block_delta" && event.delta.type === "text_delta") {
      process.stdout.write(event.delta.text);
    }
  }
}

streamMessage("Tell me a story");
```
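Because the stream arrives as server-sent events, the parsing step can be separated from the network call and exercised offline. The sketch below assumes the event shape used in the streaming examples above; `iter_text_deltas` is a hypothetical helper, and `SAMPLE_STREAM` is hand-written illustrative data, not captured API output.

```python
import json
from typing import Iterable, Iterator


def iter_text_deltas(lines: Iterable[bytes]) -> Iterator[str]:
    """Yield text fragments from raw SSE lines, ignoring every other event."""
    for raw in lines:
        if not raw:
            continue  # blank lines separate events
        decoded = raw.decode("utf-8")
        if not decoded.startswith("data: "):
            continue  # skip "event: ..." lines
        event = json.loads(decoded[len("data: "):])
        if event.get("type") != "content_block_delta":
            continue
        delta = event.get("delta", {})
        if delta.get("type") == "text_delta":
            yield delta.get("text", "")


# Hand-written sample lines for illustration only.
SAMPLE_STREAM = [
    b"event: content_block_delta",
    b'data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "Hel"}}',
    b"",
    b"event: ping",
    b'data: {"type": "ping"}',
    b"",
    b"event: content_block_delta",
    b'data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "lo"}}',
    b"",
    b"event: message_stop",
    b'data: {"type": "message_stop"}',
]

full_text = "".join(iter_text_deltas(SAMPLE_STREAM))
```

Keeping the parser free of network code means it can be fed `response.iter_lines()` in production and a plain list of bytes in unit tests.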
Step 4: Handling Errors Gracefully
Production integrations must handle API errors robustly. The Claude API signals failures with standard HTTP status codes and a JSON error object in the response body.
| Status Code | Meaning | Common Cause |
|---|---|---|
| 400 | Bad Request | Invalid payload or missing required field |
| 401 | Unauthorized | Invalid or missing API key |
| 429 | Rate Limited | Too many requests in a short period |
| 500 | Server Error | Temporary Anthropic service issue |
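The table can be turned into a small lookup that decides whether a failed call is worth retrying. This is a sketch under assumptions: the nested `error.type` and `error.message` fields reflect my understanding of the error body and should be checked against the official API reference, and `summarize_error` and `RETRYABLE_STATUS` are hypothetical names.

```python
from typing import Any, Dict

# Status codes from the table above that are usually transient.
RETRYABLE_STATUS = {429, 500}


def summarize_error(status_code: int, body: Any) -> Dict[str, Any]:
    """Condense an error response into a flat record for logs and retry logic."""
    # Assumed body shape: {"type": "error", "error": {"type": ..., "message": ...}}
    error = body.get("error", {}) if isinstance(body, dict) else {}
    return {
        "status": status_code,
        "type": error.get("type", "unknown"),
        "message": error.get("message", ""),
        "retryable": status_code in RETRYABLE_STATUS,
    }
```

A caller might log the returned record and only schedule another attempt when `retryable` is true.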
Python Error Handler
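```python
import time


def send_message_with_retry(user_input: str, max_retries: int = 3) -> dict:
    """Send a message, retrying rate limits, server errors, and network failures."""
    payload = {
        "model": "claude-3-5-sonnet-20241022",
        "max_tokens": 1024,
        "messages": [{"role": "user", "content": user_input}],
    }
    for attempt in range(max_retries):
        try:
            response = requests.post(API_URL, headers=headers, json=payload, timeout=60)
        except requests.exceptions.RequestException as e:
            # Network-level failure (timeout, connection reset): retry after a short pause.
            if attempt == max_retries - 1:
                raise RuntimeError(f"Failed after {max_retries} attempts: {e}") from e
            time.sleep(1)
            continue
        if response.status_code == 429 or response.status_code >= 500:
            wait_time = 2 ** attempt  # Exponential backoff: 1s, 2s, 4s, ...
            print(f"Got HTTP {response.status_code}. Retrying in {wait_time}s...")
            time.sleep(wait_time)
            continue
        # Other 4xx errors (for example 400 or 401) will not succeed on retry, so raise now.
        response.raise_for_status()
        return response.json()
    # Every attempt was rate limited or hit a server error.
    raise RuntimeError(f"Request still failing after {max_retries} attempts")
```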
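When many clients back off on the same schedule, their retries can arrive in synchronized bursts. One common mitigation, swapped in here as an illustration rather than something the handler above already does, is to randomize each delay ("full jitter"). `backoff_delay` is a hypothetical helper, and the `base` and `cap` defaults are arbitrary.

```python
import random
from typing import Optional


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0,
                  rng: Optional[random.Random] = None) -> float:
    """Return a random delay in [0, min(cap, base * 2**attempt)] seconds."""
    generator = rng if rng is not None else random
    upper = min(cap, base * (2 ** attempt))
    return generator.uniform(0.0, upper)
```

In the retry loop, `time.sleep(backoff_delay(attempt))` could stand in for the fixed `2 ** attempt` wait. Passing a seeded `random.Random` makes the delays reproducible in tests.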
Step 5: Best Practices for Partner Integrations
1. Manage Context Windows Wisely
Claude has a limited context window (e.g., 200K tokens for Claude 3.5 Sonnet). For multi-turn conversations, implement a sliding window or summarization strategy to stay within limits.
```python
def trim_conversation_history(history: list, max_tokens: int = 100000) -> list:
    """Keep only the most recent messages.

    Simplified example: max_tokens is accepted but not enforced here; the
    function keeps the last 10 messages regardless of their size.
    """
    return history[-10:]
```
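A sliding window that respects a token budget could look like the sketch below. It relies on a crude assumption of roughly four characters per token, which is only a ballpark heuristic and not how Claude actually tokenizes text, and it assumes each message's content is a plain string. `estimate_tokens` and `trim_to_budget` are hypothetical names.

```python
from typing import Dict, List


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Very rough token estimate; an assumption, not a real tokenizer."""
    return max(1, int(len(text) / chars_per_token))


def trim_to_budget(history: List[Dict[str, str]],
                   max_tokens: int = 100_000) -> List[Dict[str, str]]:
    """Keep the newest messages whose estimated size fits within max_tokens."""
    kept: List[Dict[str, str]] = []
    used = 0
    # Walk backwards so the most recent turns are kept first.
    for message in reversed(history):
        cost = estimate_tokens(message["content"])
        if used + cost > max_tokens:
            break
        kept.append(message)
        used += cost
    kept.reverse()
    # Drop leading assistant turns so the window opens with a user message.
    while kept and kept[0]["role"] != "user":
        kept.pop(0)
    return kept
```

For exact budgeting, the estimate would need to be replaced with real token counts, for example the usage figures returned with each response.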
2. Implement Rate Limiting
Respect Anthropic’s rate limits by implementing client-side throttling. Use a token bucket or semaphore pattern.
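```python
import asyncio
import time


class RateLimiter:
    """Space requests evenly so that at most requests_per_minute are sent."""

    def __init__(self, requests_per_minute: int = 60):
        self.interval = 60.0 / requests_per_minute
        self.next_slot = 0.0  # Earliest monotonic time the next request may start

    async def wait(self):
        now = time.monotonic()
        # Reserve a slot before sleeping so concurrent callers queue up
        # instead of all waking at the same moment.
        slot = max(now, self.next_slot)
        self.next_slot = slot + self.interval
        if slot > now:
            await asyncio.sleep(slot - now)
```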
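The semaphore pattern mentioned above bounds how many calls are in flight at once, which complements per-minute throttling. Here is a minimal sketch; `run_with_limit` is a hypothetical helper, and each job is assumed to be a zero-argument callable that returns an awaitable.

```python
import asyncio
from typing import Awaitable, Callable, List, TypeVar

T = TypeVar("T")


async def run_with_limit(jobs: List[Callable[[], Awaitable[T]]],
                         max_concurrent: int = 5) -> List[T]:
    """Run all jobs, never allowing more than max_concurrent at the same time."""
    # Created inside the coroutine so it is bound to the running event loop.
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(job: Callable[[], Awaitable[T]]) -> T:
        async with semaphore:
            return await job()

    # gather preserves the order of the input jobs in its results.
    return await asyncio.gather(*(guarded(job) for job in jobs))
```

Note that a semaphore limits concurrency, not request rate, so it is typically combined with a limiter like the `RateLimiter` class rather than used in place of one.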
3. Log and Monitor
Always log API requests and responses (excluding sensitive content) for debugging and monitoring. Use structured logging.
```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


def log_request(model: str, token_count: int, latency_ms: float):
    # key=value pairs are easy for log pipelines to parse; %-style args defer formatting.
    logger.info(
        "claude_api_call model=%s tokens=%d latency_ms=%.1f",
        model, token_count, latency_ms,
    )
```
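For structured logging, one option is to emit a single JSON object per call that contains only metadata and no message text. The sketch below assumes the response body exposes `model`, `stop_reason`, and a `usage` object with `input_tokens` and `output_tokens`, which matches my understanding of the Messages API response. `build_log_record`, `log_response`, and the logger name are hypothetical.

```python
import json
import logging
from typing import Any, Dict

logger = logging.getLogger("claude_integration")


def build_log_record(response_json: Dict[str, Any], latency_ms: float) -> Dict[str, Any]:
    """Extract loggable metadata from a response, leaving message content out."""
    usage = response_json.get("usage", {})
    input_tokens = usage.get("input_tokens", 0)
    output_tokens = usage.get("output_tokens", 0)
    return {
        "event": "claude_api_call",
        "model": response_json.get("model", "unknown"),
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "total_tokens": input_tokens + output_tokens,
        "stop_reason": response_json.get("stop_reason"),
        "latency_ms": round(latency_ms, 1),
    }


def log_response(response_json: Dict[str, Any], latency_ms: float) -> None:
    """Write the record as one JSON line so log tooling can index each field."""
    logger.info(json.dumps(build_log_record(response_json, latency_ms), sort_keys=True))
```

Because the record never includes the `content` field, prompts and completions stay out of the logs by construction.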
4. Handle Partial Outputs
When streaming, users may disconnect. Implement cancellation and partial result saving to avoid wasted tokens.
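One way to approach this in Python is to wrap the token stream in a generator that records what has been delivered and always reports it when iteration stops, whether the stream finished or the consumer went away. This is a sketch only; `stream_with_partial_save` and the `save_partial` callback signature are hypothetical.

```python
from typing import Callable, Iterable, Iterator, List


def stream_with_partial_save(tokens: Iterable[str],
                             save_partial: Callable[[str, bool], None]) -> Iterator[str]:
    """Relay tokens to the caller and save whatever was sent when iteration ends.

    save_partial receives the accumulated text and a flag that is True only
    if the underlying stream ran to completion.
    """
    received: List[str] = []
    completed = False
    try:
        for token in tokens:
            received.append(token)
            yield token
        completed = True
    finally:
        # Runs on normal completion, on errors, and when the consumer closes
        # this generator early (for example after a client disconnect).
        close = getattr(tokens, "close", None)
        if callable(close):
            close()  # stop the upstream generator so it can release its connection
        save_partial("".join(received), completed)
```

A web handler could wrap `stream_message(...)` with this generator and call `.close()` on it when the client disconnects, so the partial text is persisted and the upstream request is abandoned.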
Step 6: Testing Your Integration
Before going live, test your integration thoroughly:
- Unit tests: Mock the API responses for deterministic testing
- Integration tests: Use a dedicated test API key with low rate limits
- Load tests: Simulate concurrent users to ensure your backend scales
```python
# Example pytest test (assumes send_message is defined or imported in this module)
from unittest.mock import patch


def test_send_message():
    mock_response = {
        "content": [{"type": "text", "text": "Hello!"}],
        "model": "claude-3-5-sonnet-20241022",
    }
    with patch("requests.post") as mock_post:
        mock_post.return_value.json.return_value = mock_response
        result = send_message("Hi")
    assert result["content"][0]["text"] == "Hello!"
    # The function should have made exactly one HTTP call.
    mock_post.assert_called_once()
```
Conclusion
Building a partner integration with the Claude API is straightforward when you follow these patterns. Start with simple message sending, add streaming for responsiveness, implement robust error handling, and always respect rate limits and context windows.
For more advanced use cases, such as tool use (function calling), multimodal inputs, or building agents, refer to the official Anthropic documentation.
Key Takeaways
- Secure your API key using environment variables or a secrets manager—never expose it client-side.
- Use streaming to improve user experience by displaying tokens as they are generated.
- Implement exponential backoff for rate-limited (429) responses to avoid overwhelming the API.
- Manage context windows by trimming or summarizing conversation history to stay within token limits.
- Log and monitor all API interactions for debugging, cost tracking, and performance optimization.