How to Build a Custom Partner Integration with the Claude API
A practical guide to creating partner integrations with the Claude API, covering authentication, message streaming, error handling, and best practices for production deployments.
Introduction
Building a partner integration with the Claude API allows you to embed powerful conversational AI capabilities directly into your own application, platform, or service. Whether you're creating a customer support chatbot, a content generation tool, or an AI-powered assistant, a well-designed integration ensures reliability, scalability, and a great user experience.
This guide walks you through the essential steps to build a robust partner integration using the Claude API. We'll cover authentication, message streaming, error handling, rate limiting, and production best practices.
Prerequisites
Before you begin, make sure you have:
- A Claude API key from the Anthropic Console
- Python 3.8+ with the requests library, or Node.js 18+ (the TypeScript examples rely on the built-in fetch)
- Basic familiarity with REST APIs and JSON
- An understanding of your application's authentication model (OAuth, API keys, etc.)
Step 1: Set Up Authentication
Every request to the Claude API requires an API key passed via the x-api-key header, along with an anthropic-version header that pins the API version. For partner integrations, never expose your API key client-side. Instead, proxy all requests through your backend.
Python Example
import os
import requests

# Read the key from the environment instead of hardcoding it
API_KEY = os.environ["CLAUDE_API_KEY"]
BASE_URL = "https://api.anthropic.com/v1"

def get_headers():
    return {
        "x-api-key": API_KEY,
        "anthropic-version": "2023-06-01",
        "Content-Type": "application/json"
    }
TypeScript Example
const API_KEY = process.env.CLAUDE_API_KEY;
const BASE_URL = "https://api.anthropic.com/v1";

function getHeaders(): Record<string, string> {
  return {
    "x-api-key": API_KEY!,
    "anthropic-version": "2023-06-01",
    "Content-Type": "application/json"
  };
}
Security Tip: Never hardcode your API key. Use environment variables or a secrets manager. Rotate keys regularly.
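One way to follow this tip is to read the key once at startup and fail fast when it is missing. The sketch below assumes the variable is named CLAUDE_API_KEY, matching the TypeScript example; load_api_key and mask_key are hypothetical helpers for illustration, not part of any SDK.

```python
import os


def load_api_key(env_var: str = "CLAUDE_API_KEY") -> str:
    """Return the API key from the environment, or raise if it is unset."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"{env_var} is not set; configure it in your environment "
            "or secrets manager before starting the service."
        )
    return key


def mask_key(key: str, visible: int = 4) -> str:
    """Mask a key for log output, keeping only the last few characters."""
    if len(key) <= visible:
        return "*" * len(key)
    return "*" * (len(key) - visible) + key[-visible:]
```

Failing at startup surfaces a misconfigured deployment immediately, rather than on the first user request.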
Step 2: Send Your First Message
The Messages endpoint is the primary way to interact with Claude. A basic request includes a model, max_tokens, and an array of messages.
Python Example
import requests

def send_message(user_input: str):
    response = requests.post(
        f"{BASE_URL}/messages",
        headers=get_headers(),
        json={
            "model": "claude-sonnet-4-20250514",
            "max_tokens": 1024,
            "messages": [
                {"role": "user", "content": user_input}
            ]
        }
    )
    response.raise_for_status()
    return response.json()

# Usage
result = send_message("Explain quantum computing in simple terms.")
print(result["content"][0]["text"])
TypeScript Example
async function sendMessage(userInput: string) {
  // Template literals need backticks for ${...} interpolation
  const response = await fetch(`${BASE_URL}/messages`, {
    method: "POST",
    headers: getHeaders(),
    body: JSON.stringify({
      model: "claude-sonnet-4-20250514",
      max_tokens: 1024,
      messages: [
        { role: "user", content: userInput }
      ]
    })
  });
  if (!response.ok) {
    throw new Error(`API error: ${response.status}`);
  }
  const data = await response.json();
  return data.content[0].text;
}
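Because each request carries the full array of messages, a multi-turn conversation can be modeled by resending earlier turns with alternating user and assistant roles. The following is a minimal sketch of that bookkeeping; build_payload and record_reply are hypothetical helper names, and the model and max_tokens defaults simply mirror the examples above.

```python
from typing import Dict, List

Message = Dict[str, str]


def build_payload(history: List[Message], user_input: str,
                  model: str = "claude-sonnet-4-20250514",
                  max_tokens: int = 1024) -> dict:
    """Build a Messages request body from prior turns plus a new user turn."""
    messages = history + [{"role": "user", "content": user_input}]
    return {"model": model, "max_tokens": max_tokens, "messages": messages}


def record_reply(history: List[Message], user_input: str,
                 reply_text: str) -> List[Message]:
    """Return a new history with the user turn and the reply appended."""
    return history + [
        {"role": "user", "content": user_input},
        {"role": "assistant", "content": reply_text},
    ]
```

Both helpers return new lists instead of mutating the history in place, which makes it easier to store one history per end user on your backend.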
Step 3: Implement Streaming for Real-Time Responses
For a better user experience, stream responses token by token. This reduces perceived latency and allows you to display partial results as Claude generates them.
Python with Server-Sent Events (SSE)
import json
import requests

def stream_message(user_input: str):
    with requests.post(
        f"{BASE_URL}/messages",
        headers=get_headers(),
        json={
            "model": "claude-sonnet-4-20250514",
            "max_tokens": 1024,
            "stream": True,
            "messages": [
                {"role": "user", "content": user_input}
            ]
        },
        stream=True
    ) as response:
        response.raise_for_status()
        for line in response.iter_lines():
            if line:
                line = line.decode("utf-8")
                if line.startswith("data: "):
                    data = json.loads(line[6:])
                    # Only text deltas carry a "text" field
                    if (data["type"] == "content_block_delta"
                            and data["delta"].get("type") == "text_delta"):
                        yield data["delta"]["text"]

# Usage
for token in stream_message("Tell me a short story."):
    print(token, end="", flush=True)
TypeScript with Fetch API
async function* streamMessage(userInput: string): AsyncGenerator<string> {
  const response = await fetch(`${BASE_URL}/messages`, {
    method: "POST",
    headers: getHeaders(),
    body: JSON.stringify({
      model: "claude-sonnet-4-20250514",
      max_tokens: 1024,
      stream: true,
      messages: [
        { role: "user", content: userInput }
      ]
    })
  });
  if (!response.ok) {
    throw new Error(`API error: ${response.status}`);
  }
  const reader = response.body!.getReader();
  const decoder = new TextDecoder();
  let buffer = "";
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    // stream: true keeps multi-byte characters split across chunks intact
    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split("\n");
    // The last element may be an incomplete line; keep it for the next chunk
    buffer = lines.pop() ?? "";
    for (const line of lines) {
      if (line.startsWith("data: ")) {
        const data = JSON.parse(line.slice(6));
        if (data.type === "content_block_delta" && data.delta.type === "text_delta") {
          yield data.delta.text;
        }
      }
    }
  }
}
// Usage
for await (const token of streamMessage("Tell me a short story.")) {
  process.stdout.write(token);
}
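Network chunks do not necessarily align with SSE line boundaries, so a parser that works on raw chunks needs to buffer incomplete lines. The sketch below shows one way to do that in Python as a pure function over byte chunks; iter_text_deltas is a hypothetical name, and the event shape is assumed to match the content_block_delta events handled in the examples above.

```python
import json
from typing import Iterable, Iterator


def iter_text_deltas(chunks: Iterable[bytes]) -> Iterator[str]:
    """Yield text deltas from raw SSE byte chunks, buffering partial lines."""
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        # Parse only complete lines; the trailing partial line stays buffered.
        *lines, buffer = buffer.split(b"\n")
        for raw in lines:
            line = raw.decode("utf-8").rstrip("\r")
            if not line.startswith("data: "):
                continue  # skip "event:" lines and blank separators
            event = json.loads(line[len("data: "):])
            if event.get("type") == "content_block_delta":
                delta = event.get("delta", {})
                if delta.get("type") == "text_delta":
                    yield delta["text"]
```

With the requests library, the chunks could come from response.iter_content(chunk_size=None). Keeping the parser separate from the HTTP call also makes it easy to unit test with canned bytes.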
Step 4: Handle Errors Gracefully
Production integrations must handle API errors, network failures, and rate limits. Implement retry logic with exponential backoff.
Python Retry Example
import time
import requests
from requests.exceptions import HTTPError, RequestException

def send_message_with_retry(user_input: str, max_retries: int = 3):
    for attempt in range(max_retries):
        is_last_attempt = attempt == max_retries - 1
        wait_time = 2 ** attempt  # 1s, 2s, 4s, ...
        try:
            response = requests.post(
                f"{BASE_URL}/messages",
                headers=get_headers(),
                json={
                    "model": "claude-sonnet-4-20250514",
                    "max_tokens": 1024,
                    "messages": [
                        {"role": "user", "content": user_input}
                    ]
                }
            )
        except RequestException as e:
            # Network-level failure (connection error, timeout, ...)
            if is_last_attempt:
                raise
            print(f"Error: {e}. Retrying in {wait_time}s...")
            time.sleep(wait_time)
            continue

        # Only rate limits (429) and server errors (5xx) are worth retrying
        if response.status_code == 429 or response.status_code >= 500:
            if is_last_attempt:
                raise HTTPError(
                    f"Giving up after {max_retries} attempts "
                    f"(HTTP {response.status_code})",
                    response=response,
                )
            print(f"HTTP {response.status_code}. Retrying in {wait_time}s...")
            time.sleep(wait_time)
            continue

        # Client errors such as 400 or 401 will not succeed on retry
        response.raise_for_status()
        return response.json()
Common Error Codes
| Status Code | Meaning | Action |
|---|---|---|
| 400 | Bad Request | Check your request payload |
| 401 | Unauthorized | Verify your API key |
| 429 | Rate Limited | Implement backoff |
| 500 | Server Error | Retry with backoff |
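The table above can be turned into a small decision helper, paired with a backoff delay that adds random jitter so that many clients do not retry in lockstep. This is a sketch under assumptions: is_retryable and backoff_delay are hypothetical names, treating every 5xx status as retryable is a simplification, and the base and cap values are illustrative defaults.

```python
import random
from typing import Optional


def is_retryable(status_code: int) -> bool:
    """Rate limits (429) and server errors (5xx) may succeed on a later attempt."""
    return status_code == 429 or status_code >= 500


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0,
                  rng: Optional[random.Random] = None) -> float:
    """Exponential backoff with full jitter.

    Returns a delay drawn uniformly from [0, min(cap, base * 2**attempt)].
    """
    source = rng if rng is not None else random
    ceiling = min(cap, base * (2 ** attempt))
    return source.uniform(0.0, ceiling)
```

Passing a seeded random.Random instance as rng makes the delays reproducible in tests.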
Step 5: Manage Rate Limits and Quotas
The Claude API enforces rate limits based on your usage tier. Track your usage via response headers:
- anthropic-ratelimit-requests-remaining
- anthropic-ratelimit-tokens-remaining
- anthropic-ratelimit-requests-reset
Python Rate Limit Monitor
def send_message_with_monitoring(user_input: str):
    response = requests.post(
        f"{BASE_URL}/messages",
        headers=get_headers(),
        json={
            "model": "claude-sonnet-4-20250514",
            "max_tokens": 1024,
            "messages": [
                {"role": "user", "content": user_input}
            ]
        }
    )
    # Log rate limit info (headers may be absent, in which case these are None)
    remaining = response.headers.get("anthropic-ratelimit-requests-remaining")
    tokens_remaining = response.headers.get("anthropic-ratelimit-tokens-remaining")
    print(f"Requests remaining: {remaining}, Tokens remaining: {tokens_remaining}")
    response.raise_for_status()
    return response.json()
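Beyond printing the values, an integration might parse the headers into a structure and slow down before the quota runs out. The sketch below assumes the anthropic-ratelimit- header prefix and the retry-after header; the prefix is a parameter so it can be adjusted if the API reference lists different names. RateLimitStatus, parse_rate_limits, and should_throttle are hypothetical names, and the threshold of 5 requests is an arbitrary illustration.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class RateLimitStatus:
    requests_remaining: Optional[int]
    tokens_remaining: Optional[int]
    retry_after_seconds: Optional[float]


def _to_int(value: Optional[str]) -> Optional[int]:
    """Convert a header value to int, returning None if missing or malformed."""
    try:
        return int(value) if value is not None else None
    except ValueError:
        return None


def parse_rate_limits(headers: Mapping[str, str],
                      prefix: str = "anthropic-ratelimit-") -> RateLimitStatus:
    """Extract remaining-quota information from response headers."""
    # Normalize keys, since a plain dict is case-sensitive.
    lowered = {k.lower(): v for k, v in headers.items()}
    retry_after = lowered.get("retry-after")
    try:
        retry_after_seconds = float(retry_after) if retry_after is not None else None
    except ValueError:
        retry_after_seconds = None
    return RateLimitStatus(
        requests_remaining=_to_int(lowered.get(f"{prefix}requests-remaining")),
        tokens_remaining=_to_int(lowered.get(f"{prefix}tokens-remaining")),
        retry_after_seconds=retry_after_seconds,
    )


def should_throttle(status: RateLimitStatus, min_requests: int = 5) -> bool:
    """Return True when the remaining request budget is running low."""
    return (status.requests_remaining is not None
            and status.requests_remaining < min_requests)
```

A caller could run parse_rate_limits(response.headers) after each request and pause new work when should_throttle returns True.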
Step 6: Production Best Practices
1. Use a Queue for High-Volume Requests
For partner integrations handling many users, implement a message queue (e.g., Redis, RabbitMQ) to manage request flow and avoid overwhelming the API.
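As an in-process stand-in for a real broker such as Redis or RabbitMQ, the following sketch uses Python's standard queue and threading modules to cap how many requests run at once. run_with_worker_pool is a hypothetical helper, and handler is a placeholder for whatever function actually calls the API.

```python
import queue
import threading
from typing import Callable, List


def run_with_worker_pool(prompts: List[str], handler: Callable[[str], str],
                         num_workers: int = 2) -> List[object]:
    """Process prompts through a queue so at most num_workers run at once."""
    jobs: queue.Queue = queue.Queue()
    results: List[object] = [None] * len(prompts)

    def worker() -> None:
        while True:
            item = jobs.get()
            if item is None:  # sentinel: no more work for this worker
                jobs.task_done()
                return
            index, prompt = item
            try:
                results[index] = handler(prompt)
            except Exception as exc:
                # Keep the worker alive and hand the error back to the caller
                results[index] = exc
            finally:
                jobs.task_done()

    threads = [threading.Thread(target=worker, daemon=True)
               for _ in range(num_workers)]
    for thread in threads:
        thread.start()
    for index, prompt in enumerate(prompts):
        jobs.put((index, prompt))
    for _ in threads:
        jobs.put(None)  # one sentinel per worker
    jobs.join()
    return results
```

Results come back in the same order as the prompts, and a failed job appears as the exception object in its slot, so one bad request does not stall the queue.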
2. Cache Common Responses
Cache responses for frequently asked questions or deterministic prompts to reduce API costs and latency.
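A simple in-memory cache with a time-to-live illustrates the idea; a shared store would be needed once the integration runs on more than one process. ResponseCache is a hypothetical class, the 300-second default is arbitrary, and keying on a hash of the model and prompt is one possible choice.

```python
import hashlib
import json
import time
from typing import Callable, Dict, Optional, Tuple


class ResponseCache:
    """In-memory TTL cache keyed by a hash of the model and prompt."""

    def __init__(self, ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, str]] = {}

    @staticmethod
    def make_key(model: str, prompt: str) -> str:
        raw = json.dumps({"model": model, "prompt": prompt}, sort_keys=True)
        return hashlib.sha256(raw.encode("utf-8")).hexdigest()

    def get(self, model: str, prompt: str) -> Optional[str]:
        """Return the cached text, or None if absent or expired."""
        key = self.make_key(model, prompt)
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, text = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # drop the expired entry
            return None
        return text

    def put(self, model: str, prompt: str, text: str) -> None:
        key = self.make_key(model, prompt)
        self._entries[key] = (self._clock() + self._ttl, text)
```

The clock is injectable so expiry can be tested without sleeping. Before calling the API, check get(model, prompt); after a successful call, store the text with put.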
3. Implement User-Level Rate Limiting
Protect your API key by rate-limiting per user in your application. This prevents one abusive user from exhausting your quota.
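One common way to implement per-user limits is a token bucket: each user gets a small burst allowance that refills at a steady rate. The sketch below keeps the buckets in memory; UserRateLimiter is a hypothetical class, and the capacity and refill rate are illustrative values.

```python
import time
from typing import Callable, Dict, Tuple


class UserRateLimiter:
    """Token bucket per user: up to `capacity` requests, refilled over time."""

    def __init__(self, capacity: int = 5, refill_per_second: float = 1.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._capacity = float(capacity)
        self._rate = refill_per_second
        self._clock = clock
        # user_id -> (tokens available, time of last update)
        self._buckets: Dict[str, Tuple[float, float]] = {}

    def allow(self, user_id: str) -> bool:
        """Consume one token for the user if available; return whether it was."""
        now = self._clock()
        tokens, last = self._buckets.get(user_id, (self._capacity, now))
        # Refill according to elapsed time, never exceeding capacity
        tokens = min(self._capacity, tokens + (now - last) * self._rate)
        if tokens >= 1.0:
            self._buckets[user_id] = (tokens - 1.0, now)
            return True
        self._buckets[user_id] = (tokens, now)
        return False
```

A backend would call allow(user_id) before forwarding a request and return its own 429 to the user when the result is False, without spending any of the shared quota.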
4. Log and Monitor
Log all API requests and responses (excluding sensitive content) for debugging and monitoring. Use tools like Datadog, Grafana, or CloudWatch.
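As a rough illustration of logging without sensitive content, the sketch below records metadata and only a redacted, truncated preview of the prompt. The two regular expressions are simplistic assumptions (an email pattern and the sk-ant- key prefix shown earlier), and redact and log_request are hypothetical helpers; a real deployment would need redaction rules suited to its own data.

```python
import logging
import re

logger = logging.getLogger("claude_integration")

_API_KEY = re.compile(r"sk-ant-[A-Za-z0-9_-]+")
_EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def redact(text: str) -> str:
    """Replace API keys and email addresses with placeholders."""
    text = _API_KEY.sub("[REDACTED_KEY]", text)
    return _EMAIL.sub("[REDACTED_EMAIL]", text)


def log_request(user_id: str, prompt: str, status_code: int,
                latency_ms: float) -> str:
    """Log request metadata with a redacted prompt preview; return the line."""
    preview = redact(prompt)[:80]  # redact first, then truncate
    line = (f"user={user_id} status={status_code} "
            f"latency_ms={latency_ms:.0f} prompt_chars={len(prompt)} "
            f"prompt_preview={preview!r}")
    logger.info(line)
    return line
```

Redacting before truncating avoids leaving half of an email address or key in the preview.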
5. Handle Long-Running Requests
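Set explicit connect and read timeouts on every request so that a stalled call cannot hang a worker. The Claude API itself streams over server-sent events; if you need a persistent session between your backend and your own clients, consider relaying the stream over a WebSocket connection.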
Step 7: Testing Your Integration
Before going live, test your integration thoroughly:
- Unit tests: Mock API responses for each endpoint
- Integration tests: Use a dedicated, non-production API key against the real API
- Load tests: Simulate multiple concurrent users
- Error tests: Verify behavior under network failures and rate limits
Python Test Example
import pytest
from unittest.mock import patch

def test_send_message_success():
    with patch("requests.post") as mock_post:
        mock_post.return_value.status_code = 200
        mock_post.return_value.json.return_value = {
            "content": [{"text": "Hello!"}]
        }
        result = send_message("Hi")
        assert result["content"][0]["text"] == "Hello!"

def test_send_message_rate_limited():
    with patch("requests.post") as mock_post:
        mock_post.return_value.status_code = 429
        with pytest.raises(Exception):
            send_message_with_retry("Hi", max_retries=1)
Conclusion
Building a partner integration with the Claude API is straightforward when you follow these best practices. Start with simple synchronous requests, add streaming for better UX, implement robust error handling, and monitor your usage closely.
Remember that the Claude API is constantly evolving. Check the official changelog regularly for updates to models, endpoints, and features.
Key Takeaways
- Always proxy API requests through your backend to keep your API key secure and enforce user-level rate limiting.
- Implement streaming for real-time token-by-token responses to improve perceived performance.
- Use exponential backoff retry logic to handle rate limits (429) and transient server errors gracefully.
- Monitor rate limit headers (anthropic-ratelimit-*) to stay within your quota and avoid unexpected throttling.
- Test thoroughly with unit, integration, and load tests before deploying to production.