Hemicule provides a unified API for 20+ language models, including OpenAI GPT-4, Anthropic Claude, Google Gemini, and open-source models. Our gateway adds intelligent routing, semantic caching, and full observability without changing your existing code.
Point your client at https://api.hemicule.com/v1 and use your Hemicule API key.
Get your first API call working in under 5 minutes.
Sign up and copy your API key from the dashboard.
```bash
pip install openai
```
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.hemicule.com/v1",
    api_key="your-api-key",
)

response = client.chat.completions.create(
    model="auto",  # "auto" enables intelligent routing
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is an API gateway?"},
    ],
)
print(response.choices[0].message.content)
```
All API requests require authentication using an API key. Include your API key in the Authorization header.
```
Authorization: Bearer your-api-key-here
```
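If you call the REST API directly rather than through an SDK, the key is attached like any other HTTP header. A minimal Python sketch (the endpoint path and payload shape mirror the chat completions example; `API_KEY` is a placeholder):

```python
import json
import urllib.request

# Building a request by hand: the API key goes in the Authorization header.
# API_KEY is a placeholder; replace it with your real key.
API_KEY = "your-api-key-here"

def build_request(payload: dict) -> urllib.request.Request:
    return urllib.request.Request(
        "https://api.hemicule.com/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_request({"model": "auto", "messages": [{"role": "user", "content": "Hi"}]})
print(req.get_header("Authorization"))  # Bearer your-api-key-here
```

Send the request with `urllib.request.urlopen(req)`; the SDKs shown below handle this for you.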
The chat completions endpoint generates responses from language models based on a conversation history.
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model to use. Use "auto" for intelligent routing, or specify a model like "gpt-4", "claude-3.5", "deepseek". |
| messages | array | Yes | List of messages in the conversation. Each message has "role" and "content". |
| temperature | float | No | Sampling temperature between 0 and 2. Default 1.0. |
| max_tokens | int | No | Maximum tokens to generate. |
| enable_cache | boolean | No | Enable semantic caching. Default true. |
```python
response = client.chat.completions.create(
    model="auto",
    messages=[
        {"role": "system", "content": "You are a technical expert."},
        {"role": "user", "content": "Explain semantic caching in simple terms."},
    ],
    temperature=0.7,
    max_tokens=500,
    # enable_cache is a Hemicule extension; the OpenAI SDK forwards
    # non-standard parameters via extra_body.
    extra_body={"enable_cache": True},
)
```
```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1700000000,
  "model": "gpt-4",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Semantic caching is a technique that..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 120,
    "total_tokens": 145,
    "cost": 0.0012
  },
  "cache_hit": false
}
```
Hemicule automatically routes each request to the optimal model based on query complexity, domain, and performance requirements.
```python
# Just set model="auto" — we handle the rest
response = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "Your query here"}],
)
```
Hemicule caches responses for semantically similar queries using vector-based similarity detection. Repeated questions return from cache — no additional API cost.
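As a rough illustration of the idea, here is a toy sketch of similarity-based lookup. This is not Hemicule's implementation: all names are invented, and a real system uses learned embeddings rather than word counts.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy embedding: bag-of-words counts (real systems use dense vectors).
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class SemanticCache:
    """Return a cached response when a query is similar enough to a past one."""

    def __init__(self, threshold: float = 0.8):
        self.threshold = threshold
        self.entries = []  # list of (embedding, response) pairs

    def get(self, query: str):
        q = embed(query)
        for emb, response in self.entries:
            if cosine(q, emb) >= self.threshold:
                return response  # cache hit: a similar query was seen before
        return None  # cache miss

    def put(self, query: str, response: str):
        self.entries.append((embed(query), response))

cache = SemanticCache()
cache.put("what is an api gateway", "A gateway sits between clients and APIs...")
print(cache.get("what is an api gateway"))  # hit: identical query
print(cache.get("tell me a joke"))          # miss: prints None
```

Raising the threshold makes the cache stricter; the `enable_cache` parameter above toggles this behavior per request.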
Track every API call with detailed cost attribution and real-time metrics.
```python
# Response includes detailed usage information
response = client.chat.completions.create(...)
print(f"Cost: ${response.usage.cost}")
print(f"Cache hit: {response.cache_hit}")
print(f"Model used: {response.model}")
```
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.hemicule.com/v1",
    api_key="your-key",
)

# Streaming support
stream = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True,
)
for chunk in stream:
    # delta.content can be None on role or finish chunks
    print(chunk.choices[0].delta.content or "", end="")
```
```javascript
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://api.hemicule.com/v1',
  apiKey: 'your-key',
});

const response = await client.chat.completions.create({
  model: 'auto',
  messages: [{ role: 'user', content: 'Hello!' }],
});
```
| Model | Provider | Description |
|---|---|---|
| gpt-4 | OpenAI | Flagship model, best for complex reasoning |
| gpt-4o | OpenAI | Multimodal model |
| claude-3.5 | Anthropic | Long context, strong safety |
| gemini-pro | Google | Multimodal understanding |
| deepseek | DeepSeek | High cost-performance ratio |
| llama-3 | Meta | Open-source, customizable |
| Code | Description | Solution |
|---|---|---|
| 401 | Invalid API key | Check your API key and Authorization header |
| 429 | Rate limit exceeded | Reduce request frequency or upgrade your plan |
| 500 | Internal server error | Retry with exponential backoff |
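For 429 and 500 errors, retrying with exponential backoff usually resolves transient failures. A minimal sketch (`flaky` is a hypothetical stand-in for any API call that raises on a retryable error):

```python
import random
import time

def with_backoff(call_fn, retries: int = 4, base_delay: float = 0.5):
    """Call call_fn, retrying on RuntimeError with exponential backoff."""
    for attempt in range(retries):
        try:
            return call_fn()
        except RuntimeError:
            if attempt == retries - 1:
                raise  # out of retries: surface the error
            # Sleep base * 2^attempt plus a little jitter before retrying.
            time.sleep(base_delay * 2 ** attempt + random.uniform(0, 0.1))

# Usage: wrap the API call. Here flaky fails twice, then succeeds.
attempts = {"n": 0}
def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("500 Internal Server Error")
    return "ok"

print(with_backoff(flaky, base_delay=0.01))  # prints "ok" after two retries
```

In production, catch the SDK's specific rate-limit and server-error exceptions rather than a bare `RuntimeError`.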
**Where do I get an API key?**
Sign up and navigate to the API Keys section in your dashboard.

**Why use Hemicule instead of calling model providers directly?**
Hemicule adds intelligent routing, semantic caching, and unified observability, reducing costs by 50-70% with no code changes.

**Is Hemicule suitable for enterprise deployments?**
Yes. We support private deployment within your VPC, PII redaction, and SOC 2 Type II certification.

**Can I keep using the OpenAI SDK?**
Absolutely. We're fully compatible with OpenAI's SDK; just change the base URL and API key.

**Is there a free tier?**
Yes. The free tier includes 100,000 calls per month, enough for small projects and testing.