Docs

Complete API reference for the Hemicule LLM Gateway — fully compatible with the OpenAI SDK

Introduction

Hemicule provides a unified API interface to 20+ language models including OpenAI GPT-4, Anthropic Claude, Google Gemini, and open-source models. Our gateway adds intelligent routing, semantic caching, and full observability without changing your existing code.

✨ Fully Compatible with OpenAI SDK
Use any existing OpenAI library — just update the base URL to https://api.hemicule.com/v1 and use your Hemicule API key.

Quick Start

Get your first API call working in under 5 minutes.

Step 1: Get your API key

Sign up and copy your API key from the dashboard.

Step 2: Install SDK

Terminal
pip install openai

Step 3: Make your first call

Python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.hemicule.com/v1",
    api_key="your-api-key"
)

response = client.chat.completions.create(
    model="auto",  # 'auto' enables intelligent routing
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is an API gateway?"}
    ]
)

print(response.choices[0].message.content)

Authentication

All API requests require authentication using an API key. Include your API key in the Authorization header.

HTTP Header
Authorization: Bearer your-api-key-here
⚠️ Keep your API key secure
Never expose your API key in client-side code or public repositories. Use environment variables or secure secret management.
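
For example, in Python the key can be read from an environment variable instead of being hardcoded. `HEMICULE_API_KEY` is an assumed variable name here, not an official convention; use whatever your secret manager provides:

```python
import os

# Read the key from the environment; the fallback is only a placeholder
# for local experiments and should never ship to production.
api_key = os.environ.get("HEMICULE_API_KEY", "your-api-key-here")

# The header every request must carry:
headers = {"Authorization": f"Bearer {api_key}"}
```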

Chat Completions

The chat completions endpoint generates responses from language models based on a conversation history.

POST https://api.hemicule.com/v1/chat/completions

Request Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model to use. Use "auto" for intelligent routing, or specify a model like "gpt-4", "claude-3.5", or "deepseek". |
| messages | array | Yes | List of messages in the conversation. Each message has a "role" and "content". |
| temperature | float | No | Sampling temperature between 0 and 2. Default 1.0. |
| max_tokens | int | No | Maximum number of tokens to generate. |
| enable_cache | boolean | No | Enable semantic caching. Default true. |

Example Request

Python
response = client.chat.completions.create(
    model="auto",
    messages=[
        {"role": "system", "content": "You are a technical expert."},
        {"role": "user", "content": "Explain semantic caching in simple terms."}
    ],
    temperature=0.7,
    max_tokens=500,
    extra_body={"enable_cache": True}  # the OpenAI SDK rejects unknown kwargs; pass gateway extensions via extra_body
)

Example Response

JSON
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1700000000,
  "model": "gpt-4",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Semantic caching is a technique that..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 120,
    "total_tokens": 145,
    "cost": 0.0012
  },
  "cache_hit": false
}
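
If you call the REST endpoint directly instead of through the SDK, the gateway extension fields ("cost" and "cache_hit") arrive in the raw JSON body. A minimal parsing sketch, using an abridged copy of the example response above:

```python
import json

# Abridged example payload (matches the response shown above).
raw = '''
{
  "id": "chatcmpl-abc123",
  "model": "gpt-4",
  "choices": [
    {"index": 0,
     "message": {"role": "assistant", "content": "Semantic caching is a technique that..."},
     "finish_reason": "stop"}
  ],
  "usage": {"prompt_tokens": 25, "completion_tokens": 120,
            "total_tokens": 145, "cost": 0.0012},
  "cache_hit": false
}
'''

resp = json.loads(raw)
answer = resp["choices"][0]["message"]["content"]
cost = resp["usage"]["cost"]       # gateway extension field
cache_hit = resp["cache_hit"]      # gateway extension field
```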


Intelligent Routing

Hemicule automatically routes each request to the optimal model based on query complexity, domain, and performance requirements.

💡 How routing works
Simple queries (classification, extraction) → Cost-effective models (DeepSeek, GPT-3.5)
Complex reasoning (analysis, code generation) → Premium models (GPT-4, Claude)
Delivers 40% cost reduction with no quality loss.

Using auto routing

Python
# Just set model="auto" — we handle the rest
response = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "Your query here"}]
)

Semantic Caching

Hemicule caches responses for semantically similar queries using vector-based similarity detection. Repeated questions return from cache — no additional API cost.

⚡ Cache benefits
• 60%+ cache hit rate in customer support scenarios
• Response latency drops from seconds to milliseconds
• 70% cost reduction on repetitive queries
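
The similarity check behind this can be sketched with a toy example. The embedding values and threshold below are illustrative only; the gateway's actual embedding model and similarity threshold are internal:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy embeddings for two phrasings of the same question (made-up values).
emb_original = [0.12, 0.85, 0.33, 0.41]
emb_rephrased = [0.10, 0.88, 0.30, 0.44]

SIMILARITY_THRESHOLD = 0.95  # assumed value for illustration
is_cache_hit = cosine_similarity(emb_original, emb_rephrased) >= SIMILARITY_THRESHOLD
```

When a new query's embedding clears the threshold against a cached query, the cached response is returned instead of calling the model.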

Observability

Track every API call with detailed cost attribution and real-time metrics.

Python - Get usage stats
# Response includes detailed usage information
response = client.chat.completions.create(...)
print(f"Cost: ${response.usage.cost}")
print(f"Cache hit: {response.cache_hit}")
print(f"Model used: {response.model}")
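
To aggregate these fields across many calls, a small client-side tracker can help. This is a hypothetical helper, not part of any Hemicule SDK; in real use you would feed it `response.usage.cost` and `response.cache_hit` from each call:

```python
from dataclasses import dataclass

@dataclass
class UsageTracker:
    """Accumulates per-call cost and cache statistics client-side."""
    total_cost: float = 0.0
    calls: int = 0
    cache_hits: int = 0

    def record(self, cost: float, cache_hit: bool) -> None:
        self.calls += 1
        self.total_cost += cost
        if cache_hit:
            self.cache_hits += 1

    @property
    def hit_rate(self) -> float:
        return self.cache_hits / self.calls if self.calls else 0.0

tracker = UsageTracker()
tracker.record(0.0012, False)  # a normal, billed completion
tracker.record(0.0, True)      # a cache hit, served at no cost
```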

Python SDK

Python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.hemicule.com/v1",
    api_key="your-key"
)

# Streaming support
stream = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True
)

for chunk in stream:
    # The final chunk's delta has no content, so guard against None
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

Node.js SDK

JavaScript
import OpenAI from 'openai';

const client = new OpenAI({
    baseURL: 'https://api.hemicule.com/v1',
    apiKey: 'your-key'
});

const response = await client.chat.completions.create({
    model: 'auto',
    messages: [{ role: 'user', content: 'Hello!' }]
});

Supported Models

| Model | Provider | Description |
|---|---|---|
| gpt-4 | OpenAI | Flagship model, best for complex reasoning |
| gpt-4o | OpenAI | Multimodal model |
| claude-3.5 | Anthropic | Long context, high security |
| gemini-pro | Google | Multimodal understanding |
| deepseek | DeepSeek | High cost-performance ratio |
| llama-3 | Meta | Open-source, customizable |

Error Codes

| Code | Description | Solution |
|---|---|---|
| 401 | Invalid API key | Check your API key and Authorization header |
| 429 | Rate limit exceeded | Reduce request frequency or upgrade your plan |
| 500 | Internal server error | Retry with exponential backoff |
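
The 429 and 500 rows call for exponential backoff. A minimal, client-agnostic sketch; the bare `except Exception` is a placeholder that you should narrow to your client's retriable error types (for the OpenAI SDK, `openai.RateLimitError` and `openai.InternalServerError`):

```python
import random
import time

def with_backoff(call, retries=5, base_delay=1.0):
    """Retry `call` with exponential backoff plus a small random jitter.

    Delays grow as base_delay, 2*base_delay, 4*base_delay, ...;
    the last failure is re-raised to the caller.
    """
    for attempt in range(retries):
        try:
            return call()
        except Exception:  # placeholder: narrow to retriable errors
            if attempt == retries - 1:
                raise
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            time.sleep(delay)
```

Usage would look like `with_backoff(lambda: client.chat.completions.create(...))`.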

Frequently Asked Questions

How do I get an API key?

Sign up and navigate to the API Keys section in your dashboard.

What's the difference between Hemicule and direct API calls?

Hemicule adds intelligent routing, semantic caching, and unified observability — reducing costs by 50-70% with no code changes.

Is my data secure?

Yes. We offer private deployment within your VPC and PII redaction, and we hold SOC 2 Type II certification.

Can I use my existing OpenAI code?

Absolutely. We're fully compatible with OpenAI's SDK — just change the base URL and API key.

What are the free tier limits?

The free tier includes 100,000 calls per month — enough for small projects and testing.