Docs

Complete API reference for the Hemicule LLM Gateway — fully compatible with the OpenAI SDK

Introduction

Hemicule provides a unified API interface to 20+ language models including OpenAI GPT-4, Anthropic Claude, Google Gemini, and open-source models. Our gateway adds intelligent routing, semantic caching, and full observability without changing your existing code.

✨ Fully Compatible with OpenAI SDK
Use any existing OpenAI library — just update the base URL to https://api.hemicule.com/v1 and use your Hemicule API key.

Quick Start

Get your first API call working in under 5 minutes.

Step 1: Get your API key

Sign up and copy your API key from the dashboard.

Step 2: Install SDK

Terminal
pip install openai

Step 3: Make your first call

Python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.hemicule.com/v1",
    api_key="your-api-key"
)

response = client.chat.completions.create(
    model="auto",  # 'auto' enables intelligent routing
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is an API gateway?"}
    ]
)

print(response.choices[0].message.content)

Authentication

All API requests require authentication using an API key. Include your API key in the Authorization header.

HTTP Header
Authorization: Bearer your-api-key-here
⚠️ Keep your API key secure
Never expose your API key in client-side code or public repositories. Use environment variables or secure secret management.
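
For example, in Python the key can be read from an environment variable instead of being hardcoded. `HEMICULE_API_KEY` is an assumed variable name here, not an official convention; use whatever your secret manager provides:

```python
import os

# Read the key from the environment; the fallback is only a placeholder
# for local experiments and should never ship to production.
api_key = os.environ.get("HEMICULE_API_KEY", "your-api-key-here")

# The header every request must carry:
headers = {"Authorization": f"Bearer {api_key}"}
```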

Chat Completions

The chat completions endpoint generates responses from language models based on a conversation history.

POST https://api.hemicule.com/v1/chat/completions

Request Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model to use. Use "auto" for intelligent routing, or specify a model like "gpt-4", "claude-3.5", or "deepseek". |
| messages | array | Yes | List of messages in the conversation. Each message has a "role" and "content". |
| temperature | float | No | Sampling temperature between 0 and 2. Default 1.0. |
| max_tokens | int | No | Maximum number of tokens to generate. |
| enable_cache | boolean | No | Enable semantic caching. Default true. |

Example Request

Python
response = client.chat.completions.create(
    model="auto",
    messages=[
        {"role": "system", "content": "You are a technical expert."},
        {"role": "user", "content": "Explain semantic caching in simple terms."}
    ],
    temperature=0.7,
    max_tokens=500,
    extra_body={"enable_cache": True}  # the OpenAI SDK rejects unknown kwargs; pass gateway extensions via extra_body
)

Example Response

JSON
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1700000000,
  "model": "gpt-4",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Semantic caching is a technique that..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 120,
    "total_tokens": 145,
    "cost": 0.0012
  },
  "cache_hit": false
}
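
If you call the REST endpoint directly instead of through the SDK, the gateway extension fields ("cost" and "cache_hit") arrive in the raw JSON body. A minimal parsing sketch, using an abridged copy of the example response above:

```python
import json

# Abridged example payload (matches the response shown above).
raw = '''
{
  "id": "chatcmpl-abc123",
  "model": "gpt-4",
  "choices": [
    {"index": 0,
     "message": {"role": "assistant", "content": "Semantic caching is a technique that..."},
     "finish_reason": "stop"}
  ],
  "usage": {"prompt_tokens": 25, "completion_tokens": 120,
            "total_tokens": 145, "cost": 0.0012},
  "cache_hit": false
}
'''

resp = json.loads(raw)
answer = resp["choices"][0]["message"]["content"]
cost = resp["usage"]["cost"]       # gateway extension field
cache_hit = resp["cache_hit"]      # gateway extension field
```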


Intelligent Routing

Hemicule automatically routes each request to the optimal model based on query complexity, domain, and performance requirements.

💡 How routing works
Simple queries (classification, extraction) → Cost-effective models (DeepSeek, GPT-3.5)
Complex reasoning (analysis, code generation) → Premium models (GPT-4, Claude)
Delivers 40% cost reduction with no quality loss.

Using auto routing

Python
# Just set model="auto" — we handle the rest
response = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "Your query here"}]
)

Semantic Caching

Hemicule caches responses for semantically similar queries using vector-based similarity detection. Repeated questions return from cache — no additional API cost.

⚡ Cache benefits
• 60%+ cache hit rate in customer support scenarios
• Response latency drops from seconds to milliseconds
• 70% cost reduction on repetitive queries
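
The similarity check behind this can be sketched with a toy example. The embedding values and threshold below are illustrative only; the gateway's actual embedding model and similarity threshold are internal:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy embeddings for two phrasings of the same question (made-up values).
emb_original = [0.12, 0.85, 0.33, 0.41]
emb_rephrased = [0.10, 0.88, 0.30, 0.44]

SIMILARITY_THRESHOLD = 0.95  # assumed value for illustration
is_cache_hit = cosine_similarity(emb_original, emb_rephrased) >= SIMILARITY_THRESHOLD
```

When a new query's embedding clears the threshold against a cached query, the cached response is returned instead of calling the model.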

Observability

Track every API call with detailed cost attribution and real-time metrics.

Python - Get usage stats
# Response includes detailed usage information
response = client.chat.completions.create(...)
print(f"Cost: ${response.usage.cost}")
print(f"Cache hit: {response.cache_hit}")
print(f"Model used: {response.model}")
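
To aggregate these fields across many calls, a small client-side tracker can help. This is a hypothetical helper, not part of any Hemicule SDK; in real use you would feed it `response.usage.cost` and `response.cache_hit` from each call:

```python
from dataclasses import dataclass

@dataclass
class UsageTracker:
    """Accumulates per-call cost and cache statistics client-side."""
    total_cost: float = 0.0
    calls: int = 0
    cache_hits: int = 0

    def record(self, cost: float, cache_hit: bool) -> None:
        self.calls += 1
        self.total_cost += cost
        if cache_hit:
            self.cache_hits += 1

    @property
    def hit_rate(self) -> float:
        return self.cache_hits / self.calls if self.calls else 0.0

tracker = UsageTracker()
tracker.record(0.0012, False)  # a normal, billed completion
tracker.record(0.0, True)      # a cache hit, served at no cost
```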

Python SDK

Python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.hemicule.com/v1",
    api_key="your-key"
)

# Streaming support
stream = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True
)

for chunk in stream:
    # The final chunk's delta has no content, so guard against None
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

Node.js SDK

JavaScript
import OpenAI from 'openai';

const client = new OpenAI({
    baseURL: 'https://api.hemicule.com/v1',
    apiKey: 'your-key'
});

const response = await client.chat.completions.create({
    model: 'auto',
    messages: [{ role: 'user', content: 'Hello!' }]
});

Supported Models

| Model | Provider | Description |
|---|---|---|
| gpt-4 | OpenAI | Flagship model, best for complex reasoning |
| gpt-4o | OpenAI | Multimodal model |
| claude-3.5 | Anthropic | Long context, high security |
| gemini-pro | Google | Multimodal understanding |
| deepseek | DeepSeek | High cost-performance ratio |
| llama-3 | Meta | Open-source, customizable |

Error Codes

| Code | Description | Solution |
|---|---|---|
| 401 | Invalid API key | Check your API key and Authorization header |
| 429 | Rate limit exceeded | Reduce request frequency or upgrade your plan |
| 500 | Internal server error | Retry with exponential backoff |
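
The 429 and 500 rows call for exponential backoff. A minimal, client-agnostic sketch; the bare `except Exception` is a placeholder that you should narrow to your client's retriable error types (for the OpenAI SDK, `openai.RateLimitError` and `openai.InternalServerError`):

```python
import random
import time

def with_backoff(call, retries=5, base_delay=1.0):
    """Retry `call` with exponential backoff plus a small random jitter.

    Delays grow as base_delay, 2*base_delay, 4*base_delay, ...;
    the last failure is re-raised to the caller.
    """
    for attempt in range(retries):
        try:
            return call()
        except Exception:  # placeholder: narrow to retriable errors
            if attempt == retries - 1:
                raise
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            time.sleep(delay)
```

Usage would look like `with_backoff(lambda: client.chat.completions.create(...))`.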

Frequently Asked Questions

How do I get an API key?

Sign up and navigate to the API Keys section in your dashboard.

What's the difference between Hemicule and direct API calls?

Hemicule adds intelligent routing, semantic caching, and unified observability — reducing costs by 50-70% with no code changes.

Is my data secure?

Yes. We offer private deployment within your VPC and PII redaction, and we hold SOC 2 Type II certification.

Can I use my existing OpenAI code?

Absolutely. We're fully compatible with OpenAI's SDK — just change the base URL and API key.

What are the free tier limits?

The free tier includes 100,000 calls per month — enough for small projects and testing.