Kimi K2.5 API

OpenAI-compatible inference API powered by Moonshot AI's Kimi K2.5, a 1-trillion-parameter Mixture-of-Experts model served from a dedicated 8× NVIDIA H100 GPU cluster.

Parameters: 1.03T
Throughput: 110 tokens/sec
Hardware: 8× H100 GPU cluster
Input price: $0.45 / 1M tokens

Quick Start

from openai import OpenAI

# Point the standard OpenAI client at the Kimi K2.5 server
client = OpenAI(
    base_url="https://gpu-workspace.taile8dc37.ts.net/v1",
    api_key="your-api-key",  # replace with your issued key
)

response = client.chat.completions.create(
    model="kimi-k2.5",
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=1024
)

print(response.choices[0].message.content)

Endpoints

POST /v1/chat/completions Chat completions (streaming supported)
GET /v1/models List available models
GET /health Server health check
GET /docs Full API documentation (Swagger)
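
The chat completions endpoint supports streaming, so you can render tokens as they arrive instead of waiting for the full reply. A minimal sketch: `stream_reply` is a hypothetical helper name, and `client` is the OpenAI client constructed in the Quick Start above.

```python
def stream_reply(client, prompt, model="kimi-k2.5"):
    """Print a reply as it streams in and return the full text."""
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,  # yields chunks with incremental deltas
    )
    parts = []
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:  # some chunks carry no text (e.g. the role header)
            print(delta, end="", flush=True)
            parts.append(delta)
    print()
    return "".join(parts)
```

Call it as `stream_reply(client, "Hello!")` after creating the client as shown in the Quick Start.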

Pricing

Input

$0.45 / 1M tokens
  • Prompt & system messages
  • No minimum commitment
  • Pay per token used

Output

$2.50 / 1M tokens
  • Generated responses
  • Includes reasoning tokens
  • Real-time spend tracking
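
At these rates, per-request cost is simple arithmetic over the token counts the API returns in `response.usage`. A quick sketch (`estimate_cost` is a hypothetical helper; the rates are hard-coded from the table above):

```python
# Rates from this page: $0.45 per 1M input tokens, $2.50 per 1M output tokens.
INPUT_RATE = 0.45 / 1_000_000   # dollars per input token
OUTPUT_RATE = 2.50 / 1_000_000  # dollars per output token

def estimate_cost(input_tokens, output_tokens):
    """Return the estimated cost in dollars for one request."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# e.g. a 2,000-token prompt with a 500-token reply:
print(f"${estimate_cost(2_000, 500):.6f}")  # → $0.002150
```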

Model Details

Architecture: Mixture-of-Experts (384 experts, 8 active per token)
Active params: ~32B per forward pass
Context window: 4,096 tokens (configurable up to 262K)
Quantization: Native INT4 (W4A16 QAT)
Features: Reasoning, tool/function calling, multi-turn chat
Compatibility: OpenAI API drop-in replacement
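
Tool/function calling follows the OpenAI function-calling schema, so tools are declared as JSON Schema and returned calls arrive on `message.tool_calls`. A sketch under that assumption: the `get_weather` tool and the `first_tool_call` helper are hypothetical, for illustration only.

```python
import json

# Hypothetical tool declared in the OpenAI function-calling schema.
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

def first_tool_call(message):
    """Return (name, arguments dict) of the first tool call, or None."""
    calls = getattr(message, "tool_calls", None)
    if not calls:
        return None
    call = calls[0].function
    return call.name, json.loads(call.arguments)
```

Pass `tools=[WEATHER_TOOL]` to `client.chat.completions.create(...)`, then inspect `response.choices[0].message` with `first_tool_call` to decide whether to run the tool and send its result back in a follow-up turn.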
Links: API Documentation · Admin Dashboard