Kimi K2.5 API
OpenAI-compatible inference API powered by Moonshot AI's Kimi K2.5 —
a 1 trillion parameter Mixture-of-Experts model running on dedicated 8× NVIDIA H100 GPUs.
$0.45 per 1M input tokens
Quick Start
from openai import OpenAI

# Point the standard OpenAI client at the Kimi K2.5 endpoint
client = OpenAI(
    base_url="https://gpu-workspace.taile8dc37.ts.net/v1",
    api_key="your-api-key",
)

response = client.chat.completions.create(
    model="kimi-k2.5",
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=1024,
)

print(response.choices[0].message.content)
Endpoints
POST  /v1/chat/completions   Chat completions (streaming supported)
GET   /v1/models             List available models
GET   /health                Server health check
GET   /docs                  Full API documentation (Swagger)
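The streaming and utility endpoints can be exercised with the same client. The sketch below is illustrative only: it assumes the server emits standard OpenAI streaming chunks and that /health returns a small JSON status document, neither of which is spelled out above.

import requests
from openai import OpenAI

BASE_URL = "https://gpu-workspace.taile8dc37.ts.net"
client = OpenAI(base_url=f"{BASE_URL}/v1", api_key="your-api-key")

# Stream tokens as they are generated (assumes standard OpenAI chunk format)
stream = client.chat.completions.create(
    model="kimi-k2.5",
    messages=[{"role": "user", "content": "Write a haiku about GPUs."}],
    max_tokens=256,
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()

# List available models, then check server health
# (exact /health payload is an assumption; shown here as JSON)
print([m.id for m in client.models.list().data])
print(requests.get(f"{BASE_URL}/health", timeout=5).json())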
Pricing
Input: $0.45 / 1M tokens
- Prompt & system messages
- No minimum commitment
- Pay per token used

Output: $2.50 / 1M tokens
- Generated responses
- Includes reasoning tokens
- Real-time spend tracking
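Per-request spend can be estimated from the usage field of each response, assuming the server populates it in the standard OpenAI format (prompt_tokens and completion_tokens). The helper below is a hypothetical sketch that applies the rates listed above.

# Hypothetical helper: estimate the USD cost of one request from its usage field.
# Rates are the published ones: $0.45 / 1M input tokens, $2.50 / 1M output tokens.
INPUT_RATE = 0.45 / 1_000_000   # USD per input token
OUTPUT_RATE = 2.50 / 1_000_000  # USD per output token

def estimate_cost(usage) -> float:
    """Return the estimated cost of a single chat completion in USD."""
    return (usage.prompt_tokens * INPUT_RATE
            + usage.completion_tokens * OUTPUT_RATE)

# Example: 1,200 prompt tokens and 350 completion tokens comes to roughly
# 1200 * 0.00000045 + 350 * 0.0000025 ≈ $0.0014.
# Typical use: estimate_cost(response.usage)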
Model Details
Architecture: Mixture-of-Experts (384 experts, 8 active per token)
Active params: ~32B per forward pass
Context window: 4,096 tokens (configurable up to 262K)
Quantization: Native INT4 (W4A16 QAT)
Features: Reasoning, tool/function calling, multi-turn chat
Compatibility: OpenAI API drop-in replacement
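Because the API is a drop-in OpenAI replacement and the model supports tool/function calling, tool definitions can be passed in the standard OpenAI tools format. The sketch below assumes the server accepts the usual tool-calling request fields; get_weather is a made-up example tool, not part of this API.

import json
from openai import OpenAI

client = OpenAI(
    base_url="https://gpu-workspace.taile8dc37.ts.net/v1",
    api_key="your-api-key",
)

# Hypothetical tool definition in the standard OpenAI "tools" schema
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="kimi-k2.5",
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    tools=tools,
    max_tokens=512,
)

# If the model chose to call the tool, its arguments arrive as a JSON string
for call in response.choices[0].message.tool_calls or []:
    print(call.function.name, json.loads(call.function.arguments))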
Links: API Documentation (/docs) · Admin Dashboard