Kimi K2.5 API

OpenAI-compatible inference API powered by Moonshot AI's Kimi K2.5, a 1-trillion-parameter Mixture-of-Experts model served from a dedicated 8× NVIDIA H100 GPU cluster.

Parameters: 1.03T
Throughput: 110 tokens/sec
Hardware: 8× H100 GPU cluster
Input price: $0.45 / 1M tokens

Quick Start

from openai import OpenAI

# Point the standard OpenAI client at the Kimi K2.5 server
client = OpenAI(
    base_url="https://gpu-workspace.taile8dc37.ts.net/v1",
    api_key="your-api-key",  # replace with your issued key
)

response = client.chat.completions.create(
    model="kimi-k2.5",
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=1024
)

print(response.choices[0].message.content)

Endpoints

POST /v1/chat/completions Chat completions (streaming supported)
GET /v1/models List available models
GET /health Server health check
GET /docs Full API documentation (Swagger)
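
The chat completions endpoint supports streaming, so you can render tokens as they arrive instead of waiting for the full reply. A minimal sketch: `stream_reply` is a hypothetical helper name, and `client` is the OpenAI client constructed in the Quick Start above.

```python
def stream_reply(client, prompt, model="kimi-k2.5"):
    """Print a reply as it streams in and return the full text."""
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,  # yields chunks with incremental deltas
    )
    parts = []
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:  # some chunks carry no text (e.g. the role header)
            print(delta, end="", flush=True)
            parts.append(delta)
    print()
    return "".join(parts)
```

Call it as `stream_reply(client, "Hello!")` after creating the client as shown in the Quick Start.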

Pricing

Input

$0.45 / 1M tokens
  • Prompt & system messages
  • No minimum commitment
  • Pay per token used

Output

$2.50 / 1M tokens
  • Generated responses
  • Includes reasoning tokens
  • Real-time spend tracking
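
At these rates, per-request cost is simple arithmetic over the token counts the API returns in `response.usage`. A quick sketch (`estimate_cost` is a hypothetical helper; the rates are hard-coded from the table above):

```python
# Rates from this page: $0.45 per 1M input tokens, $2.50 per 1M output tokens.
INPUT_RATE = 0.45 / 1_000_000   # dollars per input token
OUTPUT_RATE = 2.50 / 1_000_000  # dollars per output token

def estimate_cost(input_tokens, output_tokens):
    """Return the estimated cost in dollars for one request."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# e.g. a 2,000-token prompt with a 500-token reply:
print(f"${estimate_cost(2_000, 500):.6f}")  # → $0.002150
```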

Model Details

Architecture: Mixture-of-Experts (384 experts, 8 active per token)
Active params: ~32B per forward pass
Context window: 4,096 tokens (configurable up to 262K)
Quantization: Native INT4 (W4A16 QAT)
Features: Reasoning, tool/function calling, multi-turn chat
Compatibility: OpenAI API drop-in replacement
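
Tool/function calling follows the OpenAI function-calling schema, so tools are declared as JSON Schema and returned calls arrive on `message.tool_calls`. A sketch under that assumption: the `get_weather` tool and the `first_tool_call` helper are hypothetical, for illustration only.

```python
import json

# Hypothetical tool declared in the OpenAI function-calling schema.
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

def first_tool_call(message):
    """Return (name, arguments dict) of the first tool call, or None."""
    calls = getattr(message, "tool_calls", None)
    if not calls:
        return None
    call = calls[0].function
    return call.name, json.loads(call.arguments)
```

Pass `tools=[WEATHER_TOOL]` to `client.chat.completions.create(...)`, then inspect `response.choices[0].message` with `first_tool_call` to decide whether to run the tool and send its result back in a follow-up turn.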
Links: API Documentation · Admin Dashboard