OpenAI compatible API. Attested gateway. Public status.

Nebius Token Factory

Nebius Token Factory models on TrustedRouter with prices, routes, policy notes, and source links.

Verify gateway
1 URLbase_url migration
100smodels and routes
0prompt logs by default

nebius

No logs

All providers

ProviderNebius Token Factory
Models20 public models
Prepaid routes18
BYOK routes20
Zero data retentionyes
Confidential computenot claimed
Provider E2EEnot claimed
Policy noteMarked ZDR via TrustedRouter's arrangement — Nebius RETAINS inputs/outputs by default (for speculative decoding); zero retention is an opt-in control, which the deployed Nebius account has enabled. Nebius does not train on customer data.
Policy source

Measured performance

192 samples

Continuously sampled across Nebius Token Factory's routed models — p50 TTFT, throughput, and success rate. Unsupported route and probe-configuration rows are separated from provider downtime. No prompt or output content stored.

p50 TTFT5090 ms
Throughput
Uptime97.92%
Modelp50 TTFTp50 TTFBThroughputUptimeConfig excludedSamples
Qwen/Qwen2.5-VL-72B-Instruct 2445 ms 2444 ms 100.00% 10
openai/gpt-oss-120b 3275 ms 3274 ms 100.00% 19
deepseek-ai/DeepSeek-V4-Pro 4406 ms 4405 ms 86.67% 15
Qwen/Qwen3-Next-80B-A3B-Thinking 4510 ms 4509 ms 100.00% 8
NousResearch/Hermes-4-405B 4789 ms 4789 ms 100.00% 14
nvidia/Llama-3_1-Nemotron-Ultra-253B-v1 4927 ms 4927 ms 100.00% 11
Qwen/Qwen3-235B-A22B-Instruct-2507 4976 ms 4975 ms 100.00% 17
meta-llama/Llama-3.3-70B-Instruct 5090 ms 5090 ms 93.33% 15
Qwen/Qwen3-32B 7460 ms 7460 ms 100.00% 18
google/gemma-3-27b-it 7765 ms 7764 ms 100.00% 16
Qwen/Qwen3-30B-A3B-Instruct-2507 9691 ms 9690 ms 100.00% 15
NousResearch/Hermes-4-70B 9895 ms 9894 ms 91.67% 12
zai-org/GLM-5.1 10207 ms 10207 ms 100.00% 13
nvidia/nemotron-3-super-120b-a12b 10227 ms 10226 ms 100.00% 9

Full provider & model leaderboard.

Provider models

Models served by Nebius Token Factory.

Each row links to pricing, provider, benchmark, and API pages for the model.

Model AI IQ Context Endpoints Prompt Completion Routes
MiniMaxAI/MiniMax-M2.5
MiniMax M2.5
IQ 103#43 204,800 2 $0.33/1M $1.32/1M prepaid BYOK
NousResearch/Hermes-4-405B
Hermes 4 405B
131,072 2 $1.1/1M $3.3/1M prepaid BYOK
NousResearch/Hermes-4-70B
Hermes 4 70B
131,072 2 $0.143/1M $0.44/1M prepaid BYOK
Qwen/Qwen2.5-VL-72B-Instruct
Qwen2.5 VL 72B Instruct
32,768 2 $0.22/1M $0.77/1M prepaid BYOK
Qwen/Qwen3-235B-A22B-Instruct-2507
Qwen3 235B A22B Instruct 2507
131,072 2 $0.22/1M $0.66/1M prepaid BYOK
Qwen/Qwen3-30B-A3B-Instruct-2507
Qwen3 30B A3B Instruct 2507
131,072 2 $0.11/1M $0.33/1M prepaid BYOK
Qwen/Qwen3-32B
Qwen3 32B
131,072 2 $0.11/1M $0.33/1M prepaid BYOK
Qwen/Qwen3-Next-80B-A3B-Thinking
Qwen3 Next 80B A3B Thinking
131,072 2 $0.165/1M $1.65/1M prepaid BYOK
Qwen/Qwen3.5-397B-A17B
Qwen3.5 397B A17B
262,144 2 $0.66/1M $3.96/1M prepaid BYOK
deepseek-ai/DeepSeek-V4-Pro
DeepSeek V4 Pro
IQ 109#28 1,048,576 2 $1.859/1M $3.718/1M prepaid BYOK
google/gemma-2-2b-it
gemma 2 2b it
8,192 1 $0.022/1M $0.066/1M BYOK
google/gemma-3-27b-it
Google: Gemma 3 27B
131,072 2 $0.1309/1M $0.22/1M prepaid BYOK
meta-llama/Llama-3.3-70B-Instruct
Llama 3.3 70B Instruct
131,072 2 $0.143/1M $0.44/1M prepaid BYOK
meta-llama/Meta-Llama-3.1-8B-Instruct
Meta Llama 3.1 8B Instruct
128,000 1 $0.022/1M $0.066/1M BYOK
nvidia/Llama-3_1-Nemotron-Ultra-253B-v1
Llama 3_1 Nemotron Ultra 253B v1
128,000 2 $0.66/1M $1.98/1M prepaid BYOK
nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B
NVIDIA Nemotron 3 Nano 30B A3B
131,072 2 $0.11/1M $0.33/1M prepaid BYOK
nvidia/Nemotron-3-Nano-Omni
Nemotron 3 Nano Omni
131,072 2 $0.165/1M $0.495/1M prepaid BYOK
nvidia/nemotron-3-super-120b-a12b
nemotron 3 super 120b a12b
131,072 2 $0.66/1M $1.98/1M prepaid BYOK
openai/gpt-oss-120b
OpenAI: gpt-oss-120b
IQ 95#59 131,072 2 $0.165/1M $0.66/1M prepaid BYOK
zai-org/GLM-5.1
GLM 5.1
IQ 113#19 204,800 2 $1.54/1M $4.84/1M prepaid BYOK

Sign in

Choose a sign in method.