OpenAI compatible API. Attested gateway. Public status.

Nebius Token Factory

Nebius Token Factory models on TrustedRouter with prices, routes, policy notes, and source links.

Verify gateway

1 URLbase_url migration

100smodels and routes

0prompt logs by default

`nebius`

No logs

All providers

Provider	Nebius Token Factory
Models	20 public models
Prepaid routes	18
BYOK routes	20
Zero data retention	yes
Confidential compute	not claimed
Provider E2EE	not claimed
Policy note	Marked ZDR via TrustedRouter's arrangement — Nebius RETAINS inputs/outputs by default (for speculative decoding); zero retention is an opt-in control, which the deployed Nebius account has enabled. Nebius does not train on customer data. Policy source

Measured performance

192 samples

Continuously sampled across Nebius Token Factory's routed models — p50 TTFT, throughput, and success rate. Unsupported route and probe-configuration rows are separated from provider downtime. No prompt or output content stored.

p50 TTFT	5090 ms
Throughput	—
Uptime	97.92%

Model	p50 TTFT	p50 TTFB	Throughput	Uptime	Config excluded	Samples
Qwen/Qwen2.5-VL-72B-Instruct	2445 ms	2444 ms	—	100.00%	—	10
openai/gpt-oss-120b	3275 ms	3274 ms	—	100.00%	—	19
deepseek-ai/DeepSeek-V4-Pro	4406 ms	4405 ms	—	86.67%	—	15
Qwen/Qwen3-Next-80B-A3B-Thinking	4510 ms	4509 ms	—	100.00%	—	8
NousResearch/Hermes-4-405B	4789 ms	4789 ms	—	100.00%	—	14
nvidia/Llama-3_1-Nemotron-Ultra-253B-v1	4927 ms	4927 ms	—	100.00%	—	11
Qwen/Qwen3-235B-A22B-Instruct-2507	4976 ms	4975 ms	—	100.00%	—	17
meta-llama/Llama-3.3-70B-Instruct	5090 ms	5090 ms	—	93.33%	—	15
Qwen/Qwen3-32B	7460 ms	7460 ms	—	100.00%	—	18
google/gemma-3-27b-it	7765 ms	7764 ms	—	100.00%	—	16
Qwen/Qwen3-30B-A3B-Instruct-2507	9691 ms	9690 ms	—	100.00%	—	15
NousResearch/Hermes-4-70B	9895 ms	9894 ms	—	91.67%	—	12
zai-org/GLM-5.1	10207 ms	10207 ms	—	100.00%	—	13
nvidia/nemotron-3-super-120b-a12b	10227 ms	10226 ms	—	100.00%	—	9

Full provider & model leaderboard.

Provider models

Models served by Nebius Token Factory.

Each row links to pricing, provider, benchmark, and API pages for the model.

Model	AI IQ	Context	Endpoints	Prompt	Completion	Routes
`MiniMaxAI/MiniMax-M2.5` MiniMax M2.5 benchmarks performance api	IQ 103#43	204,800	2	$0.33/1M	$1.32/1M	prepaid BYOK
`NousResearch/Hermes-4-405B` Hermes 4 405B benchmarks performance api	—	131,072	2	$1.1/1M	$3.3/1M	prepaid BYOK
`NousResearch/Hermes-4-70B` Hermes 4 70B benchmarks performance api	—	131,072	2	$0.143/1M	$0.44/1M	prepaid BYOK
`Qwen/Qwen2.5-VL-72B-Instruct` Qwen2.5 VL 72B Instruct benchmarks performance api	—	32,768	2	$0.22/1M	$0.77/1M	prepaid BYOK
`Qwen/Qwen3-235B-A22B-Instruct-2507` Qwen3 235B A22B Instruct 2507 benchmarks performance api	—	131,072	2	$0.22/1M	$0.66/1M	prepaid BYOK
`Qwen/Qwen3-30B-A3B-Instruct-2507` Qwen3 30B A3B Instruct 2507 benchmarks performance api	—	131,072	2	$0.11/1M	$0.33/1M	prepaid BYOK
`Qwen/Qwen3-32B` Qwen3 32B benchmarks performance api	—	131,072	2	$0.11/1M	$0.33/1M	prepaid BYOK
`Qwen/Qwen3-Next-80B-A3B-Thinking` Qwen3 Next 80B A3B Thinking benchmarks performance api	—	131,072	2	$0.165/1M	$1.65/1M	prepaid BYOK
`Qwen/Qwen3.5-397B-A17B` Qwen3.5 397B A17B benchmarks performance api	—	262,144	2	$0.66/1M	$3.96/1M	prepaid BYOK
`deepseek-ai/DeepSeek-V4-Pro` DeepSeek V4 Pro benchmarks performance api	IQ 109#28	1,048,576	2	$1.859/1M	$3.718/1M	prepaid BYOK
`google/gemma-2-2b-it` gemma 2 2b it benchmarks performance api	—	8,192	1	$0.022/1M	$0.066/1M	BYOK
`google/gemma-3-27b-it` Google: Gemma 3 27B benchmarks performance api	—	131,072	2	$0.1309/1M	$0.22/1M	prepaid BYOK
`meta-llama/Llama-3.3-70B-Instruct` Llama 3.3 70B Instruct benchmarks performance api	—	131,072	2	$0.143/1M	$0.44/1M	prepaid BYOK
`meta-llama/Meta-Llama-3.1-8B-Instruct` Meta Llama 3.1 8B Instruct benchmarks performance api	—	128,000	1	$0.022/1M	$0.066/1M	BYOK
`nvidia/Llama-3_1-Nemotron-Ultra-253B-v1` Llama 3_1 Nemotron Ultra 253B v1 benchmarks performance api	—	128,000	2	$0.66/1M	$1.98/1M	prepaid BYOK
`nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B` NVIDIA Nemotron 3 Nano 30B A3B benchmarks performance api	—	131,072	2	$0.11/1M	$0.33/1M	prepaid BYOK
`nvidia/Nemotron-3-Nano-Omni` Nemotron 3 Nano Omni benchmarks performance api	—	131,072	2	$0.165/1M	$0.495/1M	prepaid BYOK
`nvidia/nemotron-3-super-120b-a12b` nemotron 3 super 120b a12b benchmarks performance api	—	131,072	2	$0.66/1M	$1.98/1M	prepaid BYOK
`openai/gpt-oss-120b` OpenAI: gpt-oss-120b benchmarks performance api	IQ 95#59	131,072	2	$0.165/1M	$0.66/1M	prepaid BYOK
`zai-org/GLM-5.1` GLM 5.1 benchmarks performance api	IQ 113#19	204,800	2	$1.54/1M	$4.84/1M	prepaid BYOK