OpenAI-compatible API. Attested gateway. Public status.

Z.ai: GLM 5.1 Benchmarks

Benchmark and measurement links for Z.ai: GLM 5.1, with TrustedRouter route data first.

1 URL (base_url migration)
100s of models and routes
0 prompt logs by default

z-ai/glm-5.1

Open weights

AI IQ: 113 (rank #19 in the public AI IQ ranking for glm-5.1)

Published benchmark scores

Benchmark scores for Z.ai: GLM 5.1. Every row links to its source, and a score is only ever attached to the exact checkpoint it was measured on. Vendor model-card and open-leaderboard numbers are cited, not run by us. Rows marked "TrustedRouter · replays published" are our own runs of this model through the gateway, with the full per-item replay published in trustedrouter-benchmarks so anyone can re-grade them.

| Benchmark | Setup | Category | Score | Source | Date |
| --- | --- | --- | --- | --- | --- |
| Aider Polyglot | 34 Exercism exercises (Python), pass@1, real unit tests (no judge) | Coding | 11.8% | TrustedRouter Benchmarks replay | 2026-06-18 |
| SimpleQA Verified | 250 closed-book questions, no tools; GPT-4.1 autorater (Google's exact prompt); 32768-token budget | Factuality | 51.6% | TrustedRouter Benchmarks replay | 2026-06-18 |
| IFEval | 100-prompt subset, 0-shot; Google's deterministic verifiers (no judge); score = avg of strict/loose × prompt/instruction | Instruction following | 32.1% | TrustedRouter Benchmarks replay | 2026-06-18 |
| MMLU-Pro | 200-question stride-sampled subset (TIGER-Lab/MMLU-Pro), 10-choice CoT, letter-match; no judge | Knowledge | 80.7% | TrustedRouter Benchmarks replay | 2026-06-18 |
| GSM8K | 30-problem subset, deterministic numeric match (no judge); near-saturated, kept as a sanity check | Math | 96.7% | TrustedRouter Benchmarks replay | 2026-06-18 |
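
The scoring rules named in the rows above (stride sampling, the IFEval four-way average, and deterministic numeric matching) can be illustrated with a minimal sketch. This is not the TrustedRouter harness: every function name here is hypothetical, the even-spacing stride scheme and the "last number in the answer" extraction rule are assumptions, and the sample inputs are made-up illustrations rather than published per-item results.

```python
import re
from statistics import mean
from typing import List, Optional


def stride_sample(n_items: int, k: int) -> List[int]:
    """Pick k evenly spaced indices from range(n_items).

    Assumption: "stride-sampled" means evenly spaced picks; the page does
    not spell out the exact scheme.
    """
    if not 0 < k <= n_items:
        raise ValueError("k must be between 1 and n_items")
    stride = n_items / k
    return [int(i * stride) for i in range(k)]


def ifeval_score(prompt_strict: float, prompt_loose: float,
                 inst_strict: float, inst_loose: float) -> float:
    """Average the four IFEval accuracies (strict/loose x prompt/instruction)."""
    return mean([prompt_strict, prompt_loose, inst_strict, inst_loose])


# Matches integers and decimals, allowing thousands separators such as 1,234.5
_NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def last_number(text: str) -> Optional[float]:
    """Return the last number found in a model answer, or None if there is none."""
    matches = _NUMBER.findall(text)
    if not matches:
        return None
    return float(matches[-1].replace(",", ""))


def numeric_match(answer_text: str, reference: float, tol: float = 1e-6) -> bool:
    """Deterministic check: the extracted number must equal the reference."""
    value = last_number(answer_text)
    return value is not None and abs(value - reference) <= tol


def percent(correct: int, total: int) -> float:
    """Convert a raw count into a percentage rounded to one decimal place."""
    return round(100 * correct / total, 1)


if __name__ == "__main__":
    print(stride_sample(10, 5))
    print(ifeval_score(0.30, 0.32, 0.33, 0.35))
    print(numeric_match("So the total is 1,234.50 dollars.", 1234.5))
    print(percent(29, 30))
```

Because none of these checks involves a judge model, re-grading a published replay with rules like these should give the same score on every run. On small subsets each item carries a lot of weight: with 30 problems, a single miss moves the score by about 3.3 percentage points.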

TrustedRouter measurements

TrustedRouter publishes route and status measurements without storing prompt or output content. Provider latency and uptime are exposed through the model performance and uptime pages.

External benchmark references
