OpenAI compatible API. Attested gateway. Public status.
Meta: Llama 3.3 70B Instruct Benchmarks
Benchmark and measurement links for Meta: Llama 3.3 70B Instruct, with TrustedRouter route data first.
1 URLbase_url migration
100smodels and routes
0prompt logs by default
meta-llama/llama-3.3-70b-instruct
open weights
Benchmarks
Published benchmark scores
Benchmark scores for Meta: Llama 3.3 70B Instruct — every row links to its source, and a score is only ever attached to the exact checkpoint it was measured on. Vendor model-card and open-leaderboard numbers are cited, not run by us. Rows marked TrustedRouter · replays published are our own runs of this model through the gateway, with the full per-item replay published in trustedrouter-benchmarks so anyone can re-grade them.
| Benchmark | Category | Score | Source |
|---|---|---|---|
| HumanEval | Coding | 88.4% | Meta — Llama 3.3 70B model card 2024-12-06 |
| IFEval | Instruction following | 92.1% | Meta — Llama 3.3 70B model card 2024-12-06 |
| MMLU 0-shot, CoT |
Knowledge | 86.0% | Meta — Llama 3.3 70B model card 2024-12-06 |
| MMLU-Pro CoT |
Knowledge | 68.9% | Meta — Llama 3.3 70B model card 2024-12-06 |
| MATH 0-shot, CoT |
Math | 77.0% | Meta — Llama 3.3 70B model card 2024-12-06 |
| GPQA Diamond 0-shot, CoT |
Science | 50.5% | Meta — Llama 3.3 70B model card 2024-12-06 |
TrustedRouter measurements
TrustedRouter publishes route and status measurements without storing prompt or output content. Provider latency and uptime are exposed through the model performance and uptime pages.
External benchmark references
- TrustedRouter performance pageTrustedRouter measurement
- TrustedRouter uptime pageTrustedRouter measurement
- LMArena leaderboardIndependent benchmark index
- LiveBenchIndependent benchmark index
- Artificial Analysis modelsIndependent benchmark index
- HELMIndependent benchmark index