Blog | TrustedRouter

2026-07-02

Keep doing biology with Prometheus

Biology work should not stop because one frontier endpoint refuses. Use TrustedRouter Prometheus, the open BioMysteryBench harness, MCP, and the LLM-advisor skill to choose models that will actually do the work.

Read →

Source: prometheus-biomysterybench on GitHub

2026-07-02

Introducing LLM advisor: which model do i choose for my problem?

TrustedRouter MCP and the open source LLM-advisor skill give agents live model, price, privacy, provider, and AI IQ context before they choose a model.

Read →

Source: LLM-advisor on GitHub

2026-07-01

Open Source Open Source Open Source: How TrustedRouter is totally open source

TrustedRouter is open source where it matters: the frontend, the backend, the attested gateway, the SDKs, and the Terraform infrastructure are all published so developers can inspect, run, and verify the system.

Read →

2026-07-01

Frontier Smart, Cheap, Fast: Pick 3 with Open Source

The old model triangle said you could have frontier quality, low cost, or speed, but not all three. Recent TrustedRouter results show a different path: open source routing, open-weight models, and combo models behind one API.

Read →

2026-07-01

New Open Source SOTA cybersecurity model released today: OpenPatcher-S1

OpenPatcher-S1 scored 7 out of 16 on AI IQ's ExploitBench CVE-2024-2887 target, more than doubling the strongest open baseline in the public comparison chart. Poseidon is still training and is already ahead internally.

Read →

Source: ExploitBench CVE-2024-2887

2026-06-29

Socrates-1.1 just scored 72 on Terminal-Bench Hard

TrustedRouter Socrates-1.1 scored 72% on Terminal-Bench Hard, ahead of the public Fable 5, GPT-5.5, and Opus 4.8 baselines on AI IQ. The reason is simple: no model has a monopoly on knowledge.

Read →

Source: AI IQ Terminal-Bench Hard chart

2026-06-28

Combo models are model containers

TrustedRouter now lets one model id package a graph of models: Synth panels, advisor models, selectors, and mapreduce flows. The API call still looks like one model. Inside, the attested gateway can route work across specialized models and return one answer.

Read →

2026-06-24

Synth beats Fable 5: introducing Iris, Zeus, and Prometheus

Synth now ships as three named presets — Iris 1.0 (trustedrouter/iris), Prometheus 1.0 (trustedrouter/prometheus), and Zeus 1.0 (trustedrouter/zeus) — one fusion engine, three panels. On a score-vs-cost chart of DRACO deep research they trace the efficient frontier: Prometheus scores 69.2 at open-model cost, beating Fable 5 (65.3) for roughly a seventh of the price; Zeus tops out at 73.4, the state of the art; Iris is the cheapest way in at 62.6. And for code and the agents that write it, trustedrouter/synth-code is the same fusion tuned end to end — code-specific panel and synthesis prompts and a code-tuned judge.

Read →

Source: TrustedRouter-Fusion-Draco on GitHub

2026-06-24

Self-fusion's gain lives in the synthesizer, not the judge

Self-fusion gives Sonnet 4.6 +8.0 on DRACO. We took it apart: hold the ten Sonnet drafts fixed and swap only the fuser to Haiku and the gain collapses to +2.2 — the fuser is the lever, not the drafts. Split the fuser into judge and synthesizer and run the 2×2: the synthesizer carries everything, the judge is nearly free. A cheap Haiku judge feeding a Sonnet synthesizer (+9.2) matches the all-Sonnet fuser. Spend on the one synthesis call; route the rest cheap.

Read →

Source: TrustedRouter-Fusion-Draco on GitHub

2026-06-23

Fusion works now, even with the same model: self-fusion

Self-fusion — running one model several times and fusing its own answers — finally pays off: Sonnet 4.6 self-fuses +8.0 on DRACO deep research (significant), while Claude Haiku 4.5 barely moves (+2.6). Fusion stayed marginal for years because the synthesizer was the bottleneck; only now are cheap models good enough to keep the one right answer. And the parallel fan-out is exactly what a multi-provider router is for.

Read →

Source: TrustedRouter-Fusion-Draco on GitHub

2026-06-21

What actually makes a synth panel work

We took a five-model open-weights synth committee apart on DRACO. The synthesizer barely matters on an open panel, only two of five models carry the score — yet you cannot strip the panel without paying. A panel's value is its diversity, and it is invisible to any single test.

Read →

Source: TrustedRouter Synth Draco on GitHub

2026-06-19

Synth is two jobs, and no model wins both

Synthesizing a model panel into one answer is two jobs — a judge that reads the panel and a synthesizer that writes the final answer — and the best model for each is a different one. Across the strongest open models, GLM-5.2 writes the best synthesized answer but judges its own writing worst; the best open synthesizer pairs a Kimi-k2.6 judge with a GLM-5.2 synthesizer, 73.4 on DRACO, beating any single model that does both jobs.

Read →

Source: TrustedRouter Synth Draco on GitHub

2026-06-18

Four copies of a cheap model beat Fable at 1/7 the price

Run MiniMax-M3 four times on a research task and synthesize the four reports, and the answer scores 68.1 on DRACO deep research — above Anthropic's frontier Fable 5 at 65.3, for about $37 against a modeled ~$250 for one Fable 5 run. Two runs gain nothing, four clear the frontier model, ten is the ceiling at 69.4: enough independent tries manufacture the diverse error synth needs, and a cheap model is cheap to run many times.

Read →

Source: TrustedRouter Synth Draco on GitHub

2026-06-18

The most censored Chinese model is censored at the host, not the model

The most-censored model on FreedomBench is GLM served by Z.ai, which goes silent on the plain facts Beijing censors. Run the identical open weights on Cerebras — or inside Tinfoil's sealed confidential enclave, where the host provably can't touch the prompt — and the blanks come back answered: GLM-5.2 goes from 30 of 60 to a clean sweep. The censorship lives in the API endpoint, not the model.

Read →

Source: FreedomBench on GitHub

2026-06-17

FreedomBench: AI models that refuse to answer the truth about China

GLM-5.2 is one of the best open models in the world, and on sixty plain facts the Chinese government censors it returns blanks — 29 of 60, going dark on Tiananmen and Falun Gong entirely. FreedomBench measures which models stay silent, and tells a real refusal apart from a server choking.

Read →

Source: FreedomBench on GitHub

2026-06-17

Open-source models synthesize better than Opus, and GPT-5.5 is the worst

Synthesizing five research reports into one answer is a separate skill from solving the task alone, and it doesn't track model size. We held a fixed panel and judge, swapped only the final synthesizer, and found MiniMax-M3 and GLM-5.2 beat Opus 4.8 while GPT-5.5, the strongest solo researcher, fell to the bottom of the capable synthesizers.

Read →

Source: TrustedRouter Synth Draco on GitHub

2026-06-17

Surpassing Frontier Performance with Open Source Synth at 1/3 the price

The best open-weights synthesizer — a Kimi-k2.6 judge feeding a GLM-5.2 synthesizer — scores 73.4 on DRACO, eight points over Anthropic's closed Fable 5 at 65.3; a fully-open five-model committee still beats Fable at 69.9 — for around $80 per hundred tasks against Fable's modeled ~$250, about a third the price. No frontier API touches the stack, and the whole thing runs on your own hardware.

Read →

Source: TrustedRouter Synth Draco on GitHub

2026-06-17

The best synthesizer we tested goes blank on Taiwan

Our highest-scoring DRACO synthesizer, the open-weights GLM-5.2, returned nothing on one task in a hundred. We traced the blank to political censorship — and to why a model's politics show up as silent holes in your output, not error messages.

Read →

Source: TrustedRouter Synth Draco on GitHub

2026-06-17

How to choose a model, Pick 2: smart, fast, or cheap

People keep asking which model is best. There isn't one — every model is a tradeoff of smart, cheap, and fast, and you get two. So we plotted all 220+ on a triangle you can drag, off the live catalog, and it picks the one your task and your privacy actually need.

Read →

Source: Try it — the iron triangle picker

2026-06-17

The best open models aren't on your leaderboard

The leaderboards everyone quotes are a version behind on the Chinese open-weight flagships, and nobody runs them through the Western factuality evals. So I ran the whole panel on the same harnesses Google and OpenAI publish — and on closed-book facts, an open model you can download drew level with Anthropic's best.

Read →

Source: trustedrouter-benchmarks on GitHub

2026-06-16

The best biology AI won't do biology

Anthropic's strongest bioinformatics model is partner-only, and the one you can call refuses biology. So I ran the open version of their eval across nine models — cheap ones included — and watched.

Read →

Source: prometheus-biomysterybench on GitHub

2026-06-16

The safest AI models trust you the least

I built PrometheusBench to measure how often a model refuses a plain question. The models that market themselves on safety refuse the most.

Read →

Source: PrometheusBench on GitHub

2026-06-17

New SOTA: TrustedRouter Synth beats Fable and Frontier

A diverse panel of frontier and open-weights models, synthesized by a Kimi-k2.6 judge and GLM-5.2 synthesizer, scores 73.4 on the DRACO deep-research benchmark — state of the art, above the strongest closed baselines. Open code, open results, reproducible end to end.

Read →

Source: TrustedRouter Synth Draco on GitHub

2026-06-14

One API, all the LLMs, with a prompt path you can verify

TrustedRouter gives developers OpenAI-compatible model routing while keeping the prompt path separate from the control plane.

Read →

Source: Joseph Perla original

2026-06-14

Attestation is all you need

For AI routing, trust should be something an agent can verify, not only a policy page a human reads after the fact.

Read →

Source: Joseph Perla original

TrustedRouter blog