
New Open Source SOTA cybersecurity model released today: OpenPatcher-S1

2026-07-01 · ExploitBench CVE-2024-2887

[Figure: ExploitBench CVE-2024-2887, publicly available cyber model comparison, with OpenPatcher-S1 highlighted. Bars: OpenPatcher-S1 (TrustedRouter + AI IQ) 7.00 / 16.00; Kimi K2.6 (Kimi) 3.00 / 16.00; GLM-5.1 (Z.ai) 2.00 / 16.00; MiniMax M2.7 (MiniMax) 2.00 / 16.00. Source: AI IQ, TrustedRouter.com]
Scores on AI IQ's public ExploitBench CVE-2024-2887 comparison chart. Higher is better. OpenPatcher-S1 is the TrustedRouter + AI IQ run.

OpenPatcher-S1 scored 7 out of 16 on the ExploitBench CVE-2024-2887 target.

Seven out of sixteen is still ugly. It fails more than it passes. That is why the number is interesting. Kimi K2.6 gets 3. GLM-5.1 gets 2. MiniMax M2.7 gets 2. OpenPatcher-S1 more than doubles the strongest listed open baseline on the comparison chart, and it still leaves most of the ladder unsolved. That is exactly the kind of result worth publishing: strong enough to matter, incomplete enough that nobody can pretend the problem is done.

| Model | Score | Notes |
| --- | --- | --- |
| trustedrouter/openpatcher-s1 | 7 / 16 | TrustedRouter + AI IQ open patching model |
| Kimi K2.6 | 3 / 16 | strongest listed baseline in the comparison chart |
| GLM-5.1 | 2 / 16 | public model baseline |
| MiniMax M2.7 | 2 / 16 | public model baseline |
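
The arithmetic behind "fails more than it passes" and "more than doubles" is easy to check. Here is a minimal sketch that uses only the four scores listed above. The helper names `pass_rate` and `ratio_to_best_baseline` are made up for illustration and are not part of ExploitBench or any TrustedRouter tooling.

```python
# Scores copied from the comparison above; every score is out of 16.
TOTAL = 16
SCORES = {
    "OpenPatcher-S1": 7,
    "Kimi K2.6": 3,
    "GLM-5.1": 2,
    "MiniMax M2.7": 2,
}


def pass_rate(score, total=TOTAL):
    """Fraction of the 16 points a model earned (hypothetical helper)."""
    return score / total


def ratio_to_best_baseline(scores, model):
    """Score of `model` divided by the best score among the other models."""
    best_other = max(value for name, value in scores.items() if name != model)
    return scores[model] / best_other


for name, score in SCORES.items():
    print(f"{name}: {score}/{TOTAL} = {pass_rate(score):.2%}")

ratio = ratio_to_best_baseline(SCORES, "OpenPatcher-S1")
print(f"OpenPatcher-S1 vs strongest listed baseline: {ratio:.2f}x")
```

Seven out of sixteen is 43.75%, which is under half, and 7 / 3 is about 2.33. So both claims hold on the published numbers, and so does the caveat that most of the ladder is still unsolved.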

I care about this benchmark because it is a ladder. The model has to find the patched code, trigger the bug, build useful primitives, and climb toward control in a real target environment. Multiple-choice cyber tests are too easy to fake. A ladder is harder to fake. You either reached the rung or you did not.

OpenPatcher-S1 is built for defensive patching work. The job is to read vulnerable code, understand why the patch matters, and produce repair guidance that survives contact with a real environment. A model that cannot reason through the bug will not reliably fix the bug. That is the whole reason to test it this way.

The obvious worry is that cyber evals drift into exploit marketing. Yes, they can. So the claim has to stay narrow. We are publishing the score, the target, and the comparison. We are not turning the post into a recipe. The useful product is a model that helps serious teams fix security bugs faster, under a route they can inspect.

Poseidon is next. It is still training. On the same target it is already above OpenPatcher-S1 internally. I am not calling that a published result yet because that would be dumb. But it tells us the method is working. OpenPatcher-S1 was not a lucky prompt.

On this target, among the open cyber models in the public comparison set, OpenPatcher-S1 is the result to beat. Poseidon is coming next.

