o1's new CBRN report looks (a fair bit) better
OpenAI released a new ChemBio safety card for o1. It fixes a lot of my original issues (but not all of them).
[This is a slightly edited blog post version of my tweet thread].
A few weeks ago, I “peer-reviewed” o1-preview's ChemBio safety card and highlighted some issues with its methodology.
Now that o1 is out, how does it stack up?
Better! (Though there’s still room for improvement.)
Here’s my new o1 scorecard. 🧵👇
Credit where it’s due.
The new system card improved on the old one:
More comparisons to PhD baselines (they now exist for 3/5 evals, vs. 0/3 before)
Multiple-choice tests converted to open-ended, making them more realistic
Clear acknowledgment these results are "lower bounds"
Some things could still be improved:
o1 underperformed PhDs on *one* lab-skill eval (out of 5!), and it's not clear how that test was scored
OAI says tinkering could boost scores, but doesn't say by how much (other orgs try to forecast this)
Results are from a "near-final" o1 version. Some note that the final released version likely does better.
Some critical points:
Previously, I flagged that o1-preview's 69% score on the Gryphon eval might match PhDs.
Turns out, experts score 57%, so OAI passed this eval *months* ago. I hope they declare such results in the future.
(I'd keep an eye on the multimodal eval, which has no PhD baseline yet.)
Big picture:
AIs keep saturating dangerous capability tests. With o1 we “ratcheted up” from multiple-choice to open-ended evals. But that won’t hold for long.
We need harder evals: ones where an AI succeeding would suggest a real risk. (No updates yet on OAI's wet-lab study.)
My verdict:
One test suggests the "lower bound" o1 lacks wet-lab skills; the other four can't rule it out. It's plausible o1 was ~fine to deploy, but this remains a subjective call.
The report being clearer and more nuanced helps build trust. The next one should go further—and include harder evaluations.
Call to action
Want to improve the “science of evals” and make dangerous capability tests more realistic? Tell us your ideas!
Open Philanthropy has supported many tests that OAI and others now use—including work by people who are skeptical of AGI and AI risks.
Better evidence = better decisions