Tech News

OpenAI teases new reasoning model—but don’t expect to try it soon

Published 2 days ago

Image: Alex Parkin / The Verge

For the last day of ship-mas, OpenAI previewed a new set of frontier “reasoning” models dubbed o3 and o3-mini. The Verge first reported that a new reasoning model would be coming during this event.

The company isn’t releasing these models today (and admits final results may evolve with more post-training). However, OpenAI is accepting applications from the research community to test these systems ahead of public release (which it has yet to set a date for). OpenAI launched o1 (codenamed Strawberry) in September and is jumping straight to o3, skipping o2 to avoid confusion (or trademark conflicts) with the British telecom company called O2.

The term reasoning has become a common buzzword in the AI industry lately, but it basically means the machine breaks down instructions into smaller tasks that can produce stronger outcomes. These models often show the work for how they got to an answer, rather than just giving a final answer without explanation.

According to the company, o3 surpasses previous performance records across the board. It beats its predecessor in coding tests (called SWE-Bench Verified) by 22.8 percentage points and outscores OpenAI’s Chief Scientist in competitive programming. The model nearly aced one of the hardest math competitions (called AIME 2024), missing one question, and achieved 87.7 percent on a benchmark for expert-level science problems (called GPQA Diamond). On the toughest math and reasoning challenges that usually stump AI, o3 solved 25.2 percent of problems (where no other model exceeds 2 percent).

OpenAI claims o3 performs better than its other reasoning models in coding benchmarks.

The company also announced new research on deliberative alignment, which requires the AI model to process safety decisions step-by-step. So, instead of just giving yes/no rules to the AI model, this paradigm requires it to actively reason about whether a user’s request fits OpenAI’s safety policies. The company claims that when it tested this on o1, it was much better at following safety guidelines than previous models, including GPT-4.


BestMarketInsiders.com
