Did xAI lie about Grok 3’s benchmarks?

Published on February 23, 2025, by Empowerment

Debates over AI benchmarks, and how AI labs report them, are spilling into public view. This week, an OpenAI employee accused Elon Musk’s AI company, xAI, of publishing misleading benchmark results for its latest AI model, Grok 3. xAI co-founder Igor Babuschkin defended the company’s actions, and observers remain divided on who is right.

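xAI posted a graph on its blog showing Grok 3’s performance on AIME 2025, a set of challenging math questions. While some experts question the validity of AIME as an AI benchmark, it is commonly used to assess a model’s math capabilities. The graph showed Grok 3 variants outperforming OpenAI’s best model, o3-mini-high, on AIME 2025. However, OpenAI employees noted that the graph did not include o3-mini-high’s score at “cons@64.” Short for “consensus@64,” this setting gives a model 64 attempts at each problem and takes the most frequently generated answer as its final answer. Because cons@64 tends to raise scores considerably, leaving it off a graph can make one model appear to beat another when it does not.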

A closer look reveals that Grok 3 Reasoning Beta and Grok 3 mini Reasoning scored below o3-mini-high on AIME 2025 at “@1,” meaning the score from a model’s first attempt at each problem. Despite this, xAI promotes Grok 3 as the “world’s smartest AI.” The debate escalated when Babuschkin accused OpenAI of publishing similarly misleading benchmark charts in the past.
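To make the difference between the two scoring modes concrete, here is a minimal Python sketch. It assumes “@1” grades only the first sampled answer per problem and “cons@64” grades the most frequent answer among up to 64 samples. The function names, the toy answers, and the use of five samples instead of 64 are all illustrative assumptions, not xAI’s or OpenAI’s evaluation code or real results.

```python
from collections import Counter

def score_at_1(samples, answer_key):
    """Fraction of problems whose FIRST sampled answer is correct."""
    correct = sum(
        1 for problem, attempts in samples.items()
        if attempts[0] == answer_key[problem]
    )
    return correct / len(samples)

def score_cons_at_k(samples, answer_key, k=64):
    """Fraction of problems whose most frequent answer among the
    first k samples is correct (majority vote).
    Ties go to the answer seen first; that is an assumption of this sketch."""
    correct = 0
    for problem, attempts in samples.items():
        consensus, _ = Counter(attempts[:k]).most_common(1)[0]
        if consensus == answer_key[problem]:
            correct += 1
    return correct / len(samples)

# Made-up data: three hypothetical problems, five sampled answers each.
answer_key = {"p1": 204, "p2": 113, "p3": 70}
samples = {
    "p1": [204, 204, 17, 204, 204],  # first answer right, consensus right
    "p2": [99, 113, 113, 113, 40],   # first answer wrong, consensus right
    "p3": [12, 12, 70, 12, 70],      # first answer wrong, consensus wrong
}

print(f"@1:     {score_at_1(samples, answer_key):.2f}")            # 0.33
print(f"cons@5: {score_cons_at_k(samples, answer_key, k=5):.2f}")  # 0.67
```

In this toy run the same model scores twice as high under consensus voting as on its first attempt, which is why a chart that mixes the two settings can mislead.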

A neutral party created a more “accurate” graph showing nearly every model’s performance at cons@64, which puts the competing claims on the same footing. AI researcher Nathan Lambert stressed that a key figure is still missing: the computational and monetary cost each model needed to reach its best score. The gap shows how little most benchmarks convey about a model’s limitations and strengths.
