Meta exec denies the company artificially boosted Llama 4’s benchmark scores

On Monday, a Meta executive denied rumors that the company had trained its new AI models to excel on specific benchmarks while hiding their weaknesses.
Ahmad Al-Dahle, VP of generative AI at Meta, said it was false that Meta had trained its Llama 4 Maverick and Llama 4 Scout models on test sets. Test sets are used to assess a model’s performance after training, so training on them could artificially inflate benchmark scores and give a false impression of the model’s capabilities.
An unverified rumor surfaced over the weekend suggesting that Meta had manipulated the benchmark results of its new models. This rumor seemed to stem from a post on a Chinese social media platform by a former Meta employee who objected to the company’s benchmarking methods.
Concerns about the performance of Maverick and Scout on certain tasks contributed to the rumor, especially after Meta used an unreleased version of Maverick to achieve higher scores on the LM Arena benchmark. Researchers noticed significant differences in behavior between the publicly available Maverick model and the one used on LM Arena.
Al-Dahle acknowledged that users have reported varying quality from Maverick and Scout across the different cloud platforms hosting the models.
He explained, “As we released the models as soon as they were ready, it may take some time for all public implementations to be optimized. We are actively addressing any issues and collaborating with our partners.”