Meta exec denies the company artificially boosted Llama 4’s benchmark scores

Image: a distorted Meta logo alongside the logos of Facebook, Instagram, WhatsApp, Oculus, and Messenger.

On Monday, a Meta executive denied rumors that the company had trained its new AI models to excel on specific benchmarks while hiding their weaknesses.

Ahmad Al-Dahle, Meta's VP of generative AI, said claims that the company trained its Llama 4 Maverick and Llama 4 Scout models on test sets are false. A test set is the data used to evaluate a model's performance after training; training on it would artificially inflate benchmark scores and give a false impression of the model's capabilities.

An unverified rumor surfaced over the weekend suggesting that Meta had manipulated the benchmark results of its new models. This rumor seemed to stem from a post on a Chinese social media platform by a former Meta employee who objected to the company’s benchmarking methods.

Reports that Maverick and Scout perform poorly on certain tasks fueled the rumor, as did Meta's use of an unreleased version of Maverick to achieve higher scores on the LM Arena benchmark. Researchers noticed significant differences in behavior between the publicly available Maverick model and the version used on LM Arena.

Al-Dahle acknowledged that users have reported uneven quality from Maverick and Scout across the different cloud platforms hosting the models.

He explained, “As we released the models as soon as they were ready, it may take some time for all public implementations to be optimized. We are actively addressing any issues and collaborating with our partners.”
