One of Google’s recent Gemini AI models scores worse on safety

Published May 2, 2025

A recently released Google AI model, Gemini 2.5 Flash, scores worse on certain safety tests than its predecessor, Gemini 2.0 Flash, according to the company’s internal benchmarking.

In a recent technical report, Google disclosed that Gemini 2.5 Flash is more likely to generate text that violates safety guidelines, with regressions of 4.1% in “text-to-text safety” and 9.6% in “image-to-text safety” metrics. These tests assess how well the model adheres to safety guidelines when generating text from a prompt or an image, respectively.

Google confirmed the lower scores on both metrics. The results come as AI companies are working to make their models more permissive, meaning more willing to respond to controversial topics rather than refuse them.

Google says that Gemini 2.5 Flash follows instructions more faithfully than its predecessor, but it also acknowledges that the model sometimes generates content that violates its guidelines. The company attributes some of the regressions to false positives, while admitting that the model occasionally produces “violative content.”

Scores from SpeechMap, a benchmark that tests how models respond to sensitive and controversial prompts, show that Gemini 2.5 Flash is less likely than Gemini 2.0 Flash to refuse to answer contentious questions. Testing also showed that the model is willing to generate content in support of controversial positions, such as replacing human judges with AI and warrantless government surveillance.

Thomas Woodside from the Secure AI Project emphasized the need for more transparency in model testing, citing the trade-off between following instructions and adhering to safety policies.

Google has faced criticism for its model safety reporting practices in the past, particularly delays in publishing detailed technical reports and safety testing information. The company has since released a more comprehensive report addressing these concerns.