Google launches Gemini 2.5 Flash AI model: superior performance, lower cost
In an April 17 announcement, Google said that a preview of Gemini 2.5 Flash is available to developers through the Gemini API in Google AI Studio and Vertex AI.
Users can also pick the model from the model selector in the Gemini app and use it together with Canvas to refine documents and edit code.
Gemini 2.5 Flash is a hybrid reasoning model with "dynamic and controllable" reasoning, allowing developers to adjust how much processing the model does based on the complexity of the query. The model introduces an adjustable "thinking budget" that significantly reduces the cost of use while maintaining high performance.
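To make the budget idea concrete, here is a minimal sketch of how a request body with a per-query budget might be assembled. Several things here are assumptions rather than details from the announcement: the `generationConfig.thinkingConfig.thinkingBudget` field path (believed to match the Gemini REST API, but check the current documentation), the 0 to 24576 token range, and the word-count heuristic in `pick_thinking_budget`, which is purely hypothetical. The code only builds the JSON payload and sends nothing over the network.

```python
import json
from typing import Optional

# Assumed limits for illustration: 0 turns thinking off, 24576 is the assumed cap.
MIN_BUDGET = 0
MAX_BUDGET = 24576


def pick_thinking_budget(prompt: str) -> int:
    """Hypothetical heuristic: longer prompts are given more thinking tokens."""
    words = len(prompt.split())
    if words < 20:
        return MIN_BUDGET  # short query: skip thinking entirely
    if words < 200:
        return 1024        # medium query: small budget
    return 8192            # long query: larger budget


def build_request(prompt: str, budget: Optional[int] = None) -> dict:
    """Build a request body; the field names are an assumption, not confirmed here."""
    if budget is None:
        budget = pick_thinking_budget(prompt)
    # Clamp whatever was requested into the assumed valid range.
    budget = max(MIN_BUDGET, min(budget, MAX_BUDGET))
    return {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {"thinkingConfig": {"thinkingBudget": budget}},
    }


if __name__ == "__main__":
    print(json.dumps(build_request("Summarize this support ticket."), indent=2))
```

A short customer-service style query gets a budget of 0 under this heuristic, while an explicit `budget=` argument overrides it, which mirrors the kind of quality, cost, and latency trade-off the article describes.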

Google notes that Gemini 2.5 Flash is ideally suited for "high-volume" and "real-time" application scenarios, such as customer service and document parsing. "Optimized for low latency and reduced costs, this working model is the ideal engine for responsive virtual assistants and real-time summarization tools," Google said in a blog post.

With thinking turned off, the model costs only $0.6 per million tokens, roughly 83% less than with thinking fully enabled ($3.5 per million tokens). Notably, even with thinking off, it still outperforms its predecessor, Gemini 2.0 Flash.
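The gap between the two quoted prices is easy to check by hand. The sketch below takes the article's two figures at face value ($0.6 and $3.5 per million tokens) and computes the percentage saving, the price ratio, and the cost of a workload. The 50-million-token workload is a made-up example, not a figure from Google.

```python
# Prices as quoted in the article, in USD per million tokens.
NON_THINKING_USD_PER_M = 0.60
THINKING_USD_PER_M = 3.50


def percent_reduction(high: float, low: float) -> float:
    """Percentage by which `low` is cheaper than `high`."""
    return (high - low) / high * 100


def cost_usd(tokens: int, usd_per_million: float) -> float:
    """Cost of `tokens` tokens at a per-million-token price."""
    return tokens / 1_000_000 * usd_per_million


reduction = percent_reduction(THINKING_USD_PER_M, NON_THINKING_USD_PER_M)
ratio = THINKING_USD_PER_M / NON_THINKING_USD_PER_M
print(f"{reduction:.1f}% cheaper, {ratio:.2f}x price ratio")

# Hypothetical workload of 50 million tokens, for illustration only.
tokens = 50_000_000
print(f"thinking off: ${cost_usd(tokens, NON_THINKING_USD_PER_M):.2f}")
print(f"thinking on:  ${cost_usd(tokens, THINKING_USD_PER_M):.2f}")
```

The saving works out to about 83%, or equivalently the thinking price is a little under six times the non-thinking price. A price cannot fall by more than 100%, so the difference is better stated as one of these two figures.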
On performance, Gemini 2.5 Flash ranks second on the large model leaderboard with an Elo score of 1392, behind GPT-4.5-preview and on par with Grok-3.
In task-specific tests, the model shows clear gains: on the GPQA knowledge benchmark, a 24K thinking budget yields a 6% performance improvement; on the LiveCodeBench coding benchmark, the best performance is reached at a 16K thinking budget.
Comparison tests show that Gemini 2.5 Flash significantly outperforms Claude 3.7 Sonnet in multimodal reasoning and mathematical tasks, and its overall performance is on par with OpenAI's latest o4-mini model. On the "Humanity's Last Exam" benchmark, which is designed to test comprehensive, expert-level human ability, the model ranked second with a score of 12.1%.
With its "thinking budget" mechanism, Gemini 2.5 Flash strikes a new balance between performance and cost, giving AI application developers a more flexible and cost-effective option.
As Google's first fully hybrid reasoning model, Gemini 2.5 Flash lets developers switch thinking on and off as needed and flexibly trade off response quality, cost, and latency, making it a lower-cost but high-performing alternative to the cutting-edge models from Anthropic and xAI.