Gemini 3 Flash Makes Its Grand Debut: Blazing Speed and Intelligence Surpassing Pro, Ushering in a New Chapter for AI


Just now, Google released its latest model, Gemini 3 Flash. As introduced, it boasts cutting-edge intelligence and is built for speed, empowering everyone to learn, build, and plan anything faster.

Long before the model's release, Logan Kilpatrick, Product Lead for Google AI Studio and Gemini API, posted a tweet on X containing only three lightning symbols. At the time, many netizens speculated that this signaled Google's imminent launch of a Flash version model prioritizing speed above all else.

Sure enough, Google's Gemini 3 Flash model arrived as promised tonight.


Google Unveils Gemini 3 Flash

Over the past year, from Gemini 1.5 to 3.0, Google has continuously deepened its technological capabilities in multimodal processing, long-context handling, and reasoning, while simultaneously reducing model invocation costs. This strategy aims to establish a more cost-effective competitive advantage within enterprise applications and the developer ecosystem. Against this backdrop, the Flash series—prioritizing high performance and low latency—is regarded as the product line within the Gemini framework that most closely aligns with real-world business scenarios.

As external calls for “faster, cheaper, and easier-to-deploy” models continue to grow louder, Google's Gemini 3 Flash, unveiled tonight, is widely seen as a pivotal move in advancing both inference efficiency and large-scale deployment capabilities.

Google announced that starting today, Gemini 3 Flash will be rolled out to millions of users worldwide:

  • Developers, through the Gemini API in Google AI Studio, the Gemini CLI, and Google's new agent development platform, Google Antigravity (a minimal call sketch follows this list)

  • All users, through the Gemini app and AI Mode in Search

  • Enterprises, through Vertex AI and Gemini Enterprise
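For the developer path, a request to the new model through the Gemini API looks much like one to earlier Flash models. Below is a minimal sketch using the google-genai Python SDK; the model identifier "gemini-3-flash-preview" and the environment-variable name are illustrative assumptions, so check the current model list in Google AI Studio before relying on them.

    # Minimal sketch: calling Gemini 3 Flash via the Gemini API (google-genai SDK).
    # Assumptions: the model ID "gemini-3-flash-preview" and the GEMINI_API_KEY variable.
    import os

    from google import genai

    client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])  # key from Google AI Studio

    response = client.models.generate_content(
        model="gemini-3-flash-preview",
        contents="In two sentences, explain the trade-off between latency and reasoning depth.",
    )
    print(response.text)  # plain-text answer from the model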

So, how does this model actually perform?

Google states on its official website that Gemini 3 Flash achieves speed and scale without sacrificing intelligence.

It excels in doctoral-level reasoning and knowledge benchmarks: results such as GPQA Diamond (90.4%) and Humanity's Last Exam (33.7%, no tools used) demonstrate cutting-edge performance comparable to larger state-of-the-art models, and it significantly outperforms the previous flagship, Gemini 2.5 Pro, across multiple benchmarks.

Specifically, on Humanity's Last Exam, Gemini 3 Flash scored 33.7% without any tools, while Gemini 3 Pro scored 37.5%, Gemini 2.5 Flash scored 11%, and the newly released GPT-5.2 scored 34.5%.


Humanity's Last Exam Model Score Rankings

Additionally, it achieved a remarkable score of 81.2% on the MMMU-Pro test, matching the performance of Gemini 3 Pro.


Beyond cutting-edge reasoning capabilities and multimodal processing, Gemini 3 Flash is engineered for extreme efficiency, pushing beyond the Pareto frontier of quality, cost, and speed. While operating at this level of intelligence, Gemini 3 Flash dynamically adjusts how long it thinks to match the complexity of the task.

Performance surpasses Gemini 2.5 Pro, yet priced significantly lower.

More complex application scenarios may still require longer processing time, but based on typical traffic tests it uses an average of 30% fewer tokens than 2.5 Pro, enabling it to complete everyday tasks with higher performance and greater accuracy.


Gemini 3 Flash breaks through the Pareto frontier in performance, cost, and speed.

Built on the Flash-series architecture, Gemini 3 Flash excels at speed: its performance surpasses 2.5 Pro while delivering three times the speed (based on Artificial Analysis benchmarks), at a significantly lower price point.

In terms of pricing, Gemini 3 Flash offers better price-performance than previous generations: $0.50 per million input tokens and $3 per million output tokens (audio input pricing remains at $1 per million input tokens).

That is slightly more expensive than Gemini 2.5 Flash's $0.30 per million input tokens and $2.50 per million output tokens. However, Google claims the new model outperforms Gemini 2.5 Pro while running at three times the speed, and that on reasoning tasks it uses an average of 30% fewer tokens than 2.5 Pro. For some tasks, users may therefore come out ahead on total token spend.
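The net effect on spend depends on workload shape. Here is a back-of-the-envelope sketch using the per-million-token prices quoted above; the request shape (1,000 input and 2,000 output tokens) and the assumption that the new model's responses are about 30% shorter for this workload are purely illustrative, since Google's 30% figure is measured against 2.5 Pro rather than 2.5 Flash.

    # Back-of-the-envelope cost comparison using the prices quoted in the article.
    # Assumptions: a 1,000-input / 2,000-output token request, and ~30% shorter
    # responses from Gemini 3 Flash (illustrative only).
    def request_cost(input_tokens, output_tokens, in_price, out_price):
        """Cost in USD, given prices per million tokens."""
        return input_tokens / 1e6 * in_price + output_tokens / 1e6 * out_price

    flash_25 = request_cost(1_000, 2_000, in_price=0.30, out_price=2.50)
    flash_3 = request_cost(1_000, int(2_000 * 0.7), in_price=0.50, out_price=3.00)

    print(f"Gemini 2.5 Flash: ${flash_25:.4f} per request")  # ~$0.0053
    print(f"Gemini 3 Flash:   ${flash_3:.4f} per request")   # ~$0.0047

Under these assumed numbers the shorter responses more than offset the higher per-token price, which is the scenario Google's claim points to; workloads with long, fixed-length outputs would tip the other way.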


In terms of programming performance, Gemini 3 Flash delivers the professional-grade coding capabilities of Gemini 3 while maintaining ultra-low latency, enabling rapid inference and quick task resolution in high-frequency workflows.

On SWE-bench Verified, the benchmark used to evaluate agentic coding capabilities, Gemini 3 Flash achieved a score of 78%, surpassing not only the 2.5 series but even Gemini 3 Pro. It strikes a strong balance across agentic coding, production-ready systems, and responsive interactive applications.


Gemini 3 Flash's powerful capabilities in reasoning, tool usage, and multimodal functions make it ideal for developers seeking to perform more complex video analysis, data extraction, and visual question answering. This enables smarter applications—such as game assistants or A/B testing experiments—that require both rapid responses and deep reasoning.
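As an illustration of that kind of workload, the sketch below sends an image to the model and asks for structured extraction. It reuses the assumed model ID from the earlier sketch, and the file name chart.png is hypothetical.

    # Hedged sketch: visual question answering / data extraction from an image.
    # Assumptions: model ID "gemini-3-flash-preview" and a local file named chart.png.
    from google import genai
    from google.genai import types

    client = genai.Client()  # reads the API key from the environment

    with open("chart.png", "rb") as f:
        image_bytes = f.read()

    response = client.models.generate_content(
        model="gemini-3-flash-preview",
        contents=[
            types.Part.from_bytes(data=image_bytes, mime_type="image/png"),
            "Extract the key figures from this chart and return them as JSON.",
        ],
    )
    print(response.text)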

Additionally, it's worth noting that Gemini 3 Flash is now being rolled out as the default model for AI Mode in Search, available to users worldwide.

Leveraging reasoning capabilities on par with Gemini 3 Pro, AI Mode with Gemini 3 Flash can more effectively parse the nuances of user queries. It considers every aspect of the user's question and delivers comprehensive, easy-to-understand answers, drawing real-time local information and useful links from across the web. Finally, it seamlessly combines research with immediate action: users receive a well-organized, structured analysis along with concrete recommendations, all at search-like speed.

Google stated that it positions Gemini 3 Flash more as a “mainstream model” than as a high-end showcase piece.

Tulsee Doshi, Senior Director and Product Lead at Gemini Models, noted in a TechCrunch briefing that comparing input and output pricing in the price list clearly shows Flash is significantly more cost-effective. This makes it better suited for handling large-scale, batch processing tasks, effectively lowering the barrier to entry and overall costs for businesses.

Since the release of Gemini 3, Google has rapidly scaled its API processing capacity, now handling over 1 trillion tokens daily.

Meanwhile, Google is also engaged in a head-to-head competition with OpenAI over the pace of new product releases and model performance.

Reports indicate that earlier this month, as Google's market share in the consumer sector increased, ChatGPT experienced a decline in overall traffic. In response, OpenAI CEO Sam Altman issued an internal “code red” memo to his team. Subsequently, OpenAI released both GPT-5.2 and a new image generation model, emphasizing sustained growth in enterprise-level application demand. The company also disclosed that ChatGPT's message volume has increased approximately eightfold since November 2024.

Although Google has not directly addressed its competitive relationship with OpenAI, it believes that the rapid release of new models is accelerating progress across the entire industry.

“The current state of the entire industry is that various models are rapidly evolving, competing with each other and constantly pushing the boundaries of performance,” Doshi stated. “Equally impressive is how actively companies are launching new models.”

She also mentioned that Google is continuously introducing new benchmarking systems and model evaluation methods, a trend that in itself has energized the team about the industry's development.
