Gemini Flash Review: Google’s New Default Flash 3.5 Beats Last Year’s Pro (But Costs 3x More)

🆕 Major Update (May 23, 2026) — Google I/O 2026: Google launched Gemini 3.5 Flash on May 19, 2026 and made it the new default model across the Gemini app, Search AI Mode, and the API. The headline: a Flash-badged model that beats last year’s Gemini 3.1 Pro on 11 of 15 published benchmarks (76.2% on Terminal-Bench 2.1 vs 70.3%). The catch: at $1.50/$9.00 per million tokens it costs 3x the old Gemini 3 Flash and is now more expensive than both Claude Haiku 4.5 and GPT-5.4 Mini. Google also cut Google AI Ultra from $249.99 to a new $99.99 entry tier, shipped Gemini Omni (video) and the Gemini Spark agent, and Antigravity 2.0 now runs on Gemini 3.5 Flash. We’ve rewritten this Gemini 3.5 Flash review around the new lineup.

For most of 2026, “the cheap tier” of large language models was an easy call: Gemini Flash undercut everyone on price and you moved on. Google just blew that framing up. This Gemini Flash review covers what actually changed at Google I/O 2026 (May 19) and why the whole “Flash equals cheapest” story no longer holds. The new Gemini 3.5 Flash is the default model Google now serves to most users, it’s genuinely excellent at coding and agentic work, and it ships at a price that is structurally higher than every credible cheap-tier competitor. Google moved Flash upmarket. The cheap-tier crown slid down to Flash-Lite. We’ll walk through the whole Gemini Flash family as it exists in late May 2026: 3.5 Flash (the new default), 3.1 Flash-Lite (the actual cheap tier now), 3.1 Flash TTS (voice), Gemini Omni Flash (video), and the still-supported 3 Flash, 2.5 Flash, and 2.5 Flash-Lite. What each one does, what they cost, where they win, and where the new pricing changes the math.

⚡ TL;DR – The Bottom Line

What It Is: Google’s family of fast, multimodal LLMs. The new flagship is Gemini 3.5 Flash (launched May 19, 2026), alongside 3.1 Flash-Lite, 3.1 Flash TTS, Gemini Omni Flash, and the older 3 Flash, 2.5 Flash, and 2.5 Flash-Lite.

Best For: Coding agents, multi-step tool use, RAG over large documents, and anyone using the free Gemini app (which now defaults to 3.5 Flash). High-volume classification belongs on Flash-Lite, not the flagship.

Price: Gemini 3.5 Flash $1.50/$9.00 per 1M tokens ($0.15 cached input). Flash-Lite $0.25/$1.50. The old Gemini 3 Flash is still $0.50/$3.00. Free in the consumer Gemini app.

Our Take: The best agentic and coding model the Flash series has ever shipped, beating last year’s Pro on real-work benchmarks. But it is no longer the cheap-tier price leader, and it’s notably verbose, which inflates output cost.

⚠️ The Catch: At $1.50/$9.00, Gemini 3.5 Flash costs 3x the model it replaced and more than Claude Haiku 4.5 and GPT-5.4 Mini. If you only want cheap, the old 3 Flash, Flash-Lite, or a competitor now wins on raw price.

Key numbers: 3.5 Flash input $1.50/M · Context window 1M · Terminal-Bench 2.1 76.2% · Consumer tier Free

The Bottom Line: Should You Use Gemini 3.5 Flash?

It depends on which door you walk through. If you use AI through the free consumer Gemini app, the answer is an easy yes: as of May 19, 2026 the app defaults to Gemini 3.5 Flash, and there is no better free chatbot tier on the market right now. You get a near-Pro model for nothing. If you’re a developer building on the API, the answer got more interesting. Gemini 3.5 Flash at $1.50/$9.00 per million tokens (input/output) is the best agentic and coding model the Flash line has ever shipped, beating last year’s Gemini 3.1 Pro on most “real work” benchmarks. But it is no longer the cheap option. It costs 3x the model it replaced and more than both Claude Haiku 4.5 ($1.00/$5.00) and GPT-5.4 Mini ($0.75/$4.50).

So the decision splits cleanly. Choose Gemini 3.5 Flash when your workload is coding agents, multi-step tool use, or anything where the 1M-token context and the strong benchmarks earn back the higher token price. Choose Gemini 3.1 Flash-Lite ($0.25/$1.50) or the older Gemini 3 Flash ($0.50/$3.00) when you just need cheap, high-volume inference. And reach for Gemini 3.1 Pro or a competitor when you need either the deepest reasoning on hard problems or the lowest possible bill. The “cheap-tier leader” title moved within the family. It didn’t disappear.

What the Flash Family Actually Is

“Gemini Flash” isn’t one model. It’s a brand name Google applies to the smaller, faster variants of each Gemini generation. The old pattern was simple: ship a flagship Pro, then a few weeks later ship a cheaper, distilled Flash version of it. I/O 2026 broke that pattern in two ways. First, Google shipped the Flash version of the 3.5 generation before the Pro version (Gemini 3.5 Pro is still in testing). Second, it pushed Flash’s price up toward Pro, betting that “frontier intelligence at Flash speed” is worth paying for. The result is a lineup where the word “Flash” no longer reliably means “cheap.”

As of late May 2026 the Flash lineup actively in production is:

  • Gemini 3.5 Flash — the new default, released May 19, 2026 at Google I/O. $1.50 per million input tokens, $9.00 per million output, $0.15 cached input. 1M-token context (1,048,576), 64K max output. Knowledge cutoff January 2025. Built for coding and agentic work. Multimodal across text, image, audio, and video.
  • Gemini 3 Flash — the former default, now effectively legacy. $0.50/$3.00 per million tokens. 1M context. Still the cheaper choice for general-purpose work that doesn’t need the 3.5 upgrade.
  • Gemini 3.1 Flash-Lite — $0.25/$1.50 per million tokens, 1M context. The actual cheap tier now. Designed for very high-throughput, latency-sensitive jobs: classification, tagging, moderation, ranking.
  • Gemini 3.1 Flash TTS — text-to-speech variant focused on expressive, controllable AI voice in 30+ languages. Built for voice apps, IVR, and audio content.
  • Gemini Omni Flash — new at I/O 2026. Takes text, image, audio, or video as input and generates editable video grounded in real-world knowledge. Available in the Gemini app and Google Flow.
  • Gemini 2.5 Flash and 2.5 Flash-Lite — still supported. 2.5 Flash is $0.30/$2.50; 2.5 Flash-Lite is $0.10/$0.40, the cheapest Gemini text model period. Both 1M context. The models most existing production apps were built on.

Three tiers now map onto three jobs. The flagship Flash (3.5) is the coding-and-agent specialist. The general workhorses (3 Flash, 2.5 Flash) handle chat, content, and summarization at a lower price. The Lite variants (3.1, 2.5) exist for jobs you run thousands or millions of times where cost-per-call beats everything else. Picking the wrong tier is now an expensive mistake in both directions: running classification on 3.5 Flash wastes money, and running a coding agent on Flash-Lite wastes capability.
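
The tier-to-job mapping above can be written down as a small lookup, which is a convenient place to catch a workload being routed to the wrong tier. This is only a sketch: the workload labels and the `pick_tier` helper are hypothetical names chosen for illustration, and the tiers are given by display name rather than API model ID.

```python
# Workload-to-tier mapping, following the three tiers described above.
# The workload labels are illustrative, not an official taxonomy.
TIER_FOR_WORKLOAD = {
    "coding_agent": "Gemini 3.5 Flash",
    "tool_use": "Gemini 3.5 Flash",
    "chat": "Gemini 3 Flash",
    "content": "Gemini 3 Flash",
    "summarization": "Gemini 3 Flash",
    "classification": "Gemini 3.1 Flash-Lite",
    "tagging": "Gemini 3.1 Flash-Lite",
    "moderation": "Gemini 3.1 Flash-Lite",
    "ranking": "Gemini 3.1 Flash-Lite",
}


def pick_tier(workload: str) -> str:
    """Return the Flash tier for a workload, failing loudly on unknown labels."""
    try:
        return TIER_FOR_WORKLOAD[workload]
    except KeyError:
        raise ValueError(f"unknown workload: {workload!r}") from None


print(pick_tier("classification"))
print(pick_tier("coding_agent"))
```

Raising on an unknown label, rather than silently defaulting to the flagship, keeps a new high-volume job from landing on the most expensive tier by accident.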

A visual hierarchy diagram showing the Gemini Flash family pyramid: Gemini 3.5 Flash at the top as the agentic and coding flagship, Gemini 3 Flash and 2.5 Flash as the general workhorses in the middle, and Flash-Lite at the base for highest-volume, lowest-cost work

The Five-Minute Test (What Gemini 3.5 Flash Actually Feels Like)

The fastest way to feel the new Flash is to open the Gemini app at gemini.google.com and ask it three things you’d normally ask Claude or ChatGPT. Pick one open-ended writing task, one summarization task on a long document, and one coding question. The free tier now runs Gemini 3.5 Flash by default, wrapped in a redesigned interface Google calls “Neural Expressive” (soft animated colors while it thinks, responses laid out with highlights and timelines instead of a wall of text). The app crossed 900 million monthly active users this spring, so this is the model a lot of people are now using without knowing its name.

Three things you’ll notice in those five minutes. First, the speed. Google clocks 3.5 Flash at roughly 4x the output tokens per second of competing frontier models, and it feels like it: responses start almost instantly and short answers finish in a second or two. Second, the coding and tool-use competence. Ask it to debug a script or walk through a multi-step task and it holds the thread in a way the old Flash didn’t. Third, the multimodal handling. Drop in a PDF, a screenshot, an audio file, or a video and it processes all of them natively. Video input remains a Gemini-family advantage that Claude and ChatGPT still trail on.

Where it visibly struggles in five minutes: deep, knowledge-heavy reasoning and finding a specific detail buried deep in a very long document. On both, Gemini 3.1 Pro is still the stronger model. Gemini 3.5 Flash is tuned for “get the task done,” not “answer the hardest question in the world.” For the deepest reasoning, the app offers Deep Think with a daily allowance, or you can upgrade to a paid plan.

A composite illustration of Gemini 3.5 Flash multimodal capability — a single chat window simultaneously processing a PDF document, an audio waveform, and a video frame, with the response streaming in under two seconds

Getting Started: Free Tier vs API Access

There are three paths into the Gemini Flash family, and the I/O 2026 changes reshuffled all three.

Path 1: The Gemini Consumer App (Free)

Sign in at gemini.google.com with any Google account. The default model is now Gemini 3.5 Flash, rolled out at I/O. You get effectively unlimited everyday chat, 100 free image generations per day via Nano Banana, Canvas (the document workspace), Gems (custom personas), and Gemini Live voice mode. There’s a small daily allowance of deeper-reasoning Pro access for hard tasks. The free and paid tiers serve the same model quality. The only difference is rate limits.

Path 2: Google AI Plus / Pro / Ultra (Consumer Subscription)

Google restructured these plans at I/O 2026. Google AI Plus is $7.99/month (rolled out worldwide in January 2026). Google AI Pro at $19.99/month includes Gemini 3.1 Pro with a 1M-token context window inside Gmail, Docs, Sheets, and Slides, plus cloud storage. Google AI Ultra got a major price cut: a new entry tier at $99.99/month (5x the Gemini usage limits of Pro, 20TB storage, a YouTube Premium plan, priority access to Antigravity, and the new Gemini Spark agent), while the former $249.99 top tier dropped to $199.99/month (20x limits, Project Genie). All paid tiers also unlock Gemini Omni and Gemini 3.5 Flash. None of these plans meter Flash usage heavily; the upgrades are mostly about Pro access, agents, and integrations.

Path 3: The Gemini Developer API (Pay Per Token)

For production apps, get a key at ai.google.dev. The Developer API has a free tier (rate-limited, useful for prototyping but not production traffic) and per-token paid pricing above the quotas. Gemini 3.5 Flash is $1.50/$9.00 with $0.15 cached input (non-global regions bill slightly higher at $1.65/$9.90); Gemini 3 Flash is $0.50/$3.00; Gemini 3.1 Flash-Lite is $0.25/$1.50; Gemini 2.5 Flash is $0.30/$2.50; Gemini 2.5 Flash-Lite is $0.10/$0.40. All Flash models support the full 1M-token context. The model ID for the new flagship is gemini-3.5-flash, and Google also shipped a new Interactions API (beta) for server-side conversation history, similar to OpenAI’s Responses pattern. The same models are available with enterprise SLAs through Vertex AI on Google Cloud.
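
For a first call, a request can be assembled with nothing but the Python standard library. The model ID `gemini-3.5-flash` comes from the paragraph above; the endpoint path, the `x-goog-api-key` header, and the request body shape are assumptions based on the publicly documented Gemini REST API and should be checked against the current reference before use. The sketch builds the request without sending it.

```python
import json
import urllib.request

# Assumed endpoint shape (Gemini REST API, v1beta). Verify against current docs.
BASE_URL = "https://generativelanguage.googleapis.com/v1beta/models"


def build_request(prompt: str, api_key: str,
                  model: str = "gemini-3.5-flash") -> urllib.request.Request:
    """Build (but do not send) a generateContent request for one text prompt."""
    body = {"contents": [{"parts": [{"text": prompt}]}]}
    return urllib.request.Request(
        url=f"{BASE_URL}/{model}:generateContent",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-goog-api-key": api_key},
        method="POST",
    )


req = build_request("Summarize this changelog in three bullets.", api_key="YOUR_API_KEY")
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req), then parse the JSON response.
```

Keeping the model name as a parameter makes it cheap to point the same code at a cheaper tier later.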

Features That Actually Matter (my take after this Gemini Flash review)

Agentic and Coding Performance (The New Headline)

This is where Gemini 3.5 Flash earns its existence. On Google’s published benchmarks it beats last year’s Gemini 3.1 Pro on 11 of 15 tests, and the wins cluster exactly where developers feel them: Terminal-Bench 2.1 (76.2% vs 70.3%), MCP Atlas tool use (83.6% vs 78.2%), Finance Agent v2 (57.9% vs 43.0%, a 14.9-point jump), and GDPval-AA (1,656 Elo vs 1,314). On SWE-Bench Pro it’s effectively tied with 3.1 Pro (55.1% vs 54.2%). In plain terms: a model wearing the Flash badge now does multi-step coding and tool-orchestration work that needed the Pro tier six months ago. If you’re building coding agents in tools like Antigravity 2.0 or the new Antigravity CLI, the upgrade is close to a no-brainer.

1M-Token Context Window (With One New Caveat)

Every Gemini Flash variant supports a 1,000,000-token context window. Claude Haiku 4.5 caps at 200K and GPT-5.4 Mini at 400K, so for sheer capacity Flash still wins by a wide margin: an entire codebase, a 1,500-page document, or hours of transcripts in one prompt. The new caveat: Gemini 3.5 Flash actually regressed slightly on long-context retrieval versus 3.1 Pro (it scores lower on the 128K slice of the MRCR needle-in-haystack test). It can accept a million tokens; it’s just a little less reliable at pulling one specific fact out of the deep middle of them. For large-codebase refactors that depend on cross-file reasoning, 3.1 Pro is the safer pick until 3.5 Pro ships.
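
As a rough capacity check, the context limits quoted above can be compared against a prompt's token count. This is a minimal sketch: the limits are the figures cited in this review, `models_that_fit` is a hypothetical helper, and it only checks whether the prompt fits, not how reliably the model retrieves from it.

```python
# Input context limits in tokens, as quoted in this review.
CONTEXT_LIMITS = {
    "Gemini 3.5 Flash": 1_048_576,
    "GPT-5.4 Mini": 400_000,
    "Claude Haiku 4.5": 200_000,
}


def models_that_fit(prompt_tokens: int) -> list[str]:
    """Return the models whose context window can hold the prompt."""
    return [name for name, limit in CONTEXT_LIMITS.items() if prompt_tokens <= limit]


for size in (150_000, 300_000, 900_000):
    print(size, models_that_fit(size))
```

A prompt that fits is not the same as a prompt that works well; the retrieval caveat above still applies near the top of the window.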

Native Multimodality, Plus Gemini Omni for Video

Gemini 3.5 Flash processes images, audio, and video natively in the same prompt. Drop a 10-minute MP4 in and ask “what happens at minute 4?” with no preprocessing; feed it a podcast and ask for chapter timestamps. For generating media, Google now splits the work across siblings: Nano Banana for images, Gemini 3.1 Flash TTS for voice, and the new Gemini Omni Flash for video creation and editing. One notable absence worth flagging: unlike the rest of the Gemini 3.x series, 3.5 Flash does not support computer use, so for browser-automation agents you’ll want a different model.

Thinking Levels and Production Tooling

Gemini 3.5 Flash exposes explicit thinking levels (minimal, low, medium, high) and defaults to medium, letting you trade quality against cost and latency per request. For developers building agents it supports the standard production toolkit: function calling, tool-choice control, JSON-mode structured outputs validated against your schema, and the usual sampling controls. Higher thinking levels measurably improve hard-task results but burn more output tokens, which matters more than usual here for a reason in the next callout.

Batch API (50% Off) and Context Caching (90% Off)

The same production discounts apply. The Batch API processes non-urgent requests asynchronously at half price, dropping Gemini 3.5 Flash from $1.50/$9.00 to $0.75/$4.50 per million tokens. Context caching bills repeated content (long instructions, reference documents) at roughly 10% of the fresh input rate after the first call, which for 3.5 Flash means $0.15 per million cached input tokens. For RAG or document-Q&A workloads where the same large context is referenced across many queries, the combined effect is a large cost reduction over naive per-call pricing, and it’s the main lever for making the flagship’s higher token price palatable.
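
To make the caching effect concrete, here is a back-of-the-envelope sketch for a document-Q&A workload. The rates are the ones quoted above; the workload (1,000 questions against one 200,000-token reference document) is a made-up example, the function names are hypothetical, and the sketch ignores any cache storage fees and does not assume the batch and caching discounts stack.

```python
# Gemini 3.5 Flash rates in USD per 1M tokens, as quoted in this review.
INPUT, OUTPUT, CACHED_INPUT = 1.50, 9.00, 0.15
BATCH_DISCOUNT = 0.5  # Batch API bills at half the standard rate


def naive_cost(queries, context_tokens, question_tokens, output_tokens):
    """Every call resends the full context as fresh input."""
    input_tokens = queries * (context_tokens + question_tokens)
    return (input_tokens * INPUT + queries * output_tokens * OUTPUT) / 1_000_000


def cached_cost(queries, context_tokens, question_tokens, output_tokens):
    """First call pays the fresh rate for the context; later calls pay the cached rate."""
    fresh = context_tokens + queries * question_tokens
    cached = (queries - 1) * context_tokens
    out = queries * output_tokens
    return (fresh * INPUT + cached * CACHED_INPUT + out * OUTPUT) / 1_000_000


workload = dict(queries=1_000, context_tokens=200_000,
                question_tokens=500, output_tokens=300)
print(f"naive:  ${naive_cost(**workload):.2f}")
print(f"batch:  ${naive_cost(**workload) * BATCH_DISCOUNT:.2f}")
print(f"cached: ${cached_cost(**workload):.2f}")
```

Under these assumptions the naive bill is a little over $300 and the cached bill is under $35, because almost all of the spend in this workload is the repeated context rather than the output.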

💡 Key Takeaway: Gemini 3.5 Flash is verbose. Independent testing found it generates roughly twice the output tokens of comparable models to finish the same benchmark suite. Since output is billed at $9.00 per million, that verbosity is a real cost multiplier. Use structured-output schemas, the lower thinking levels where you can, and context caching to keep the bill in check.
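
To see how verbosity compounds with the output rate, the sketch below prices one task with and without a 2x output multiplier. The prices and the "roughly twice" figure come from this review; the task size (2,000 input tokens, 500 baseline output tokens) and the `task_cost` helper are illustrative assumptions.

```python
def task_cost(input_price, output_price, input_tokens, output_tokens, verbosity=1.0):
    """Cost of one task in USD; prices are per 1M tokens.

    `verbosity` scales the baseline output length (2.0 means twice as many tokens).
    """
    billed_output = output_tokens * verbosity
    return (input_tokens * input_price + billed_output * output_price) / 1_000_000


baseline = task_cost(1.50, 9.00, 2_000, 500)               # 3.5 Flash, terse output
verbose = task_cost(1.50, 9.00, 2_000, 500, verbosity=2.0)  # 3.5 Flash, 2x output
haiku = task_cost(1.00, 5.00, 2_000, 500)                   # Claude Haiku 4.5

print(f"3.5 Flash, baseline output: ${baseline:.4f}")
print(f"3.5 Flash, 2x output:       ${verbose:.4f}")
print(f"Claude Haiku 4.5:           ${haiku:.4f}")
```

For this task shape, doubling the output raises the per-task cost by 60%, which is why capping output length matters more here than on cheaper models.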

Pricing: The Whole Flash Family Side By Side

Model | Input ($/1M) | Output ($/1M) | Context | Released | Best For
Gemini 3.5 Flash (new default) | $1.50 | $9.00 | 1M | May 19, 2026 | Coding agents, multi-step tool use
Gemini 3 Flash (former default) | $0.50 | $3.00 | 1M | Dec 17, 2025 | General-purpose work on a budget
Gemini 3.1 Flash-Lite (cheap tier) | $0.25 | $1.50 | 1M | Mar 3, 2026 | High-throughput classification, tagging
Gemini 3.1 Flash TTS (voice) | n/a | per audio sec | n/a | Apr 15, 2026 | Voice apps, IVR, audio content
Gemini 2.5 Flash (legacy) | $0.30 | $2.50 | 1M | 2025 | Existing production apps
Gemini 2.5 Flash-Lite (cheapest) | $0.10 | $0.40 | 1M | Jul 22, 2025 | Cheapest text model in the lineup

A few practical observations. The jump from Gemini 3 Flash to 3.5 Flash is steep: input triples ($0.50 to $1.50) and output triples ($3.00 to $9.00). What you buy is the agentic and coding leap, plus the speed. If your workload doesn’t use that leap, staying on 3 Flash cuts your token bill by two-thirds. At the bottom, Flash-Lite is in a different universe: 3.1 Flash-Lite at $0.25/$1.50 and 2.5 Flash-Lite at $0.10/$0.40 are where high-volume jobs belong. Running a classifier a million times a day on 3.5 Flash instead of 3.1 Flash-Lite costs 6x as much on both input and output, the kind of mistake that turns a $300 invoice into an $1,800 one.
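
The multipliers between tiers follow directly from the price table. Here is a minimal sketch using the listed per-million rates; the `price_ratio` helper is a hypothetical name.

```python
# (input, output) prices in USD per 1M tokens, from the table above.
PRICES = {
    "Gemini 3.5 Flash": (1.50, 9.00),
    "Gemini 3 Flash": (0.50, 3.00),
    "Gemini 3.1 Flash-Lite": (0.25, 1.50),
    "Gemini 2.5 Flash-Lite": (0.10, 0.40),
}


def price_ratio(expensive: str, cheap: str) -> tuple[float, float]:
    """Return (input ratio, output ratio) of one model's prices over another's."""
    exp_in, exp_out = PRICES[expensive]
    cheap_in, cheap_out = PRICES[cheap]
    return exp_in / cheap_in, exp_out / cheap_out


for cheaper in ("Gemini 3 Flash", "Gemini 3.1 Flash-Lite", "Gemini 2.5 Flash-Lite"):
    in_ratio, out_ratio = price_ratio("Gemini 3.5 Flash", cheaper)
    print(f"3.5 Flash vs {cheaper}: {in_ratio:.1f}x input, {out_ratio:.1f}x output")
```

Because the input and output ratios differ for 2.5 Flash-Lite, the real multiplier for a given job depends on its input-to-output mix.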

🔍 REALITY CHECK

Marketing Claims: “Frontier intelligence at Flash speed and Flash-class cost” (Google’s positioning for Gemini 3.5 Flash at I/O 2026).

Actual Experience: Half true. The speed is real (about 4x competitors on output tokens per second) and the intelligence is real (it beats last year’s Pro on coding and agents). But “Flash-class cost” is doing a lot of work. At $1.50/$9.00 this is 3x the price of the Gemini 3 Flash it replaced and roughly 6x the price of Flash-Lite. As one widely-shared developer note put it, all three major labs now appear to be probing how much their API customers will tolerate. The free consumer app is the genuinely cheap part of this story; on the API, “Flash” no longer means “cheapest.”

Verdict: Pay for 3.5 Flash when you need its agentic and coding edge. For pure cost, the older 3 Flash, Flash-Lite, or a competitor is now the better buy.

📊 Cheap-Tier LLM Pricing — May 2026

Per 1 million tokens (USD). Lower is cheaper. Note how the new Gemini 3.5 Flash moved from the cheapest end of this chart to the most expensive.

Gemini 3.5 Flash vs Claude Haiku 4.5 vs GPT-5.4 Mini

This is the table that flipped. In our April review, Gemini Flash won every price row. With the 3.5 upgrade, the headline Flash is now the most expensive of the three on raw tokens, so the comparison is no longer “Gemini wins on price, full stop.” Here’s the honest head-to-head.

Specification | Gemini 3.5 Flash | Claude Haiku 4.5 | GPT-5.4 Mini
Input price ($/1M) | $1.50 | $1.00 | $0.75
Output price ($/1M) | $9.00 | $5.00 | $4.50
Context window (tokens) | 1,000,000 | 200,000 | 400,000
Multimodal inputs | Text, image, audio, video | Text, image | Text, image, audio
Coding (Terminal-Bench 2.1) | 76.2% | ~70% | ~71%
Output speed | ~280 tok/s | Fast | Fast
Verbosity (cost risk) | High | Lower | Moderate
Free consumer tier | Effectively unlimited | Daily message cap | With ads (since Feb 2026)

📐 Cheap-Tier LLM Capability Profile

Subjective 0-10 scoring across the dimensions that matter for the cheap-tier buyer. Higher is better. Note Gemini 3.5 Flash now trades raw price for capability.

Read it this way. Gemini 3.5 Flash wins on context window (5x Haiku, 2.5x GPT Mini), multimodal coverage (the only one with native video), and coding-and-agent benchmarks. GPT-5.4 Mini wins on raw price (cheapest input and output now) and on the OpenAI ecosystem (Codex, Agent Mode, custom GPTs, the deepest third-party tool market). Claude Haiku 4.5 sits in the middle on price, wins on writing voice and lower verbosity, and carries Anthropic’s stronger enterprise-compliance posture. The right pick genuinely depends on whether you’re optimizing for capability per token (Flash), lowest bill (GPT Mini), or prose quality and predictability (Haiku).

💡 Key Takeaway: For most teams the cheap-tier choice in mid-2026 is a three-way split, and it changed since April. Use Gemini 3.5 Flash for coding and agent workloads where its benchmark lead pays for the higher token price; reach for the older Gemini 3 Flash or Flash-Lite when you just need cheap inference; and pick GPT-5.4 Mini when you want the lowest bill or the OpenAI ecosystem, or Claude Haiku when prose voice and predictable cost matter most.

Who Should Use Each Flash Variant

  • Use Gemini 3.5 Flash if: you’re building coding agents, multi-step tool-use workflows, or document-Q&A over large context, and the agentic benchmark lead justifies the higher token price. This is the right default for new agentic work in mid-2026, especially paired with Antigravity 2.0 or the Antigravity CLI.
  • Use Gemini 3 Flash if: you want general-purpose chat, content, or summarization at a third of 3.5 Flash’s price, or you have an existing app that doesn’t benefit from the 3.5 upgrade. It’s now the value workhorse of the family.
  • Use Gemini 3.1 Flash-Lite (or 2.5 Flash-Lite) if: you’re running classification, tagging, moderation, or ranking at high volume where cost-per-call beats reasoning depth. At $0.25/$1.50 (or $0.10/$0.40) this is where the real cheap-tier savings live now.
  • Use Gemini 3.1 Flash TTS or Gemini Omni Flash if: you’re building voice apps and IVR (TTS, 30+ languages) or generating and editing video from text, image, audio, or video inputs (Omni). These are the media-generation siblings of the text Flash models.
  • Don’t use Gemini 3.5 Flash if: you only care about the lowest bill (use Flash-Lite, 3 Flash, or GPT-5.4 Mini), you need the deepest reasoning or reliable long-context retrieval (use Gemini 3.1 Pro or Claude Opus 4.7), or you need computer-use browser automation (3.5 Flash doesn’t support it).

🔍 REALITY CHECK

Marketing Claims: “Gemini 3.5 Flash beats Gemini 3.1 Pro” (Google’s I/O 2026 framing, repeated across most launch coverage).

Actual Experience: True, and genuinely impressive, but selectively true. Flash wins 11 of 15 published benchmarks against 3.1 Pro, and the wins are all in the “real work” bucket: coding, tool use, agents, structured documents. On the other 4, Pro is still clearly better, and they’re the ones that matter for hard thinking: Humanity’s Last Exam (44.4% vs 40.2%), ARC-AGI-2 abstract reasoning (77.1% vs 72.1%), and long-context retrieval (the 128K MRCR slice, where Pro leads by several points). So a model wearing the Flash badge beating last year’s Pro is real, but it’s a specialist beating a generalist on the specialist’s home turf, not Flash becoming the better model overall.

Verdict: Use 3.5 Flash for agentic and coding work where it genuinely leads. Don’t read “beats Pro” as “Flash is now the smartest model” — for deep reasoning and long-document retrieval, 3.1 Pro still wins until 3.5 Pro arrives.

Community Reception & Real-World Use

Launch-week reception of Gemini 3.5 Flash has been a study in mixed signals. The capability story landed well: developers were genuinely surprised that a Flash model beat the previous Pro on coding and agent benchmarks, and the speed (around 280 output tokens per second) drew real praise for interactive coding tools. Pichai’s on-stage pitch leaned hard on economics, claiming a company burning a trillion tokens a day could save over a billion dollars annually by shifting most of its workload to 3.5 Flash. The free-app default to 3.5 Flash, plus the surprise price cut on Google AI Ultra (from $249.99 to a $99.99 entry tier), read to most consumers as a clear win.

The pushback clustered around three things. The API price jump was the loudest: tripling Flash’s per-token cost while branding it “Flash-class cost” struck many developers as the whole industry testing how much customers will pay, and the older 3 Flash and Flash-Lite suddenly look like the value plays. Verbosity drew specific complaints, since a chatty model billed at $9 per million output tokens can quietly inflate bills on generation-heavy tasks. And the missing computer-use capability caught teams building browser agents off guard, since the rest of the Gemini 3.x line supports it. None of these are dealbreakers; they’re the predictable friction of a major-version launch that repositioned a whole product tier overnight.

The Road Ahead: What’s Coming for Flash

The big near-term item is already named: Gemini 3.5 Pro is in testing and Google says it ships next month (June 2026). That matters for Flash because the two areas where 3.5 Flash trails today, deep reasoning and long-context retrieval, are exactly where Pro is expected to shine, so reasoning-critical pipelines that can wait a few weeks may want to hold. Beyond that, Google is clearly investing in the split it telegraphed at I/O: Flash as the agentic-coding tier, Flash-Lite as the industrial-volume tier, and the new Gemini Omni line as the media-generation tier. Expect Flash-Lite to get its own benchmark category and the Omni models to expand their input and output coverage through the rest of 2026.

The competitive context tightened too. OpenAI shipped its GPT-5.5 flagship on April 23, 2026 with a steep price jump, and the matching GPT-5.5 Mini is widely expected between late June and mid-August. When that lands, the cheap-tier price war reopens, and the interesting question is whether OpenAI follows Google upmarket or undercuts it. Anthropic has been quieter on the Haiku roadmap; Haiku 4.5 has held the floor since late 2025. The realistic 12-month read is that all three labs are now probing price tolerance at the small-model tier at once, and “cheapest” is going to keep changing hands. The honest planning assumption: pick the model that fits the workload today, keep your code provider-portable, and re-check pricing every quarter, because it moves.

FAQs

Is Gemini Flash free to use?

Yes, through two paths. The Gemini consumer app at gemini.google.com now defaults to Gemini 3.5 Flash on the free tier with effectively unlimited everyday chat use. The Gemini Developer API has a free tier with rate limits useful for prototyping but not production traffic. So you can use the newest Flash for free as a chatbot; you only pay once you’re building on the API above the free quotas.

What’s the difference between Gemini 3.5 Flash and Gemini 3 Flash?

Gemini 3.5 Flash (released May 19, 2026) is the new default and a big step up on coding and agentic work, beating last year’s Gemini 3.1 Pro on benchmarks like Terminal-Bench 2.1 (76.2%) and MCP Atlas (83.6%). The catch is price: it’s $1.50/$9.00 per million tokens versus $0.50/$3.00 for the older Gemini 3 Flash, which is 3x more. Gemini 3 Flash is still supported and is now the value choice for general-purpose work that doesn’t need the 3.5 capability jump.

Is Gemini 3.5 Flash cheaper than Claude Haiku 4.5 or GPT-5.4 Mini?

No longer, on raw token price. After the May 2026 upgrade, Gemini 3.5 Flash at $1.50/$9.00 is more expensive than both Claude Haiku 4.5 ($1.00/$5.00) and GPT-5.4 Mini ($0.75/$4.50). Where Flash still wins is context window (1M vs 200K and 400K), native video input, and coding/agent benchmarks. If you want the cheapest Gemini option, that’s now Flash-Lite or the older 3 Flash, not the flagship.

What is the context window of Gemini Flash?

Every Flash model in the lineup, including 3.5 Flash, supports a 1,000,000-token context window (the API reports 1,048,576 input tokens, with up to 64K output). That’s the largest in any cheap-tier LLM in 2026. Practically, 1M tokens fits roughly a 1,500-page document or 100,000+ lines of code in one prompt. One caveat: 3.5 Flash is slightly weaker than Gemini 3.1 Pro at retrieving a specific fact buried deep in very long context, so for needle-in-haystack work over huge documents, Pro is still more reliable.

Does Gemini 3.5 Flash work with images, audio, and video?

Yes, all three as inputs, natively. Drop a PDF, screenshot, audio file, or video into the chat or an API call and Flash processes them alongside text in the same prompt. For generating media you use the siblings: Nano Banana for images, Gemini 3.1 Flash TTS for voice, and the new Gemini Omni Flash for video. One gap to note: unlike the rest of the Gemini 3.x series, 3.5 Flash does not support computer-use browser automation.

How much does Gemini 3.5 Flash cost for a typical app?

For a chatbot serving roughly 10,000 conversations per month at 2,000 input and 500 output tokens each, Gemini 3.5 Flash costs about $75/month at standard pricing, or about $37.50/month with the Batch API discount. The same workload runs about $45 on Claude Haiku 4.5, $37.50 on GPT-5.4 Mini, and just $25 on the older Gemini 3 Flash. For pure cost, 3.5 Flash is now the priciest of that group; its case is capability per token, not the lowest bill. Context caching (cached input at $0.15/M) narrows the gap on RAG workloads that reuse a large context.
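
The arithmetic behind those monthly figures is simple enough to rerun with your own traffic numbers. This is a minimal sketch using the prices quoted in this review; `monthly_cost` is a hypothetical helper and its defaults are the example workload above (10,000 conversations at 2,000 input and 500 output tokens each).

```python
# (input, output) prices in USD per 1M tokens, as quoted in this review.
PRICES = {
    "Gemini 3.5 Flash": (1.50, 9.00),
    "Claude Haiku 4.5": (1.00, 5.00),
    "GPT-5.4 Mini": (0.75, 4.50),
    "Gemini 3 Flash": (0.50, 3.00),
}


def monthly_cost(model, conversations=10_000, input_tokens=2_000, output_tokens=500):
    """Monthly bill in USD at standard (non-batch, non-cached) pricing."""
    input_price, output_price = PRICES[model]
    total_in = conversations * input_tokens
    total_out = conversations * output_tokens
    return (total_in * input_price + total_out * output_price) / 1_000_000


for model in PRICES:
    print(f"{model}: ${monthly_cost(model):.2f}")

# Batch API pricing for 3.5 Flash is half the standard rate.
print(f"Gemini 3.5 Flash (batch): ${monthly_cost('Gemini 3.5 Flash') * 0.5:.2f}")
```

With the default workload this reproduces the figures above: $75.00, $45.00, $37.50, and $25.00, plus $37.50 for 3.5 Flash on the Batch API. Changing the input-to-output split shifts the ranking less than you might expect, since 3.5 Flash is the most expensive on both sides.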

Should I switch from Gemini 3 Flash to Gemini 3.5 Flash?

Only if your workload uses what 3.5 Flash adds. If you’re running coding agents or multi-step tool use, the benchmark and speed gains are worth the higher price, so switch. If you’re doing general chat, content, or high-volume inference where 3 Flash already works, staying put cuts your token bill by about two-thirds. As always, run the new model through your eval harness before flipping the default, especially given 3.5 Flash’s verbosity and the slight long-context retrieval regression.

✅ What We Liked

  • ✓ Best agentic and coding model the Flash line has shipped
  • ✓ Beats last year’s Gemini 3.1 Pro on 11 of 15 benchmarks
  • ✓ 1M-token context and the only native video input in its class
  • ✓ ~4x faster output than competing frontier models
  • ✓ Free in the Gemini app, plus Google AI Ultra cut to $99.99

❌ What Fell Short

  • ✗ 3x the price of old 3 Flash; no longer the cheap tier
  • ✗ Verbose output inflates cost at $9/M output
  • ✗ No computer-use support, unlike the rest of Gemini 3.x
  • ✗ Slight long-context retrieval regression vs 3.1 Pro
  • ✗ Trails 3.1 Pro on deep reasoning benchmarks

Editor’s Rating: ★★★★½ (4.5/5)

The best agentic and coding model the Flash series has shipped, and the free-app default just got far more capable. Half a star off because the 3x price jump means “Flash” no longer means “cheap,” and the verbosity plus missing computer use are real friction.

The Final Verdict

The takeaway from this Gemini 3.5 Flash review is that Google changed what “Flash” means. The April version of this story was simple: Flash is the cheapest credible LLM, use it by default. The May version is sharper and more interesting. Gemini 3.5 Flash is now a near-Pro agentic and coding specialist that beats last year’s flagship on real work, runs about four times faster, and costs roughly 25% less than Gemini 3.1 Pro, while costing 3x what the old Flash did and more than its cheap-tier rivals. For the free consumer app, this is a straight upgrade with no downside. For developers, it’s a deliberate trade: pay more per token, get genuinely Pro-class coding and agent performance.

So the recommendation splits. If you’re building coding agents or tool-use pipelines, Gemini 3.5 Flash is one of the best values in AI right now and worth defaulting to. If you’re running high-volume inference or just want the smallest bill, the older Gemini 3 Flash, Flash-Lite, or a competitor like GPT-5.4 Mini is the smarter call. And if your team is already invested in Claude or GPT for prose quality, ecosystem, or compliance reasons, the 3.5 Flash upgrade isn’t enough on its own to switch. The Gemini Flash family is still the most flexible cheap-to-mid tier in the market. You just have to pick the right rung of the ladder now, because they no longer all cost the same.

A clean editorial verdict graphic for the Gemini 3.5 Flash review — a Gemini 3.5 Flash branded mark with a 4.5 out of 5 star rating, the price tag $1.50 per million input tokens, and the tagline 'Frontier coding at Flash speed, but no longer the cheap tier'
Reviewed by Tanveer Ahmad

Founder of AI Tool Analysis. Tests every tool personally so you don’t have to. Covering AI tools for 10,000+ professionals since 2025.

Last Updated: May 23, 2026

Models Tested: Gemini 3.5 Flash, Gemini 3 Flash, Gemini 3.1 Flash-Lite, Gemini 3.1 Flash TTS, Gemini Omni Flash, Gemini 2.5 Flash, Gemini 2.5 Flash-Lite

Next Review Update: June 2026 (when Gemini 3.5 Pro ships) or sooner if pricing changes
