AI Mode at Scale: 1 Billion Users and 3.2 Quadrillion Tokens
AI Mode is Google's conversational search layer, launched in May 2025 and built on Gemini models, that generates synthesized, multi-step answers to natural-language queries rather than returning a ranked list of document links. On May 20, 2026, exactly one year after launch, Google announced at I/O 2026 that AI Mode had surpassed 1 billion monthly active users. AI Overviews, the older and broader feature present across standard Search, separately reaches 2.5 billion monthly users. Token volume across Gemini-powered Search surfaces hit 3.2 quadrillion per month as of I/O 2026, up from 480 trillion the prior year: a 6.7× increase in twelve months. AI Mode query volume has more than doubled every quarter since launch. As of May 20, 2026, Gemini 3.5 Flash is the global default model powering AI Mode.
The token volume figure is the clearest signal of structural change. A 6.7× increase in tokens processed does not come from user-base growth alone; it reflects a fundamental shift in query length and complexity. When users compose multi-sentence, context-rich queries instead of two-word keyword fragments, each request consumes significantly more tokens at inference time. The compounding effect of a larger user base each submitting longer queries explains the nonlinear growth from 480 trillion to 3.2 quadrillion in one year.
Query volume more than doubling each quarter, for four consecutive quarters, has direct infrastructure implications. That growth rate, combined with Gemini 3.5 Flash's throughput characteristics (discussed in section four), indicates that Google is actively compressing cost-per-query to sustain the economics at this scale. Running Flash-class models at high throughput is the formula for serving a billion users conversationally without the inference bill becoming prohibitive.
| Metric | Value | Period / Note | Change |
|---|---|---|---|
| AI Mode monthly active users | 1 billion | May 20, 2026 (one-year milestone) | — |
| AI Overviews monthly active users | 2.5 billion | May 2026 | — |
| Monthly token volume (all Gemini Search surfaces) | 3.2 quadrillion | Q2 2026 | 6.7× YoY |
| Prior-year token volume | 480 trillion | Q2 2025 | Baseline |
| AI Mode query volume growth rate | >2× per quarter | Since launch (May 2025) | Sustained 4 consecutive quarters |
| Default AI Mode model (global) | Gemini 3.5 Flash | From May 20, 2026 | — |
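The arithmetic behind the table can be checked in a few lines. The sketch below recomputes the year-over-year token multiple and the lower bound implied by four quarters of doubling, then splits the token multiple into a user-growth factor and a tokens-per-user factor. The user-growth factor of 2.5× is a hypothetical placeholder, not a disclosed figure; note also that the token figures cover all Gemini Search surfaces while the quarterly doubling applies to AI Mode queries only, so the two multiples should not be compared directly.

```python
# Figures quoted in this section.
TOKENS_PRIOR_YEAR = 480e12   # 480 trillion tokens per month
TOKENS_CURRENT = 3.2e15      # 3.2 quadrillion tokens per month
QUARTERS_SINCE_LAUNCH = 4

# Year-over-year multiple in monthly token volume (about 6.67x).
token_growth = TOKENS_CURRENT / TOKENS_PRIOR_YEAR

# "More than doubled every quarter" for four quarters implies at least 2^4.
min_query_growth = 2 ** QUARTERS_SINCE_LAUNCH


def implied_per_user_growth(total_growth: float, user_growth: float) -> float:
    """If total tokens = users * tokens per user, return the tokens-per-user multiple."""
    return total_growth / user_growth


# HYPOTHETICAL: the source does not state a year-over-year user-growth multiple.
ASSUMED_USER_GROWTH = 2.5
per_user_growth = implied_per_user_growth(token_growth, ASSUMED_USER_GROWTH)

print(f"token growth: {token_growth:.2f}x")
print(f"minimum AI Mode query growth over four quarters: {min_query_growth}x")
print(f"implied tokens-per-user growth at {ASSUMED_USER_GROWTH}x users: {per_user_growth:.2f}x")
```

Whatever user-growth factor is substituted, any value below 6.7× leaves a residual that has to come from longer or more frequent requests per user, which is the structural point made above.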
For developers, the scale milestone carries a practical implication: as AI Mode scales, the economic rationale for building complementary tooling via Google's APIs becomes more favorable while the economics of purely SEO-driven content become structurally less reliable. The addressable API-consuming user base has expanded; the search-referral traffic base for informational content has contracted. The rest of this analysis details why, and what the builder implications are across seven dimensions.
Four Structural Shifts in U.S. Query Behavior
Google's internal data, shared at I/O 2026, describes four measurable shifts in how U.S. users interact with Search under AI Mode. Taken together, they indicate that users are no longer treating Search as a lookup tool for factual retrieval; they are using it as a reasoning interface capable of holding state across multiple turns and handling complex, multi-constraint questions. The redesigned Search box both reflects and reinforces this behavioral shift: it accepts text, images, files, video, and active Chrome tabs as input, dynamically expands as users type, and surfaces AI-generated query-angle suggestions beyond simple autocomplete.
"Google Search isn't just search with AI features. It is AI search, through and through." — Elizabeth Reid, Head of Search, Google (source: Google Blog, I/O 2026)
| Behavioral Shift | Measured Value | Context |
|---|---|---|
| Average AI Mode query length | 3× longer than traditional keyword searches | Users compose contextual questions with constraints, not keyword fragments |
| Follow-up query growth (U.S.) | +40% month-over-month | Multi-turn dialogue becoming the norm in U.S. sessions |
| Multimodal search share | 16% of all U.S. searches | Voice, image, video, or Chrome tab combined with text input |
| Planning-oriented query growth | 80% of overall AI Mode growth rate | Travel, home renovation, financial decisions; multi-step, comparison-heavy |
The query length increase is the most structurally significant of the four shifts. Traditional SEO keyword research is predicated on short, high-volume head terms and medium-tail variants. AI Mode changes the input surface itself: the redesigned interface suggests contextual angles that prompt users to add constraints, preferences, and goals. A search that previously read "best noise canceling headphones" now arrives as "noise canceling headphones that work well on long-haul flights wearing glasses, under $300, not over-ear." The AI Mode index is built to handle those queries; conventional keyword-matching infrastructure is not.
The 40% month-over-month growth in follow-up queries in the U.S. is the marker of multi-turn dialogue adoption. Once users learn that AI Mode holds conversational context (that they can say "what about the third option in a smaller size" without restating the original query), they stop treating each query as an isolated lookup. This has a direct consequence for analytics: a multi-turn session that resolves entirely within Search generates no pageview, no session, and no attribution in GA4 or Search Console. The engagement happened; it is simply invisible to site-side measurement.
The 16% multimodal share means roughly one in six U.S. search sessions now includes a non-text input modality. For developers building search-adjacent tools, this is a signal about interface expectations that extends beyond Google itself: users are acclimating to input modalities beyond the text field. Products that are text-only risk feeling limited relative to a search bar that accepts a photograph of a broken appliance and returns repair instructions.
Planning-oriented queries growing at 80% of the overall AI Mode growth rate reflects a qualitative shift in use case, not just volume. Planning queries are typically multi-constraint, comparison-heavy, and time-distributed, which is exactly what conversational AI handles better than a list of links. Travel itinerary planning, contractor sourcing for home renovation projects, and financial product comparison are all categories where users previously assembled information across multiple tab sessions. AI Mode compresses that into a single conversation, and as a result those sessions no longer land on the sites those tabs would have visited.
Information Agents: Standing Conditions and 24/7 Background Monitoring
Information Agents are autonomous background processes introduced at Google I/O 2026 as the next architectural step beyond AI Overviews and conversational AI Mode. The distinction is precise. AI Overviews are reactive: they generate a response only when a user submits a query. Information Agents are proactive: they run continuously without any user-submitted query triggering them, monitoring user-defined standing conditions and pushing synthesized updates when those conditions are met. A user defines a condition, such as "alert me when a two-bedroom apartment in the Mission District drops below $3,200/month" or "notify me when the Air Jordan 4 Retro restocks in size 11", and the agent monitors the relevant data sources continuously.
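The reactive versus proactive distinction is easier to see with the standing condition written down as data. The following is a minimal conceptual sketch only: `Listing`, `StandingCondition`, and `evaluate` are hypothetical names invented for illustration, and nothing here reflects how Google implements Information Agents or any API it exposes. It models a condition as a stored predicate that is re-evaluated against whatever observations arrive on each polling pass.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Listing:
    """One observation from a (hypothetical) rental data source."""
    neighborhood: str
    bedrooms: int
    monthly_rent: float


@dataclass
class StandingCondition:
    """A user-defined condition that persists between polling passes."""
    description: str
    predicate: Callable[[Listing], bool]


def evaluate(condition: StandingCondition, observations: Iterable[Listing]) -> List[Listing]:
    """Run one polling pass and return the observations that satisfy the condition."""
    return [obs for obs in observations if condition.predicate(obs)]


# The apartment example from the text, expressed as a predicate.
mission_two_bed = StandingCondition(
    description="Two-bedroom in the Mission District below $3,200/month",
    predicate=lambda l: (
        l.neighborhood == "Mission District"
        and l.bedrooms == 2
        and l.monthly_rent < 3200
    ),
)

# Made-up sample data standing in for one pass over a data source.
sample_pass = [
    Listing("Mission District", 2, 3350.0),
    Listing("Mission District", 2, 3150.0),
    Listing("Noe Valley", 2, 3000.0),
]

matches = evaluate(mission_two_bed, sample_pass)
for m in matches:
    print(f"ALERT: {mission_two_bed.description}: ${m.monthly_rent:,.0f}")
```

The design point is that the user supplies the predicate once and the system owns the polling loop; no query is submitted at the moment the alert fires.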
The launch scope is constrained: Information Agents will be available first to Google AI Pro and Google AI Ultra subscribers in the U.S. in summer 2026, with a broader U.S. rollout to follow. No international launch timeline has been specified as of I/O 2026. The subscriber-first rollout follows Google's pattern for compute-intensive features: contain costs during initial deployment, iterate on alert quality, then expand to the full user base.
"2026 is the diffusion year for AI search. By 2027, the agentic shift will happen pretty profoundly." — Sundar Pichai, CEO, Google (source: PPC Land)
For developers, the key architecture detail is what is not available: no public API surface was announced at I/O 2026 for registering, querying, or subscribing to Information Agent standing conditions. Agents operate entirely inside Google's knowledge graph and proprietary data pipelines. There is no documented webhook, no push endpoint, and no programmatic way for third-party developers to list their data feeds as agent-accessible sources. This contrasts with the more open, pluggable agent ecosystem model some developers anticipated based on Google's earlier MCP partnership statements.
The competitive implication for SaaS builders in real estate, e-commerce, financial data, and similar categories is concrete. If a user configures an Information Agent to track apartment prices, that agent pulls from Google's own housing data index — not from Zillow's or Redfin's API. The standing condition is defined, monitored, and resolved entirely within Google's surface. The third-party property is bypassed at intent capture, which is typically where high-intent user acquisition occurs.
The monitoring use cases demonstrated at I/O 2026 included travel price tracking, product restock alerts, earnings call summaries, and geographic competitive pricing. Each maps to an existing category of specialized tools: fare alert apps, stock notification services, financial data terminals, price-intelligence SaaS. The core value proposition of those tools is "we monitor this so you don't have to." When that proposition is offered natively inside a surface with a billion monthly active users, the competitive calculus is fundamentally different from competing against a startup with a $9/month plan.
What remains unknown as of late May 2026: how Information Agents interact with structured data published on third-party websites, whether schema markup affects an agent's ability to index real-time inventory or pricing signals, and what the latency characteristics are for standing-condition evaluation. For developers maintaining structured data feeds or real-time pricing APIs, whether that data is picked up by an agent operating inside Google's graph is currently unanswerable from public documentation.
Model Infrastructure: Why Gemini 3.5 Flash Is the Runtime Default

Gemini 3.5 Flash became the global default model for AI Mode on May 20, 2026. Google describes it as 4× faster than comparable frontier models on output tokens per second, a specification that explains the architectural choice when serving AI Mode at a billion-user scale with sub-second perceived latency. For most production AI applications, latency × cost × tool-call overhead is a more useful optimization target than benchmark accuracy scores. Flash-class models are the correct default when the use case is high-throughput, real-time user interaction where acceptable quality at the margins trades off against infrastructure cost.
The latency profile of Gemini 3.5 Flash enables two capabilities that would otherwise require visible loading states or degraded UX: real-time conversational output (where the model completes tokens fast enough that the user does not wait for a block of text to materialize) and generative layout rendering (where Gemini generates the HTML for a custom in-SERP UI component without a perceptible delay between query submission and interactive result). Both are central to the Generative UI layer described in the next section.
For developers calling Gemini via API, the switch to Gemini 3.5 Flash as the Search runtime default has an indirect but relevant implication: the model you should prototype against for Search-adjacent use cases is Flash, not 1.5 Pro. Flash pricing and rate limits differ materially from 1.5 Pro endpoints. If you are building a latency-sensitive product that wraps Gemini or integrates with Search APIs, confirm the specific endpoint, pricing tier, and quota limits before committing to an architecture. Rate limits on Flash endpoints at high throughput are a constraining factor in production and are not interchangeable with 1.5 Pro quotas.
The broader model infrastructure signal from I/O 2026 is that Google is standardizing Flash-class as the foundation for consumer-facing AI surfaces, with the larger frontier models reserved for deep-reasoning tasks and Pro/Ultra subscriber features. For agentic loop design, this matters quantitatively: if your application makes sequential Gemini API calls for multi-step reasoning, Flash's per-token cost and latency advantage compounds across the call chain. A five-step agentic workflow at Flash latency completes in a user-perceptible timeframe; the same workflow at frontier-model latency likely requires a background job and an async notification pattern instead — a substantially different product architecture decision.
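The compounding effect across a call chain can be sketched with simple arithmetic. In the example below every number is a hypothetical placeholder: the per-call latencies are not measured values, the 4× ratio between them simply mirrors the "4× faster" claim quoted earlier, and the 10-second interactive budget is an arbitrary threshold chosen for illustration. Substitute figures measured against the endpoints you actually use.

```python
def chain_latency_s(steps: int, per_call_s: float, tool_overhead_s: float = 0.0) -> float:
    """Wall-clock time for `steps` sequential model calls, each followed by a tool call."""
    return steps * (per_call_s + tool_overhead_s)


def delivery_pattern(total_s: float, interactive_budget_s: float = 10.0) -> str:
    """Pick a product pattern: answer inline, or hand off to a background job."""
    return "inline" if total_s <= interactive_budget_s else "background-job"


STEPS = 5                # the five-step workflow from the text
TOOL_OVERHEAD_S = 0.2    # HYPOTHETICAL per-step tool-call overhead
FLASH_CALL_S = 1.5       # HYPOTHETICAL per-call latency for a Flash-class model
FRONTIER_CALL_S = 6.0    # HYPOTHETICAL: 4x the Flash figure

flash_total = chain_latency_s(STEPS, FLASH_CALL_S, TOOL_OVERHEAD_S)
frontier_total = chain_latency_s(STEPS, FRONTIER_CALL_S, TOOL_OVERHEAD_S)

for name, total in (("flash", flash_total), ("frontier", frontier_total)):
    print(f"{name}: {total:.1f}s total -> {delivery_pattern(total)}")
```

Under these assumed inputs the same five-step workflow lands on opposite sides of the interactive budget, which is why model choice becomes a product-architecture decision and not only a cost line.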
Generative UI: Antigravity Platform and In-SERP Mini Apps
Google's Antigravity platform is the internal name for the Generative UI layer in Search: a system that uses Gemini 3.5 Flash's code-generation capabilities to produce query-specific interactive layouts on demand, rendered inline inside Search results without requiring navigation to any third-party site. Rather than returning links to a mortgage calculator or a product comparison dashboard, Antigravity generates a functional version of that tool, pre-populated with the constraints the user specified in their query. The global rollout began the week of May 19, 2026, at no cost to users.
The output types Antigravity can generate include custom calculators (mortgage, calorie, currency conversion, loan amortization), simulation interfaces (investment growth projections, trip budget breakdowns, nutritional comparisons), and comparison dashboards (side-by-side product or service matrices). From the user's perspective these are "mini apps" appearing inside the Search page; from a developer's perspective they are dynamically generated UI components backed by model inference, not pre-built embeds from a third-party CDN.
The scope of competitive impact is defined by one characteristic: query-initiated, lightweight utility. Any tool that a user currently finds by typing "[thing] calculator" or "[category] comparison" into Google and clicking through to a third-party site is within scope. That describes a specific and large category of web properties. SaaS tools in B2C utility (unit converters, financial calculators, mortgage estimators, calorie counters, travel cost calculators, product comparison widgets) that rely on organic search as their primary acquisition channel are directly exposed. Antigravity eliminates the click before it occurs.
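To make "lightweight utility" concrete, here is the entire core logic of a fixed-rate mortgage or loan payment calculator, using the standard amortization formula. This is an illustrative sketch written for this analysis, not output from Antigravity, and the function name and sample inputs are assumptions. The point is how little domain logic such a tool contains once the interface is generated on demand.

```python
def monthly_payment(principal: float, annual_rate: float, years: int) -> float:
    """Fixed monthly payment for a fully amortizing loan.

    Uses the standard formula M = P * r * (1 + r)^n / ((1 + r)^n - 1),
    where r is the monthly rate and n is the number of monthly payments.
    """
    n = years * 12
    r = annual_rate / 12
    if r == 0:
        # Zero-interest edge case: the formula would divide by zero.
        return principal / n
    growth = (1 + r) ** n
    return principal * r * growth / (growth - 1)


# Sample inputs chosen for illustration only.
payment = monthly_payment(principal=300_000, annual_rate=0.06, years=30)
total_interest = payment * 30 * 12 - 300_000
print(f"monthly payment: ${payment:,.2f}")
print(f"total interest over the term: ${total_interest:,.2f}")
```

A tool whose differentiating value fits in a dozen lines is the category the paragraph above describes as directly exposed.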
There is no indication from I/O 2026 that Antigravity will be available as a developer API surface. The feature operates as a first-party Google product with no documented integration path for external developers. If you are building a tool in this category, the strategic conclusion is that organic search is no longer a defensible moat for lightweight utilities. Differentiation needs to come from workflow integration (writing results to a spreadsheet or connected account), personalization built on user-specific history not accessible to a one-shot SERP interaction, or domain-specific complexity — regulatory nuance, localized data, multi-party coordination — that general-purpose code-gen cannot replicate inline.
Zero-Click at 60%: Traffic Economics for Web Properties
Approximately 60% of all Google queries now resolve inside Google without generating a click to any third-party site, a figure that spans AI Mode results, AI Overviews, featured snippets, Knowledge Graph cards, and other in-SERP answer formats. When an AI Overview is present in a result set, that rate climbs to 80–83%. The implication is that Search has functionally shifted from a traffic-distribution network to an answer-delivery system, one that uses the open web as training and citation material without reliably routing users to its sources.
The publisher-level data from the past twelve months quantifies the impact. Referral traffic from search fell 33% globally year-over-year through November 2025. Ahrefs measured a 58% click-through rate reduction for top-ranking pages on keywords where an AI Overview is present, in data through February 2026. At the company level: HubSpot reported losing 70–80% of organic traffic; Chegg disclosed a 49% decline; DMG Media reported up to 89% drops on specific query categories.
NPR characterized the situation as an "extinction-level event" for online news — a framing that reflects the structural dependency many ad-supported publications built on search-referral as their primary acquisition channel. (source: Press Gazette)
Citation dynamics inside AI Overviews are also shifting in ways that matter for anyone optimizing for AI-driven visibility. Only 17–54% of AI Overview citations now come from top-10 organic results, down from 76% in mid-2025. Citation churn is high: 70% of pages cited in AI Overviews lose that citation within 2–3 months. Optimizing for AI Overview citation placement is therefore a less stable investment than ranking for traditional organic positions: the citation surface is both smaller as a share of top-organic results and volatile on a two-to-three-month cycle.
One narrow positive signal: branded query click-through rates are up 18% under AI Overviews. When a user already knows a brand and searches for it specifically, AI Mode reinforces rather than replaces the navigation intent. Brand-building activity retains direct measurable value in the AI Mode era even as informational content loses its traffic-generation function.
Search Console attribution is an unresolved technical problem. Google has not yet published a clear model for how conversational queries — which may embed personal context like a user's location, income range, or dietary restrictions — will surface in Search Console query data. Impression reporting almost certainly undercounts actual AI Mode query coverage. Developers using Search Console data to make content investment decisions should apply a significant uncertainty discount to impression counts for AI Mode-eligible content categories.
The structural diagnosis: organic referral traffic from informational queries is not in a cyclical dip recoverable by a content refresh or a technical SEO audit. It is experiencing a permanent structural compression driven by an in-SERP answer layer scaling faster than the underlying web traffic it draws from. Traffic from branded queries and high-intent transactional queries retains more value; content designed to rank for pure-informational keywords is progressively losing its distribution function regardless of ranking position.
Agentic Booking and Task Execution: Where Google Is Displacing Apps

Google's agentic task execution capabilities extended significantly in 2026, moving beyond the flight and hotel booking integrations of prior years into local dining, entertainment, and home services. For certain U.S. categories, Google's agents can contact businesses directly on the user's behalf, completing reservation or service requests via phone, email, or API, rather than surfacing a link for the user to navigate and complete manually. This moves Google from information retrieval into task execution: a qualitatively different competitive position relative to specialized booking apps and service marketplaces.
The categories with documented 2026 expansion include private dining reservations, local entertainment ticketing, home repair and maintenance services (plumbing, electrical, HVAC), and personal service bookings. The surface overlap with existing platforms is direct: OpenTable and Resy for dining, Thumbtack and Angi for home services. In categories where Google's agentic layer can complete the booking end-to-end, the specialized app's value is reduced to receiving the confirmation on a session that began and was resolved elsewhere.
The session attribution problem compounds the competitive exposure. When a Google agent calls a restaurant or a contractor on a user's behalf, the booking may complete without the third-party platform's interface being involved at all. That session does not appear in OpenTable's or Thumbtack's analytics. The lead was captured and converted at Google's layer. The specialized platform may see the booking appear in its backend if Google is using that platform's API as a fulfillment channel, but it does not receive attribution for the acquisition — the highest-value point in the customer lifecycle.
Sundar Pichai's characterization of 2027 as the year agentic shifts will happen "pretty profoundly" is worth taking at face value as a planning horizon. If 2026 is the diffusion year and 2027 is the year of profound agentic adoption, the window for strategic differentiation is the next 12–18 months. The defensible positions for third-party apps in agent-adjacent categories are:
- Proprietary data depth: inventory, reviews, or real-time availability that Google's graph does not index at the same fidelity or freshness
- Workflow integration: connecting a booking to a user's calendar, expense management system, or project tool in ways that require account-level context Google does not hold
- Complex multi-party coordination: bookings involving multiple participants, contracts, deposit handling, or negotiation steps that exceed a single API call
- Post-booking relationship: the follow-up, loyalty, and repeat-engagement surface that Google's one-shot task execution does not build
Google's agentic booking expansion does not uniformly eliminate third-party apps; it eliminates the search-to-app-install funnel as an acquisition path for the specific moment of first booking intent. Apps that retain users through a differentiated ongoing experience, rather than depending on search intent at the moment of initial task completion, are structurally more resilient to this shift. The parallel to what happened to vertical search in the 2010s (Google absorbing price comparison, flight search, hotel search into native SERP features) is instructive: the apps that survived did so by deepening the post-discovery workflow, not by competing on the discovery layer itself.
Frequently Asked Questions
What are Google Information Agents and how do they differ from AI Overviews?
Google Information Agents are autonomous background processes that run continuously without requiring a user-submitted query. Users define standing conditions — a price range for a rental listing, a product restock alert, an earnings call notification — and the agent monitors relevant data sources around the clock, then pushes synthesized updates when those conditions are met. AI Overviews, by contrast, are reactive: they generate a synthesized answer only when a user actively submits a query. Information Agents were announced at Google I/O 2026 and are scheduled to launch first for Google AI Pro and AI Ultra subscribers in the U.S. in summer 2026, with a broader U.S. rollout to follow. There is no international launch timeline confirmed as of May 2026.
Can developers access Google Information Agents via API?
No public API surface was announced at Google I/O 2026. As of May 2026, Information Agents are a first-party subscriber feature accessible only through Google's own interface for AI Pro and AI Ultra users. There are no published webhooks, registration endpoints, or programmatic interfaces that allow third-party developers to create standing conditions, subscribe to agent outputs, or register external data feeds as agent-accessible sources. Google has not indicated any timeline for a developer API for Information Agents. This means that third-party monitoring and alert tools cannot currently participate in or extend the standing-condition system.
Why has the average Google AI Mode query length tripled?
Google's redesigned Search box — updated at I/O 2026 — accepts multi-modal input including text, images, files, video, and active Chrome tabs, and dynamically expands as users type. Beyond expanded input capacity, the interface proactively suggests contextual query angles that prompt users to add constraints, preferences, and goals. Instead of submitting "cheap flights to Tokyo," users are guided toward "flexible dates in September, budget under $900 roundtrip from SFO, prefer non-stop, open to nearby airports." The interface design itself nudges richer input, and the AI Mode system handles multi-constraint queries more effectively than conventional keyword-matching does — reinforcing the behavioral shift. Google disclosed at I/O 2026 that AI Mode queries are now three times longer on average than traditional keyword searches, with follow-up queries in the U.S. growing 40% month-over-month as multi-turn dialogue becomes the norm.
How does 60% zero-click search affect developers who rely on SEO-driven traffic?
With approximately 60% of Google queries resolving inside Google without a third-party click — rising to 80–83% when an AI Overview is present — informational content that previously served as a reliable top-of-funnel acquisition channel is now primarily consumed in-SERP. For developer tools and SaaS companies with content marketing programs, the measurement model needs to shift: treat Search presence for informational queries as a brand-visibility signal rather than a direct traffic driver. Optimize for branded search (which shows an 18% click-through rate uplift under AI Overviews) and move acquisition budget toward channels where the click-to-site conversion is not intermediated by an AI answer layer. Search Console impression data currently underreports actual AI Mode query coverage, making attribution analysis unreliable for informational content categories. Treat the data with a significant uncertainty margin when making content investment decisions.
What is the Antigravity platform and which developer use cases does it threaten?
Antigravity is Google's internal codename for its Generative UI layer in Search. It uses Gemini 3.5 Flash's code-generation capabilities to produce interactive, query-specific tools — calculators, comparison dashboards, simulation interfaces — directly inside Search results, without requiring navigation to any external site. The platform began rolling out globally the week of May 19, 2026, at no cost to users. Developer tools and SaaS products directly in scope include any lightweight utility that users currently find by typing "[topic] calculator" or "[category] comparison" into Google and clicking through: unit converters, mortgage and loan calculators, financial estimators, travel cost tools, calorie counters, and product comparison widgets. If the user's need can be satisfied by a generated interactive component inline in a SERP, the third-party site loses the session before it begins. Tools with deeper workflow integration (writing to connected accounts), account-level personalization, or domain complexity that resists general-purpose code-gen are less immediately exposed.
What Builders Should Take Away from I/O 2026
The signal from Google I/O 2026 is clear across all seven dimensions examined in this analysis: Google is systematically internalizing the use cases that have historically generated traffic to third-party properties. Information retrieval is now an in-SERP function. Lightweight utility tools are generated on demand by Antigravity. Booking and task execution are moving into Google's agentic layer. The open web remains the raw material for AI answers, but its role as a traffic destination for informational queries is in structural contraction — not a cyclical dip.
For developers building on or adjacent to the Google ecosystem, three adjustments are worth making now. First, if your product's primary user acquisition path runs through informational organic search, model out what a sustained 30–60% reduction in that channel looks like — the data from late 2025 through early 2026 already shows that range in the most affected categories. Second, if you are building with Gemini APIs, align your prototype environment with Gemini 3.5 Flash endpoints; this is the model class powering Google's consumer surfaces and the one whose latency and cost characteristics best represent production AI Mode behavior. Confirm Flash-specific pricing and quota limits before committing to a production architecture. Third, if your product competes with Google's agentic capabilities in booking, task completion, or standing-condition monitoring, the defensible position is not in the initial intent-capture layer — it is in the ongoing relationship, workflow depth, and proprietary data that a one-shot SERP interaction cannot replicate.
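The first adjustment, modeling a sustained 30–60% reduction in informational organic traffic, can start as a small scenario table. The sketch below is one way to set that up; the channel names and baseline session counts are hypothetical placeholders to be replaced with your own analytics export, and the model deliberately leaves branded and non-search channels unchanged so the effect of the informational cut is isolated.

```python
from typing import Dict

# HYPOTHETICAL monthly sessions by acquisition channel; replace with real data.
BASELINE: Dict[str, float] = {
    "organic_informational": 50_000,
    "organic_branded": 10_000,
    "direct_and_other": 40_000,
}


def project_sessions(baseline: Dict[str, float], informational_cut: float) -> Dict[str, float]:
    """Apply a fractional cut (0.0 to 1.0) to informational organic sessions only."""
    projected = dict(baseline)
    projected["organic_informational"] = baseline["organic_informational"] * (1 - informational_cut)
    return projected


def total_drop(baseline: Dict[str, float], informational_cut: float) -> float:
    """Fraction of total sessions lost under the given informational cut."""
    before = sum(baseline.values())
    after = sum(project_sessions(baseline, informational_cut).values())
    return (before - after) / before


# The 30-60% range from the text, plus a midpoint.
for cut in (0.30, 0.45, 0.60):
    print(f"informational cut {cut:.0%} -> total sessions down {total_drop(BASELINE, cut):.1%}")
```

The total impact scales with how much of the baseline is informational organic traffic, so the same 30–60% cut is a rounding error for some products and an existential problem for others.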
Google's search market share declined from 92.9% in 2023 to 89.6% by mid-2025, the steepest drop in the company's history, and I/O 2026 represents Google's direct response to that pressure: an AI Mode at 1 billion users, Gemini-powered Search surfaces processing 3.2 quadrillion tokens per month, tools generated on demand via Antigravity, and tasks executed autonomously via Information Agents. Whether this configuration reverses the market share trend or accelerates fragmentation toward ChatGPT, Perplexity, Kagi, and Brave Search is the open question for the remainder of 2026.
Last updated: 2026-05-28. Based on Google I/O 2026 announcements (May 19–20, 2026), official Google blog disclosures, third-party publisher traffic analyses, and search industry reporting current through May 2026.

