Grok Build CLI: Plan Mode, Skills, Connectors, and Pricing

xAI's Grok Build ships with Arena Mode, reusable Skills, and CLAUDE.md compatibility. Here's what developers need to know.

What Grok Build Does: The Plan → Search → Build Loop

Grok Build is xAI's terminal-native coding agent: a CLI tool installed inside a project directory and directed via plain-language instructions. Unlike GUI-based code generation tools, it operates entirely in the shell, inspecting your codebase, proposing changes through a structured plan, and writing files only after explicit developer approval. The core workflow follows three phases: Plan, Search, and Build, with every proposed file change surfaced as a reviewable diff before execution. According to Codersera's technical overview, this plan-first model is the defining architectural decision separating Grok Build from tools that generate code immediately.

Quick Answer: Grok Build is xAI's CLI coding agent that installs with a single curl command and follows a Plan → Search → Build loop. Every proposed file change surfaces as a diff before any file is written. It launched May 14, 2026, initially exclusive to SuperGrok Heavy ($299/mo) subscribers, and supports up to 8 parallel sub-agents in Arena Mode.

Installation is a single command: curl -fsSL https://x.ai/cli/install.sh | bash. Run that inside your project root, and Grok Build indexes the codebase. From there, you issue instructions in natural language; the agent produces a numbered Plan before touching any file. The xAI developer release notes document the install flow and flag the project directory requirement: the CLI expects to find your code already in scope when it starts.

Plan Mode is where Grok Build differs most clearly from agents that write code on demand. After parsing your instruction, the agent generates a step-by-step breakdown: numbered, specific, tied to actual files in the repository. No file is written until you approve the plan. If a step looks wrong, you can annotate or rewrite it directly before Build begins. This annotation layer lets you correct the agent's reasoning mid-plan rather than reverting after the fact, a meaningful difference when dealing with multi-file changes.

The diff-before-write guarantee complements Plan Mode: once the plan is approved and the Build phase runs, every proposed file modification appears as a standard unified diff. You see exactly which lines are being added or removed before anything is committed to disk. For teams already practicing code review, this integrates naturally — Grok Build's output is reviewable in the same way a pull request is.
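
For readers unfamiliar with the format, here is a minimal sketch of what a diff-before-write preview looks like, using Python's standard difflib. It illustrates the unified diff format only; it is not Grok Build's implementation, and the function name preview_diff and the sample file greet.py are hypothetical.

```python
import difflib


def preview_diff(path: str, before: str, after: str) -> str:
    """Return a unified diff for a proposed change without touching the disk."""
    diff_lines = difflib.unified_diff(
        before.splitlines(keepends=True),
        after.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    )
    return "".join(diff_lines)


# Hypothetical before/after contents of a single file.
before = "def greet(name):\n    print('hi ' + name)\n"
after = "def greet(name):\n    print(f'hi {name}')\n"

# Lines starting with '-' would be removed, lines starting with '+' added.
print(preview_diff("greet.py", before, after))
```

An empty string means the proposed content is identical to what is already on disk, so there is nothing to review.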

xAI launched Grok Build on May 14, 2026, entering a market already occupied by Anthropic's Claude Code and OpenAI's Codex CLI. Powered by the grok-build-0.1 API model, it is xAI's most substantial developer-facing product beyond the conversational Grok chatbot. The Plan → Search → Build loop, diff guarantees, and annotation layer collectively position Grok Build as an interruption-safe agent: it is designed to keep the developer in control at every decision point rather than front-loading trust.

Arena Mode: How Parallel Sub-Agents Produce Ranked Candidates

Arena Mode is Grok Build's architecturally distinctive feature: instead of a single agent working sequentially, Arena Mode launches up to 8 sub-agents concurrently (4 running Grok Code 1 Fast and 4 running Grok 4 Fast), each independently exploring a solution branch. When the parallel runs complete, results are scored and presented as a ranked list for human selection. No code is committed without the developer explicitly picking a candidate from that list.

The scoring system evaluates three dimensions: test pass rate, diff size (smaller and more targeted ranks higher), and plan adherence — how closely the output matches the approved plan. This surfaces trade-offs rather than forcing a single winner: a compact solution passing fewer tests sits alongside a larger patch with full coverage, each scored transparently. Arena Mode presents the ranked list rather than merging outputs, keeping the developer in control of the final selection decision.
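
The ranking idea can be sketched in a few lines. This is only an illustration of combining the three published criteria into one ordering: the weights, the diff-size normalization, and the Candidate type are assumptions, since xAI has not published how the criteria are actually weighted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    test_pass_rate: float   # fraction of tests passing, 0.0 to 1.0
    diff_lines: int         # added plus removed lines
    plan_adherence: float   # 0.0 to 1.0, how closely the patch follows the plan


# Hypothetical weights chosen for illustration; the real ones are not public.
WEIGHTS = {"tests": 0.5, "size": 0.2, "plan": 0.3}


def score(c: Candidate, largest_diff: int) -> float:
    # Smaller diffs score higher; the largest diff in the batch scores 0.
    size_score = 1.0 - c.diff_lines / largest_diff if largest_diff else 1.0
    return (
        WEIGHTS["tests"] * c.test_pass_rate
        + WEIGHTS["size"] * size_score
        + WEIGHTS["plan"] * c.plan_adherence
    )


def rank(candidates: list[Candidate]) -> list[Candidate]:
    """Return candidates best-first; the developer still makes the final pick."""
    largest = max((c.diff_lines for c in candidates), default=0)
    return sorted(candidates, key=lambda c: score(c, largest), reverse=True)


candidates = [
    Candidate("full-coverage", test_pass_rate=1.0, diff_lines=120, plan_adherence=0.9),
    Candidate("compact", test_pass_rate=0.8, diff_lines=20, plan_adherence=0.9),
]
for c in rank(candidates):
    print(c.name)
```

With these particular weights the compact patch outranks the full-coverage one; shifting weight toward test pass rate reverses the order, which is why the list is shown to a human rather than merged automatically.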

"Arena Mode is designed for situations where multiple valid solution paths exist — the system explores each path in parallel and scores the results, so developers can pick a winner from evidence rather than guessing which approach is best upfront." — xAI technical documentation, as summarized by Codersera

xAI demonstrated Arena Mode publicly with 20 parallel sub-agents investigating a production performance issue, simultaneously probing slow database queries, cache hit rates, and endpoint latency across the same codebase. The demonstration was a deliberate showcase of the parallelism argument: ambiguous production bugs with multiple potential root causes benefit from concurrent hypothesis exploration rather than sequential debugging where each dead end blocks the next attempt.

When to use Arena Mode vs. standard mode: Arena Mode is most useful for ambiguous bugs and architectural problems where several valid approaches exist. For routine feature additions or well-scoped tasks with a clear implementation path, running 8 sub-agents adds token cost and latency with little practical benefit. Standard single-agent mode handles those cases faster.

Arena Mode Property | Details
--- | ---
Max concurrent sub-agents | 8 (4 × Grok Code 1 Fast + 4 × Grok 4 Fast)
Scoring criteria | Test pass rate, diff size, plan adherence
Output format | Ranked candidate list; no automatic merge
Human selection required | Yes; no code committed without explicit developer pick
Demonstrated scale (xAI demo) | 20 sub-agents on a single production performance investigation
Best use case | Ambiguous bugs, multi-path design problems, refactor trade-off analysis
Worst use case | Routine feature adds, well-scoped tasks with one obvious implementation

Grok Skills: Reusable Instruction Bundles Invoked via Slash Commands

Grok Skills are named, versioned instruction bundles that encapsulate a complete workflow and are invoked via slash commands inside any active Grok Build session. Each Skill has four components: a name (which becomes the slash command trigger), a description (shown in autocomplete), an instructions block (the full behavioral spec), and optional reference_files (specific repository files always included in context when the Skill runs). Skills are version-controlled alongside project code; they travel with the repository through pull requests and code reviews.
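
The four components can be modeled as a small data structure. This sketch only mirrors the fields named above; the on-disk Skill format is not described here, so the Skill class, the "/" + name trigger rule, and the example values are assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Skill:
    name: str                      # becomes the slash command trigger
    description: str               # shown in autocomplete
    instructions: str              # the full behavioral spec
    reference_files: list[str] = field(default_factory=list)  # optional

    @property
    def slash_command(self) -> str:
        # Assumption: the trigger is simply "/" followed by the Skill name.
        return "/" + self.name


# Hypothetical Skill for a team's deployment checklist workflow.
checklist = Skill(
    name="deploy-checklist",
    description="Generate a deployment checklist for the current branch",
    instructions="List migrations, config changes, and rollback steps.",
    reference_files=["docs/deploy.md"],
)
print(checklist.slash_command)
```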

xAI shipped an initial set of Skills on May 18, 2026, covering document and data workflows: Word document generation, PowerPoint creation, Excel spreadsheets with formulas, PDF operations, and workflow automation. These are invoked with slash commands directly in the CLI, with no context switch to a separate interface and no copy-pasting prompts from a saved notes document.

The distinction between Skills and saved prompts is the organizational model. A saved prompt is local and informal; a Skill is scoped, versioned, and shareable. A Skill can be scoped to a single project or shared across a team organization, turning a repeatable instruction pattern into a callable library entry. If your team has a standard process for generating deployment checklists, producing spec documents from code, or running style-guide reviews, that workflow becomes a named Skill rather than tribal knowledge.

The versioning model matters for teams that want consistency across agents and developers. A Skill update goes through the normal pull request review process — if a change degrades output quality, it is visible in a diff and reversible. This treats prompt engineering as a first-class software artifact rather than ephemeral configuration, which is a meaningful shift for teams maintaining long-running codebases with multiple contributors.

Connectors: GitHub, Vercel, and the BYO-MCP Bridge

Grok Build's connector system links the CLI to external services, making their data available as live context during a build session without requiring manual import. Connectors authenticate against third-party APIs, reading GitHub issues, Notion pages, Linear tickets, or Vercel deployment logs directly into the agent's context. The rollout proceeded in two waves, with Bring-Your-Own-MCP (BYO-MCP) support in Wave 1 enabling teams to bridge any existing Model Context Protocol server into Grok Build from day one.

Wave 1 shipped on May 6, 2026 with GitHub, Notion, Linear, Google Workspace, Microsoft 365, Salesforce, and BYO-MCP. Wave 2 followed on May 22, 2026 with Vercel, Canva, Gamma, and S&P Global. The sixteen-day gap between waves suggests an active release cadence rather than a single batch release.

BYO-MCP compatibility has one hard constraint: the target MCP server must be publicly internet-accessible. Teams running internal-only MCP servers on private networks will need to expose those servers externally or route through a proxy before connecting to Grok Build. That said, MCP servers already configured for Claude Code or Cursor connect without protocol-level reconfiguration beyond that network requirement; the compatibility is genuine. According to Basenor's technical breakdown, the public accessibility constraint is the primary friction point for enterprise teams with air-gapped or VPN-only development environments.
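
A quick pre-flight check can catch the most common failure, which is an MCP server bound to a private address. The sketch below uses Python's standard ipaddress module; the function name is hypothetical, and it only classifies the address. It does not prove that a firewall actually lets traffic through.

```python
import ipaddress


def is_publicly_routable(ip: str) -> bool:
    """True if the address is globally routable (not private, loopback, etc.)."""
    return ipaddress.ip_address(ip).is_global


# Addresses an internal MCP server might be bound to, plus one public address.
for addr in ["10.0.0.5", "192.168.1.20", "127.0.0.1", "8.8.8.8"]:
    status = "public" if is_publicly_routable(addr) else "internal only"
    print(f"{addr}: {status}")
```

An address reported as internal only would need the external exposure or proxy described above before it could be bridged.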

For CI/CD integration, Grok Build provides two mechanisms: the -p flag for headless non-interactive mode (suppresses all approval prompts, runs fully automated) and the Agent Client Protocol (ACP) for custom orchestration within larger automated pipelines. Teams already using Claude Code's -p flag in shell scripts can use the same flag position with Grok Build's headless mode.

Wave | Release Date | Connectors Included
--- | --- | ---
Wave 1 | May 6, 2026 | GitHub, Notion, Linear, Google Workspace, Microsoft 365, Salesforce, BYO-MCP
Wave 2 | May 22, 2026 | Vercel, Canva, Gamma, S&P Global
BYO-MCP requirement | Both waves | Target MCP server must be publicly internet-accessible

CLAUDE.md Compatibility: Migrating Existing Projects from Claude Code

Grok Build natively reads AGENTS.md, CLAUDE.md, CLAUDE.local.md, and .claude/rules/ with zero configuration change. Project context files that exist in a Claude Code setup transfer directly on first run, so months of accumulated project rules, coding standards, and agent instructions are immediately available to Grok Build without a migration step.

"Grok Build is fully compatible with Claude Code with zero configuration needed — CLAUDE.md, CLAUDE.local.md, and .claude/rules/ are all recognized natively." — xAI, as reported by DevOps.com

The compatibility is context-import only. Three categories of Claude Code configuration do not transfer, and teams should audit these before deploying Grok Build in production pipelines:

  • Claude Code tool definitions: Custom tool configurations registered with Claude Code's tool system are not portable — Grok Build has its own tool registration layer and does not read Claude Code's tool manifests.
  • Local MCP hooks: MCP server configurations scoped to a local Claude Code session (typically in .claude/settings.json or project-local config files) do not carry over. You will need to re-establish those MCP connections via Grok Build's connector system.
  • Claude-specific settings files: Any configuration outside the shared project files stays in the Claude Code ecosystem. Only the files explicitly listed — CLAUDE.md, CLAUDE.local.md, AGENTS.md, .claude/rules/ — are consumed by Grok Build.
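
The audit described above can start as a simple file inventory. This is a minimal sketch: the shared paths are the ones listed in this section, .claude/settings.json is included only as the example of Claude-specific configuration named above, and the function name and report keys are hypothetical.

```python
from pathlib import Path

# Files this article lists as read natively by Grok Build.
SHARED_CONTEXT = ("AGENTS.md", "CLAUDE.md", "CLAUDE.local.md", ".claude/rules")

# Example of Claude Code-specific config that does not carry over.
CLAUDE_ONLY = (".claude/settings.json",)


def audit_claude_files(project_root) -> dict[str, list[str]]:
    """Report which known files exist under project_root, grouped by portability."""
    root = Path(project_root)
    return {
        "read_by_grok_build": [p for p in SHARED_CONTEXT if (root / p).exists()],
        "needs_manual_recreation": [p for p in CLAUDE_ONLY if (root / p).exists()],
    }


# Run against the current directory; both lists are empty if nothing is found.
print(audit_claude_files("."))
```

Anything under needs_manual_recreation is a prompt to re-establish the equivalent setup through Grok Build's own tool registration or connector system.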

For teams running Claude Code's -p flag in CI/CD shell scripts, Grok Build headless mode accepts the flag in the same position. A script calling claude -p "run tests and fix failures" can be adapted to grok-build -p "run tests and fix failures" with equivalent non-interactive behavior.
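
For pipelines that build their agent invocation programmatically, the swap can be sketched as a small argument rewrite. The helper below is hypothetical and assumes only what is stated above: the binary name changes and -p keeps its position. Other Claude Code flags are not checked for compatibility.

```python
import shlex


def to_grok_build(command: str) -> list[str]:
    """Rewrite a `claude ...` command line into its `grok-build ...` argv form."""
    args = shlex.split(command)
    if args and args[0] == "claude":
        args[0] = "grok-build"
    return args


argv = to_grok_build('claude -p "run tests and fix failures"')
# The resulting list can be passed to subprocess.run(argv) in a CI step.
print(argv)
```

Returning an argument list rather than a string avoids shell quoting problems when the prompt itself contains quotes or spaces.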

The practical framing: Grok Build functions well as a secondary agent in a mixed Claude Code / Grok Build setup rather than a full replacement. Project context shared via CLAUDE.md is immediately available on day one, but tool definitions and MCP configurations need explicit recreation. Teams evaluating Grok Build should plan a one-time audit of what is in shared project files versus what is in Claude Code-specific configs before switching any automated pipeline over.

grok-build-0.1: Context Window, Tool Support, and API Rollout

The grok-build-0.1 model carries a 256,000-token context window and accepts both text and image inputs. The image input capability enables visual debugging workflows: attach a screenshot of a broken UI component and the model can propose CSS or layout fixes with the visual context included rather than relying solely on the code. The 256K window handles most production codebases at moderate file counts, though teams working with very large monorepos should benchmark against their specific project size.
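
A rough way to run that benchmark is to estimate token counts before sending anything. The sketch below assumes about four characters per token, a common rule of thumb rather than the model's real tokenizer, and the reserve for instructions and output is an arbitrary illustrative value.

```python
CONTEXT_WINDOW = 256_000   # tokens, as stated for grok-build-0.1
CHARS_PER_TOKEN = 4        # rough heuristic, not the actual tokenizer


def estimate_tokens(text: str) -> int:
    return len(text) // CHARS_PER_TOKEN


def fits_in_context(files: dict[str, str], reserve: int = 16_000) -> bool:
    """Check whether the files plus a reserve for prompt and output fit the window."""
    total = sum(estimate_tokens(body) for body in files.values())
    return total + reserve <= CONTEXT_WINDOW


# Hypothetical file contents keyed by path.
files = {"app/main.py": "x" * 400_000, "app/util.py": "y" * 80_000}
print(fits_in_context(files))
```

Real token counts vary by language and file type, so treat the result as a first filter and confirm with actual usage figures from the API.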

On SWE-Bench Verified (the standard benchmark for agentic coding, measuring real GitHub issue resolution on open-source Python repositories), grok-build-0.1 scored 70.8% per xAI's internal testing. That sits approximately 15–17 percentage points below Claude Opus 4.7's ~87% on the same benchmark. According to Engadget's coverage, xAI leadership has directed engineering teams to close this gap. The 70.8% figure is self-reported and has not been independently reproduced by third-party evaluators as of this writing, a caveat worth holding when making adoption decisions.

API access to grok-build-0.1 opened on May 20, 2026, with broader rollout completing between May 24 and 27, 2026. The model is available via xAI's API directly and through OpenRouter. It supports native tool invocation and reasoning chains tuned for code-related tasks, meaning tool calls can be issued within a reasoning trace rather than requiring explicit interrupts from an orchestrator layer.

For teams that want fine-grained control over context construction, tool availability, and output handling, direct API access to grok-build-0.1 is an option independent of the CLI workflow. The model can be integrated into custom pipelines without adopting Grok Build's opinionated Plan → Search → Build loop.

Pricing and Access: SuperGrok Heavy, API Rates, and What's Not Yet Public

Grok Build entered early beta on May 14, 2026, exclusively for SuperGrok Heavy subscribers at $299/month. Access subsequently expanded on May 25, 2026 to all SuperGrok and X Premium+ subscribers. An introductory promotional rate offered SuperGrok Heavy at $99/month for the first six months, reducing the barrier for early evaluation.

Published grok-build-0.1 API token pricing: $1.00 per million input tokens, $2.00 per million output tokens, and $0.20 per million cached input tokens, available via xAI's API and OpenRouter. According to eWeek's coverage, xAI has not confirmed a general availability date or a lower-tier access path beyond the ongoing beta expansion. The rollout from SuperGrok Heavy-only to X Premium+ suggests incremental broadening rather than a hard GA cutoff.
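
The per-token rates translate into a simple cost estimate. This sketch assumes cached input tokens are counted separately from regular input tokens, which the published rates imply but do not spell out; the function name and the example token volumes are hypothetical.

```python
# Published grok-build-0.1 rates in USD per million tokens.
INPUT_RATE = 1.00
OUTPUT_RATE = 2.00
CACHED_INPUT_RATE = 0.20


def api_cost_usd(input_tokens: int, output_tokens: int, cached_input_tokens: int = 0) -> float:
    """Estimate API spend, treating cached input as separate from regular input."""
    total = (
        input_tokens * INPUT_RATE
        + output_tokens * OUTPUT_RATE
        + cached_input_tokens * CACHED_INPUT_RATE
    )
    return total / 1_000_000


# Hypothetical month: 2M fresh input, 0.5M output, 1M cached input tokens.
print(f"${api_cost_usd(2_000_000, 500_000, 1_000_000):.2f}")
```

Comparing this figure against the $299 subscription gives a rough sense of whether pay-per-token API access or the CLI plan is cheaper for a given workload.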

Tool | Required Plan | Monthly Cost | API Rate (input / output)
--- | --- | --- | ---
Grok Build (CLI) | SuperGrok Heavy (beta) | $299/mo; $99/mo intro (6 mo) | Included in subscription
grok-build-0.1 (API only) | xAI API / OpenRouter | Pay-per-token | $1.00/M in · $2.00/M out · $0.20/M cached
Claude Code | Claude Max | $100/mo | Included in Max subscription
Codex CLI | ChatGPT Plus | $20/mo | Included in Plus subscription

At $299/month, Grok Build's standard SuperGrok Heavy rate sits significantly above both Claude Code ($100/month) and Codex CLI ($20/month). The value proposition depends on whether Arena Mode, CLAUDE.md compatibility, and the MCP connector ecosystem justify the premium for your specific workflow. For teams evaluating Grok Build as a secondary agent rather than a primary tool, the introductory $99/month rate substantially reduces the cost-of-experiment. At the standard rate, Grok Build is positioned as a specialist instrument for teams with a defined use case — not a general-purpose daily driver at the current price point.

Frequently Asked Questions

Does Grok Build work with my existing CLAUDE.md project setup?

Yes. Grok Build natively reads CLAUDE.md, CLAUDE.local.md, AGENTS.md, and .claude/rules/ with zero configuration change — your project context files carry over on first run with no migration step required. The compatibility is context-import only: Claude Code-specific tool definitions and local MCP server configurations scoped to a Claude Code session do not transfer and must be reconfigured separately within Grok Build's connector system. Project instructions, coding standards, and agent behavior rules defined in those shared files are fully recognized by Grok Build from day one.

What is Arena Mode and when should I use it?

Arena Mode launches up to 8 concurrent sub-agents — 4 on Grok Code 1 Fast and 4 on Grok 4 Fast — each independently exploring a solution branch. Results are scored by test pass rate, diff size, and plan adherence, then ranked for developer selection. No code is committed without an explicit developer pick from the ranked output. Arena Mode is most useful for ambiguous bugs or design problems where multiple valid solution paths exist, and the cost of committing to the wrong one early is high. For routine feature additions with a clear, single implementation path, it adds latency and token cost without proportional benefit — use standard single-agent mode instead.

How does Grok Build pricing compare to Claude Code and Codex CLI?

In beta, Grok Build requires a SuperGrok Heavy subscription at $299/month, with an introductory rate of $99/month for the first six months offered at launch. Claude Code ships with Claude Max at $100/month. Codex CLI is available to ChatGPT Plus subscribers at $20/month. The grok-build-0.1 API is priced at $1.00 per million input tokens, $2.00 per million output tokens, and $0.20 per million cached input tokens for teams integrating directly via API. At standard rates, Grok Build is the highest-cost option of the three; the introductory pricing changes that calculation for early evaluators.

Can I run Grok Build in a CI/CD pipeline without interactive prompts?

Yes. Use the -p flag to run Grok Build in headless non-interactive mode — it suppresses all approval prompts and runs the agent fully automated within the pipeline step. For more complex orchestration, the Agent Client Protocol (ACP) enables Grok Build as a composable step inside custom multi-agent pipelines with full programmatic control over inputs, context, and outputs. Teams already using Claude Code's -p flag in shell scripts can adapt those scripts directly, as Grok Build's headless flag convention is compatible with the same CI invocation pattern.

What are Grok Skills and how do they differ from saved prompts?

Grok Skills are named, versioned instruction bundles with a defined structure: name, description, instructions block, and optional reference files tied to specific files in your repository. They are invoked via slash commands inside any active Grok Build session. Unlike ad-hoc saved prompts, Skills are version-controlled alongside project code, can reference specific repository files that are always included in context when the Skill runs, and can be shared across a team organization rather than staying local to one developer's session. This makes Skills a first-class software artifact — prompt changes go through code review the same way code changes do, which matters for teams maintaining consistency across multiple developers and sessions.

Evaluating Grok Build: What to Watch Before Committing

Grok Build enters the agentic coding market with a coherent technical identity: parallel sub-agents in Arena Mode, an explicit Plan Mode approval gate, diff-before-write guarantees, and genuine CLAUDE.md portability. These are real differentiators, not marketing positioning. The benchmark gap is also real: the self-reported 70.8% SWE-Bench Verified score versus Claude Opus 4.7's ~87% is a capability difference that matters for complex multi-file refactors and large-scale bug investigations. xAI has reportedly directed engineering resources at closing it, and the connector ecosystem cadence (two waves 16 days apart) indicates active development.

The most practical adoption path for teams already using Claude Code is hybrid rather than wholesale migration. CLAUDE.md and AGENTS.md context transfers cleanly; tool definitions and local MCP hooks do not. Grok Build as a secondary agent — handling Arena Mode runs for ambiguous production bugs while Claude Code handles everyday flow — maps naturally onto that constraint without requiring a full context rebuild. The -p flag and ACP integration make wiring that kind of hybrid pipeline straightforward.

Watch for: independent SWE-Bench reproductions (the self-reported figure needs third-party confirmation), movement on the $299/month standard rate post-beta, any announcement of internal MCP server support for enterprise teams, and the confirmed general availability date. Until those land, Grok Build is a capable secondary agent worth evaluating at the introductory price — not yet a reason to replace primary tooling.

Last updated: 2026-05-28. Based on xAI official documentation, release notes, and third-party technical reviews published through May 28, 2026. Benchmark figures are from xAI's own testing and have not been independently reproduced as of this date.