What grok-build-0.1 Is — and Where It Sits in the Market
grok-build-0.1 is xAI's first model purpose-built for agentic software engineering — autonomous multi-file refactoring, end-to-end debugging, and codebase-level reasoning — rather than conversational assistance. It is the same model powering the Grok Build CLI, xAI's Rust-based terminal coding agent, which the company positioned alongside Anthropic's Claude Code and OpenAI's Codex CLI. The model entered public API beta on May 28, 2026 carrying a 256,000-token context window — large enough to ingest a mid-sized codebase in a single pass. Input can be text or images, so UI mockups, architecture diagrams, and error screenshots are all valid inputs alongside source code.
Four capabilities defined the launch feature set, each with direct implications for production agent pipelines:
- Native function calling — structured tool dispatch inside agent loops without prompt-engineering workarounds
- Structured JSON output — parsing-ready responses for pipelines that consume model output programmatically
- Reasoning tokens — stepwise chain-of-thought exposure, useful for debugging why the model made a particular code decision
- MCP (Model Context Protocol) support — existing MCP server configurations from Claude Code or other compatible agents can be reused without rewiring
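Native function calling is easiest to see in the dispatch step of an agent loop. The sketch below assumes grok-build-0.1 accepts OpenAI-format `tools` definitions and returns OpenAI-shaped `tool_calls` through its OpenAI-compatible endpoint. That follows from the SDK compatibility xAI advertises but is not verified here. The `run_tests` tool, `TOOL_REGISTRY`, and `dispatch_tool_calls` are hypothetical names for illustration.

```python
import json
from typing import Any, Callable

# Hypothetical tool definition in the OpenAI function-calling schema; assumed
# (not confirmed) to be accepted unchanged by grok-build-0.1.
RUN_TESTS_TOOL = {
    "type": "function",
    "function": {
        "name": "run_tests",
        "description": "Run the test suite for a path and report failures.",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
}


def run_tests(path: str) -> dict:
    """Placeholder implementation; a real agent would invoke the test runner."""
    return {"path": path, "failures": 0}


# Maps tool names the model may request to local Python callables.
TOOL_REGISTRY: dict = {"run_tests": run_tests}


def dispatch_tool_calls(tool_calls: list) -> list:
    """Execute each tool call and build the role="tool" messages for the next turn.

    Expects plain dicts shaped like OpenAI assistant tool calls:
    {"id": ..., "function": {"name": ..., "arguments": "<JSON string>"}}.
    """
    results = []
    for call in tool_calls:
        name = call["function"]["name"]
        fn: Callable[..., Any] = TOOL_REGISTRY.get(name)
        if fn is None:
            output: Any = {"error": f"unknown tool: {name}"}
        else:
            # Arguments arrive as a JSON-encoded string, not a parsed object.
            args = json.loads(call["function"]["arguments"] or "{}")
            output = fn(**args)
        results.append(
            {"role": "tool", "tool_call_id": call["id"], "content": json.dumps(output)}
        )
    return results
```

In a loop you would pass `tools=[RUN_TESTS_TOOL]` to `client.chat.completions.create(...)`, append the assistant message plus the dispatcher's output to `messages`, and call again until the model stops requesting tools. OpenAI SDK response objects can be converted to this dict shape with `model_dump()`.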
xAI describes the model's intended scope in their launch documentation:
"grok-build-0.1 is purpose-built for agentic software engineering workflows — designed to autonomously complete real engineering tasks end to end, reading codebases, writing and executing code, running terminal commands, and iterating on failures, rather than responding to individual prompts." — xAI Newsroom, May 2026
One naming detail worth confirming before deploying: xAI's API documentation lists grok-code-fast-1, grok-code-fast, and grok-code-fast-1-0825 as internal identifiers for the same underlying model. The public branding name (grok-build-0.1) diverges from these internal aliases. For production deployments, hardcode the public string and track the xAI changelog — internal aliases are more likely to rotate without notice.
From $299 CLI-Only to Open API Access
Before May 28, 2026, grok-build-0.1 was accessible only through the Grok Build CLI, which required a SuperGrok Heavy subscription priced at approximately $299/month. That put the full model experience out of reach for most individual developers and early-stage teams — a significant limitation for any coding agent competing in a market where Claude Code ships bundled with a $20/month plan. The public API beta changes the calculus by decoupling the model from the subscription gate entirely.
The rollout followed a three-phase staged timeline. xAI opened the Grok Build CLI in early beta on May 14–15, 2026, giving initial adopters their first look at the terminal agent. API early access followed on May 22, 2026, expanding programmatic access to developers who wanted to call the model directly. Six days later the public beta opened to any developer with an xAI API key.
The practical effect: developers can now embed grok-build-0.1 in custom agent loops, IDE extensions, CI workflows, and orchestration pipelines without a premium subscription. This is the per-token access model developers expect from Anthropic and OpenAI. Third-party integrations confirmed at launch include Cursor, Kilo Code, OpenCode, OpenRouter, and Vercel AI Gateway. Same-day coverage across major coding IDEs and AI routing layers suggests the integration story was coordinated rather than left to organic adoption.
The SuperGrok Heavy subscription is not retired — it remains the gate for the full Grok Build CLI terminal experience with its interactive TUI. But the API-only path is now open and priced to compete, and that is the path most teams building programmatic coding agents will use in practice.
Token Costs, Rate Limits, and Regional Availability
grok-build-0.1's API pricing at public beta launch is $1.00/M input tokens, $2.00/M output tokens, and $0.20/M for cached input tokens. The input price undercuts Claude 3.5 Sonnet and GPT-4o at API list rates. The table below places these numbers in context against the three main alternatives — note that competitor pricing has its own publish cadence and should be verified against vendor documentation before budgeting.
| Model | Input ($/M tokens) | Output ($/M tokens) | Cached Input ($/M) | Context Window |
|---|---|---|---|---|
| grok-build-0.1 | $1.00 | $2.00 | $0.20 | 256k |
| Claude 3.5 Sonnet | $3.00 † | $15.00 † | $0.30 † | 200k |
| GPT-4o | $2.50 † | $10.00 † | $1.25 † | 128k |
| Gemini 1.5 Pro | $1.25 † | $5.00 † | — | 1M–2M |
† Competitor pricing per vendor documentation; verify current rates before budgeting. grok-build-0.1 prices sourced from xAI model documentation.
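The per-million rates turn into per-request cost once you know how many prompt tokens hit the cache. Below is a minimal sketch using the beta rates above. It assumes, as in OpenAI-style usage reporting, that cached tokens are a subset of the prompt tokens. It does not address how reasoning tokens are billed, which the source does not state. `Usage` and `estimate_cost_usd` are hypothetical names.

```python
from dataclasses import dataclass

# grok-build-0.1 public-beta list prices (USD per million tokens), as quoted
# above; provisional and subject to change before GA.
INPUT_PER_M = 1.00
CACHED_INPUT_PER_M = 0.20
OUTPUT_PER_M = 2.00


@dataclass
class Usage:
    input_tokens: int         # all prompt tokens, including cached ones (assumption)
    cached_input_tokens: int  # subset of input_tokens billed at the cached rate
    output_tokens: int


def estimate_cost_usd(u: Usage) -> float:
    """Estimate one request's cost at the published beta rates."""
    if not 0 <= u.cached_input_tokens <= u.input_tokens:
        raise ValueError("cached_input_tokens must be within [0, input_tokens]")
    uncached = u.input_tokens - u.cached_input_tokens
    return (
        uncached * INPUT_PER_M
        + u.cached_input_tokens * CACHED_INPUT_PER_M
        + u.output_tokens * OUTPUT_PER_M
    ) / 1_000_000
```

For example, a 200k-token prompt with 150k tokens served from cache and 5k output tokens comes to about $0.09. The same request with no cache hits comes to $0.21, which is why stable prompt prefixes matter for large-context agent loops.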
Rate limits at launch are 1,800 requests per minute and 10 million tokens per minute. Those ceilings are high enough for most single-team use cases. If you are routing aggregate traffic from multiple downstream users through a single API key, check whether the per-key or per-org limit applies before assuming headroom. Regional availability at public beta is limited to us-east-1 and eu-west-1. Latency-sensitive workloads originating from Asia-Pacific or other regions will route through one of these two endpoints — worth a latency benchmark against your actual workload before committing.
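If several workers share one key, a client-side limiter keeps bursts under the ceiling before the server starts rejecting requests. Here is a minimal sliding-window sketch, assuming the 1,800 RPM and 10M TPM figures above and a caller-supplied token estimate per request. `MinuteRateLimiter` is a hypothetical helper, not part of any xAI SDK. Server-side rejections still need their own retry handling.

```python
import collections
import time
from typing import Callable


class MinuteRateLimiter:
    """Sliding-window guard for per-minute request and token budgets.

    Defaults mirror the published beta limits (1,800 requests/min and
    10M tokens/min). Whether xAI applies them per key or per org is not
    stated, so lower them if several workers share one key.
    """

    def __init__(
        self,
        max_requests: int = 1_800,
        max_tokens: int = 10_000_000,
        window_s: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_requests = max_requests
        self.max_tokens = max_tokens
        self.window_s = window_s
        self.clock = clock
        # (timestamp, estimated_tokens) for each request admitted in the window
        self._events: collections.deque = collections.deque()
        self._tokens_in_window = 0

    def _evict(self, now: float) -> None:
        # Drop requests older than the window and release their token budget.
        while self._events and now - self._events[0][0] >= self.window_s:
            _, tokens = self._events.popleft()
            self._tokens_in_window -= tokens

    def try_acquire(self, estimated_tokens: int) -> bool:
        """Admit one request if both budgets allow it; otherwise return False."""
        now = self.clock()
        self._evict(now)
        if len(self._events) >= self.max_requests:
            return False
        if self._tokens_in_window + estimated_tokens > self.max_tokens:
            return False
        self._events.append((now, estimated_tokens))
        self._tokens_in_window += estimated_tokens
        return True
```

A caller that gets `False` would sleep briefly and retry. Because the source does not say whether the token limit counts input, output, or both, an estimate covering prompt plus expected output tokens is the conservative choice.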
Content moderation is disabled by default. The reasoning is practical: code-generation pipelines routinely handle credential references, authentication logic, and vulnerability descriptions that standard moderation filters flag incorrectly. Developers who need moderation — for example, when exposing the model to untrusted user input in a product context — can enable it explicitly via API parameters. If your pipeline processes code containing API keys, private schemas, or sensitive configuration, audit the opt-in moderation behavior against your specific content types before shipping.
xAI claims 100+ tokens per second inference speed. No independently verified latency data had been published at launch. For latency-sensitive applications, measure against your actual workload rather than accepting the vendor figure.
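To check the vendor's throughput figure against your own prompts, one option is to time a streamed response. This sketch assumes OpenAI-style streaming (`stream=True`) works against the xAI endpoint. It treats each non-empty content delta as roughly one token, which is only an approximation; the `usage` figures the API reports are the authoritative count. `measure_stream` and `openai_deltas` are hypothetical helpers.

```python
import time
from typing import Callable, Iterable, Iterator


def measure_stream(
    chunks: Iterable[str], clock: Callable[[], float] = time.perf_counter
) -> dict:
    """Time-to-first-chunk and post-first-chunk decode rate for a streamed reply."""
    start = clock()
    first = None
    count = 0
    for piece in chunks:
        if not piece:
            continue  # skip empty deltas (role-only or final chunks)
        if first is None:
            first = clock()
        count += 1
    end = clock()
    if first is None:
        return {"ttft_s": float("nan"), "chunks": 0, "chunks_per_s": float("nan")}
    gen_time = end - first
    # Exclude the first chunk from the rate; its latency is already in ttft_s.
    rate = (count - 1) / gen_time if count > 1 and gen_time > 0 else float("nan")
    return {"ttft_s": first - start, "chunks": count, "chunks_per_s": rate}


def openai_deltas(client, model: str, messages: list) -> Iterator[str]:
    """Yield text deltas from an OpenAI-SDK streaming call.

    As a generator, the request is only sent on the first next(), so
    measure_stream's start time includes request setup.
    """
    stream = client.chat.completions.create(model=model, messages=messages, stream=True)
    for chunk in stream:
        if chunk.choices:
            yield chunk.choices[0].delta.content or ""
```

Usage would look like `measure_stream(openai_deltas(client, "grok-build-0.1", messages))`, repeated over a representative prompt set from the region you actually deploy in.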
SDK Compatibility and Integration Paths
grok-build-0.1 is accessible via the xAI native API and carries compatibility with both the OpenAI SDK and the Vercel AI SDK. OpenAI SDK compatibility is the most significant for existing teams: if you already have an OpenAI client in your codebase, switching to grok-build-0.1 means changing only the API key, the base_url, and the model string. No SDK reinstall, no dependency change, no interface rewrite.
The minimal before/after migration looks like this:
```python
# Before — standard OpenAI client
from openai import OpenAI

client = OpenAI(
    api_key="sk-...",
)
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Refactor this function for readability."}],
)
```

```python
# After — same SDK, pointed at xAI with grok-build-0.1
from openai import OpenAI

client = OpenAI(
    api_key="xai-...",               # Your xAI API key
    base_url="https://api.x.ai/v1",  # xAI endpoint
)
response = client.chat.completions.create(
    model="grok-build-0.1",          # Updated model string
    messages=[{"role": "user", "content": "Refactor this function for readability."}],
)
```
Vercel AI SDK support is relevant for teams deploying Next.js applications or edge functions with LLM capabilities. The Vercel integration handles streaming, error retry, and token accounting in a format most Next.js developers already have in place. Additional integration examples across multiple SDK targets are available in the Puter developer reference for xAI grok-build-0.1.
One deployment caution: xAI's documentation surfaces both the public model identifier (grok-build-0.1) and internal aliases including grok-code-fast-1 and grok-code-fast-1-0825. Use the public-facing string in production code. Internal identifiers are more likely to rotate silently as xAI iterates toward GA, creating a breakage risk in anything that hardcodes them.
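One way to keep the public string authoritative, while still being able to switch models without a code change, is to read it from configuration and flag internal aliases. This is a minimal sketch. The environment variable names `XAI_API_KEY`, `XAI_BASE_URL`, and `GROK_MODEL` are assumptions of this example, not documented xAI conventions.

```python
import os
import warnings
from typing import Mapping, Optional

PUBLIC_MODEL_ID = "grok-build-0.1"
XAI_BASE_URL = "https://api.x.ai/v1"
# Internal aliases surfaced in xAI's docs; pinning these risks silent breakage.
INTERNAL_ALIASES = frozenset({"grok-code-fast-1", "grok-code-fast", "grok-code-fast-1-0825"})


def xai_client_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Return OpenAI-SDK client kwargs plus the model string, read from the environment.

    The variable names XAI_API_KEY, XAI_BASE_URL and GROK_MODEL are this
    sketch's own convention, not documented xAI settings.
    """
    env = os.environ if env is None else env
    model = env.get("GROK_MODEL", PUBLIC_MODEL_ID)
    if model in INTERNAL_ALIASES:
        warnings.warn(
            f"{model!r} is an internal alias; prefer the public ID {PUBLIC_MODEL_ID!r}",
            stacklevel=2,
        )
    return {
        "api_key": env["XAI_API_KEY"],  # KeyError if unset: fail fast at startup
        "base_url": env.get("XAI_BASE_URL", XAI_BASE_URL),
        "model": model,
    }
```

To use it, pop the model out before building the client: `settings = xai_client_settings(); model = settings.pop("model"); client = OpenAI(**settings)`, then pass `model=model` on each request.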
Grok Build CLI: TUI, Headless Scripting, and ACP
The Grok Build CLI is the terminal-native interface xAI ships alongside the model. Written in Rust, it exposes three distinct access surfaces: an interactive TUI for session-style coding work, a headless mode for scripted single-shot automation, and an Agent Client Protocol (ACP) for embedding the model into external orchestration systems. Each surface has different authentication requirements and cost implications.
Installation is a single-command operation. On macOS and Linux, a curl script handles the full install. On Windows, a PowerShell one-liner covers the same path. Full install commands are in the Grok Build CLI documentation. The Rust binary brings fast cold starts and low memory overhead compared to Node- or Python-based CLI wrappers — relevant if you are running the agent as a sidecar in constrained environments.
The headless mode is well-suited to CI integration or scripted refactoring runs:
```bash
# Headless single-shot command — no interactive session required
grok -p "Refactor the auth module to use async/await throughout"
```
That invocation runs the agent, applies changes, and returns — fitting neatly into a pipeline step triggered by a CI event or a webhook without requiring a persistent terminal session.
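For CI steps written in Python rather than shell, the same headless invocation can be wrapped with `subprocess`. This is a sketch under stated assumptions. Only the `grok -p "<prompt>"` form comes from the CLI usage shown above. The timeout value is arbitrary, and treating a non-zero exit code as failure is an assumption about the CLI's behavior. `headless_command` and `run_headless` are hypothetical helpers.

```python
import subprocess


def headless_command(prompt: str, binary: str = "grok") -> list:
    """Argument vector for a single-shot headless run: grok -p "<prompt>"."""
    return [binary, "-p", prompt]


def run_headless(prompt: str, timeout_s: float = 3600.0) -> subprocess.CompletedProcess:
    """Run one headless task, capturing output for CI logs.

    check=True raises CalledProcessError on a non-zero exit code; that the CLI
    reports failures this way is an assumption. The one-hour timeout is arbitrary.
    """
    return subprocess.run(
        headless_command(prompt),
        capture_output=True,
        text=True,
        timeout=timeout_s,
        check=True,
    )
```

Passing an argument list rather than a shell string avoids quoting problems when prompts contain quotes or `$`.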
One subscription-gate distinction to hold in mind: the interactive TUI still requires a SuperGrok Heavy subscription. The public API beta does not remove that gate for developers who want full terminal sessions. ACP is the exception: because it routes through the API layer rather than the CLI subscription layer, it lets developers embed grok-build-0.1 into custom orchestration loops without the $299/month requirement. If you want the interactive terminal experience, budget for the subscription. If you want programmatic access inside an orchestrator, the API and ACP path is significantly cheaper.
Third-Party Performance Scores: What Kilo.ai Measured
At public beta launch, xAI had not submitted grok-build-0.1 to peer-reviewed coding benchmarks such as SWE-bench (verified) or HumanEval+. The most substantive independent evaluation available comes from Kilo.ai's PinchBench v2, which places the model at 88.9% overall — #4 among 50 models tested. PinchBench evaluates multi-step agentic coding tasks rather than one-shot completions, making it a closer proxy for production coding agent performance than standard completion benchmarks.
| Benchmark / Category | grok-build-0.1 Score | Source | Notes |
|---|---|---|---|
| PinchBench v2 (overall) | 88.9% (#4 of 50) | Kilo.ai | Third-party, multi-step agentic eval |
| Log Analysis | 97.0% | Kilo.ai | Category leader |
| CSV Analysis | 96.1% | Kilo.ai | Category leader |
| SWE-bench (verified) | Not published | — | No xAI submission at launch |
| HumanEval+ | Not published | — | No xAI submission at launch |
| xAI internal eval suite | Vendor-reported | xAI | Not independently audited |
The PinchBench evaluation methodology involves multi-step task execution, which is reflected in the run economics: the average cost per benchmark task was $20.58 and the average task execution time was 220 minutes. These are not per-request latency numbers — they describe the full cost and wall-clock time to complete a single agentic evaluation task involving multiple tool calls and iteration cycles. Treat them as a signal about evaluation methodology, not API response speed.
Kilo.ai's evaluation report summarizes the model's performance profile:
"grok-build-0.1 achieves top-tier category scores in Log Analysis and CSV Analysis, demonstrating consistent accuracy on structured data interpretation tasks across runs. Multi-step reasoning tasks with complex tool interdependencies show more variance across the broader model set." — Kilo.ai PinchBench v2 Evaluation Report, 2026
The missing peer-reviewed benchmarks matter for teams making model selection decisions against a competitive shortlist. SWE-bench verified scores are the current standard for auditable coding agent comparisons. Until xAI or an independent research group submits grok-build-0.1 results, direct apples-to-apples comparisons with Claude 3.5 Sonnet, GPT-4o, or Gemini 1.5 Pro on widely referenced coding leaderboards will remain gaps in the evaluation record.
Unknowns and Risks Before Committing to grok-build-0.1
The public beta label carries real weight here. Current specifications — model IDs, rate limits, endpoint URLs, pricing — are provisional. xAI has not published a formal deprecation or change-notice policy for public beta resources, so treating any of these as stable infrastructure today increases breakage risk proportional to how deeply you embed them. Several practitioners have flagged discrepancies between launch claims and what primary vendor documentation actually shows, including context window specifications and Plan Mode defaults. Verify critical specifications against the xAI model documentation directly, not against third-party launch summaries.
The subscription cost structure deserves explicit attention for teams evaluating the full CLI workflow. SuperGrok Heavy runs approximately $299/month — roughly 15× the cost of Claude Pro at $20/month or ChatGPT Plus at $20/month, both of which include their respective CLI coding agents. xAI has made no announcement about aligning the CLI subscription price with competitors. If your workflow is API-driven orchestration, the per-token pricing is competitive. If you want the full interactive terminal session, that cost premium is a real constraint, not a footnote.
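To put the per-token pricing and the subscription on the same scale, you can compute how much monthly API traffic $299 would buy. The two are not interchangeable, since the interactive TUI still requires the subscription, so treat this as an order-of-magnitude comparison. The sketch uses the beta list prices quoted earlier and ignores cached-input discounts. `output_share` is a hypothetical parameter for the fraction of tokens that are output.

```python
SUBSCRIPTION_USD_PER_MONTH = 299.0  # approximate SuperGrok Heavy price
INPUT_PER_M = 1.00   # beta list price, USD per million input tokens
OUTPUT_PER_M = 2.00  # beta list price, USD per million output tokens


def breakeven_tokens_per_month(output_share: float) -> float:
    """Monthly tokens (input + output, no caching) whose API cost equals the subscription.

    output_share is the fraction of all tokens that are output tokens (0.0 to 1.0).
    """
    if not 0.0 <= output_share <= 1.0:
        raise ValueError("output_share must be between 0 and 1")
    # Weighted average price per million tokens for the given input/output mix.
    blended_per_m = (1 - output_share) * INPUT_PER_M + output_share * OUTPUT_PER_M
    return SUBSCRIPTION_USD_PER_MONTH / blended_per_m * 1_000_000
```

At the extremes this gives 299M tokens per month (all input) and 149.5M (all output). A 20% output mix lands near 249M. Caching would push these figures higher.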
A summary of what to verify before committing:
- Model ID stability — grok-build-0.1 is the public string; internal aliases may rotate before GA
- Benchmark gaps — No SWE-bench verified or HumanEval+ results from xAI at launch; only third-party PinchBench and vendor-reported internal scores available
- Regional latency — Only us-east-1 and eu-west-1 available; latency profile for other regions is untested at launch scale
- Disputed launch specs — Practitioner analysis notes that context window claims and Plan Mode behavior in some configurations differed from vendor documentation
- Moderation defaults — Disabled by default; evaluate the opt-in path if your application surfaces the model to untrusted input
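The model ID check on that list can be automated as a deploy-time guard. This is a minimal sketch, assuming the xAI endpoint serves the OpenAI-style models listing that `client.models.list()` calls. `check_model_listed` is a hypothetical helper.

```python
from typing import Iterable


def check_model_listed(available_ids: Iterable[str], wanted: str = "grok-build-0.1") -> None:
    """Fail fast at deploy time if the pinned model ID is not currently served."""
    ids = set(available_ids)
    if wanted not in ids:
        raise RuntimeError(f"model {wanted!r} not listed; available: {sorted(ids)}")
```

Against a live client this would be called as `check_model_listed(m.id for m in client.models.list())`, for example as the first step of a CI smoke test.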
Frequently Asked Questions
What is the difference between grok-build-0.1 and the Grok Build CLI?
grok-build-0.1 is the underlying model — an API-accessible LLM purpose-built for agentic software engineering tasks. The Grok Build CLI is xAI's Rust-based terminal client that uses that model. Before May 28, 2026, the only way to access grok-build-0.1 was through the CLI, which required a SuperGrok Heavy subscription (~$299/month). The public API beta, launched May 28, 2026, lets any developer call the model directly via API key with no subscription required. The CLI's interactive TUI still requires the SuperGrok Heavy subscription. The API path — and the Agent Client Protocol (ACP) — does not.
Is grok-build-0.1 compatible with the OpenAI SDK?
Yes. grok-build-0.1 works as a drop-in replacement in the OpenAI Python and JavaScript SDKs. To switch an existing OpenAI client, supply your xAI API key, set base_url="https://api.x.ai/v1", and update the model string to "grok-build-0.1". No SDK reinstall or interface change is required. The Vercel AI SDK is also supported, which lowers the bar for Next.js and edge-deployed applications. Full SDK integration details are in the xAI developer documentation.
How much does grok-build-0.1 cost per million tokens?
At public beta launch: $1.00/M input tokens, $2.00/M output tokens, and $0.20/M for cached input tokens. Rate limits are 1,800 requests per minute and 10 million tokens per minute. The model is currently available in two regions: us-east-1 and eu-west-1. These figures are public beta pricing and may be revised before general availability. Check xAI's model documentation for current rates before budgeting production workloads.
Has grok-build-0.1 been evaluated on SWE-bench or HumanEval?
Not by xAI at public beta launch. Available evaluation data comes from two sources: Kilo.ai's PinchBench v2 (88.9% overall, #4 of 50 models tested, with category-leading scores in Log Analysis at 97.0% and CSV Analysis at 96.1%) and xAI's own internal evaluation suite, which has not been independently audited. No SWE-bench verified or HumanEval+ results were published by xAI at launch. For teams that rely on standard leaderboard comparisons when making model selection decisions, that gap is a known unknown until xAI or an independent lab publishes results on a peer-reviewed coding benchmark.
Why is content moderation disabled by default?
xAI ships grok-build-0.1 with moderation off by default because standard content filters frequently produce false positives on code-related content — particularly code that references internal credentials, handles authentication flows, or describes security vulnerabilities. Those are legitimate engineering topics. A general-purpose moderation layer often blocks them incorrectly, which would degrade the model's usefulness for its primary purpose. Developers who need moderation — for example, when exposing the model to untrusted user input in a product — can enable it explicitly via API parameters. If you are building a user-facing product on top of grok-build-0.1, enable moderation and test its behavior against your content types before launch.
Evaluation and Next Steps
grok-build-0.1's public beta is a coherent entry in the agentic coding model market. The API pricing is competitive against Claude 3.5 Sonnet and GPT-4o at list rates. For most existing teams, OpenAI SDK compatibility reduces integration to swapping the API key, base URL, and model string. MCP support gives projects already using Claude Code a path to reuse their existing tool configurations without rewiring. The 256k context window is a practical advantage for large-codebase tasks where context overflow creates persistent problems. And the same-day distribution through Cursor, OpenRouter, and Vercel AI Gateway suggests the integration surface was deliberately prepared for the launch date.
The gaps are equally clear. No SWE-bench or HumanEval+ results from xAI at launch means direct, auditable comparisons with top-tier coding models remain incomplete. The beta label is real — model IDs, rate limits, and pricing should be treated as provisional until GA. Practitioner analysis has flagged discrepancies between some launch claims and primary vendor documentation, which argues for validating specs independently rather than building against announcement copy. And the $299/month SuperGrok Heavy subscription creates a steep cost floor for developers who want the full terminal TUI experience — an order of magnitude above competing CLI tools.
The practical path: evaluate grok-build-0.1 via the API for teams doing programmatic coding automation. The OpenAI SDK migration path is low-friction enough to run a structured test against your actual workloads without significant time investment. If Kilo.ai's PinchBench category scores hold for your task types — structured log parsing, CSV processing, function-level refactoring — the pricing makes sense. Treat it as infrastructure only after GA and independent benchmark data are both available.
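A structured test can be as simple as running the same prompts against each candidate model and scoring the answers with task-specific checks. The sketch keeps the model call abstract. `complete(model, prompt)` is a hypothetical callable you would implement with the OpenAI SDK, using a separate client per provider since base URLs and keys differ. `EvalTask`, `ModelReport`, and `compare_models` are illustrative names. Pass/fail checkers are only as good as the assertions you write for your own tasks.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List


@dataclass
class EvalTask:
    name: str
    prompt: str
    check: Callable[[str], bool]  # returns True if the model output is acceptable


@dataclass
class ModelReport:
    model: str
    passed: List[str] = field(default_factory=list)
    failed: List[str] = field(default_factory=list)

    @property
    def pass_rate(self) -> float:
        total = len(self.passed) + len(self.failed)
        return len(self.passed) / total if total else 0.0


def compare_models(
    models: Iterable[str],
    tasks: List[EvalTask],
    complete: Callable[[str, str], str],
) -> Dict[str, ModelReport]:
    """Run every task against every model via complete(model, prompt) -> text."""
    reports: Dict[str, ModelReport] = {}
    for model in models:
        report = ModelReport(model)
        for task in tasks:
            output = complete(model, task.prompt)
            if task.check(output):
                report.passed.append(task.name)
            else:
                report.failed.append(task.name)
        reports[model] = report
    return reports
```

Keeping the task list fixed across models, and recording per-task outcomes rather than only a pass rate, makes it easier to see whether category strengths like log or CSV analysis carry over to your own workloads.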
Last updated: 2026-05-29. Article reflects grok-build-0.1 public API beta specifications as announced on May 28, 2026. Pricing, rate limits, model IDs, and regional availability are subject to change before general availability. Verify all figures against xAI's current model documentation before deploying to production.



