Anthropic 0.105.0 Under the Hood: Output Attribution and File Caps

v0.105.0 adds granular output-type attribution and configurable upload caps—here's what they do and when to use them.

Creeta

May 29, 2026

What Shipped in v0.105.0

Anthropic SDK v0.105.0 is a four-feature drop co-shipped with the Claude Opus 4.8 model launch on May 28, 2026. The four additions are: the claude-opus-4-8 model identifier, mid-conversation system block injection, the usage.output_tokens_details attribution field, and a per-client configurable file upload size cap. Two patch releases followed the next day, v0.105.1 (supply-chain hardening via Trusted Publishing) and v0.105.2 (undocumented scope), making this a dense two-day window for SDK consumers tracking the package.

Quick Answer: Anthropic SDK v0.105.0 (released May 28, 2026) adds four features co-shipped with Claude Opus 4.8: claude-opus-4-8 model support, mid-task system instruction injection, a new usage.output_tokens_details field for per-type output cost attribution, and a configurable per-client file upload size cap. Pin to anthropic>=0.105.1,<0.106 for reproducible builds.

The library and model are deliberately co-shipped: calling claude-opus-4-8 requires anthropic>=0.105.0. That pairing is consistent with how Anthropic has handled previous Opus-tier launches: the SDK update and the model availability date are synchronized so that the model string resolves on the day it is announced.

One housekeeping change warrants a codebase search: example scripts under the managed-agents module were renamed from private-sandbox-worker to self-hosted-sandbox-worker, aligning with terminology introduced in v0.103.0 on May 19, 2026. It is a naming change only, with no behavioral difference, but the package upgrade emits no warning about it, so any CI pipeline or Makefile that still references the old script name will fail on its next run.

Version	Release Date	Type	Key Changes
0.105.0	May 28, 2026	Feature drop	`claude-opus-4-8`, mid-task system injection, `output_tokens_details`, file size cap config, sandbox script rename
0.105.1	May 29, 2026	Supply-chain	Switched PyPI deployment to Trusted Publishing (OIDC); no API surface changes
0.105.2	May 29, 2026	Unknown	No published changelog; safe to upgrade; scope unknown pending Anthropic release notes

Output Usage Observability: The New Breakdown Field

The usage.output_tokens_details field is a new nested object in the Messages API response that breaks down output token consumption by type, mirroring the existing usage.input_tokens_details structure that exposes cache_read_input_tokens and cache_write_input_tokens. Before this field existed, every output token, whether produced by a reasoning chain or a final text response, appeared as a single opaque integer in usage.output_tokens. For pipelines that enable extended thinking, that opacity was a genuine cost-accounting gap.

The expected breakdown separates reasoning (thinking) tokens from standard output tokens. Extended thinking runs the model's internal chain-of-thought before generating a response; those reasoning tokens are billed identically to text output tokens but represent a qualitatively different budget line. With output_tokens_details, a team running a high-volume pipeline can now answer "how much of our monthly output spend is reasoning overhead?" without manual instrumentation or proxy metrics.

"Claude Opus 4.8 has sharper judgment, more honesty about its progress, and the ability to work independently for longer than its predecessors." — Anthropic, Claude Opus 4.8 launch post

The symmetry with the input side is intentional. usage.input_tokens_details lets you track how effectively you're using prompt caching—if cache_read_input_tokens is low, you're paying full price for repeated prefix tokens. The output-side analog closes the loop: every token class that affects your bill now has a dedicated field rather than requiring inference from aggregate totals. High-volume agentic workflows that mix extended thinking with standard completions will see the largest practical benefit.

One important caveat: the exact sub-field names within output_tokens_details are not yet included in the official API reference as of v0.105.0. Field presence also depends on the response type: if extended thinking is disabled, reasoning-specific sub-fields may be absent or zero rather than guaranteed present. Safe integration pattern: inspect the raw response.usage object in a test environment, use getattr(response.usage, "output_tokens_details", None) for defensive access, and monitor the SDK CHANGELOG for the schema freeze. Do not hard-code sub-field names in production code until they appear in official docs.
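The defensive access pattern above can be wrapped in a small helper. This is a minimal sketch, not SDK code: output_details is a hypothetical function name, the usage objects are SimpleNamespace stand-ins rather than real API responses, and example_subfield is a made-up placeholder because the real sub-field names are unpublished.

```python
from types import SimpleNamespace
from typing import Any, Dict


def output_details(usage: Any) -> Dict[str, Any]:
    """Return output_tokens_details as a plain dict, or {} if it is absent."""
    details = getattr(usage, "output_tokens_details", None)
    if details is None:
        return {}
    if isinstance(details, dict):
        return dict(details)
    # Pydantic-style response models expose model_dump(); plain objects fall back to vars().
    if hasattr(details, "model_dump"):
        return details.model_dump()
    return dict(vars(details))


# Stand-in usage objects. "example_subfield" is a placeholder, NOT the real schema.
with_details = SimpleNamespace(
    output_tokens=120,
    output_tokens_details=SimpleNamespace(example_subfield=80),
)
without_details = SimpleNamespace(output_tokens=40)

print(output_details(with_details))
print(output_details(without_details))
```

Because the helper returns whatever sub-fields are present, logging code built on it does not need to change when the schema is published.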

For teams already logging usage metrics with LangSmith, OpenTelemetry, or self-rolled middleware, adding output_tokens_details extraction is a one-line addition to an existing instrumentation hook. The value compounds most in workloads where reasoning depth varies across requests—a queue that mixes quick retrieval calls with deep multi-step planning will show meaningfully different reasoning-to-text ratios once you start logging the breakdown.
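One way to log the breakdown across a mixed queue without hard-coding sub-field names is to sum whatever integer sub-fields each response carries. The sketch below assumes usage objects expose output_tokens and, optionally, output_tokens_details as attributes; aggregate_output_details and example_subfield are hypothetical names used for illustration only.

```python
from collections import Counter
from types import SimpleNamespace
from typing import Any, Iterable


def aggregate_output_details(usages: Iterable[Any]) -> Counter:
    """Sum output_tokens plus every integer sub-field found in output_tokens_details."""
    totals: Counter = Counter()
    for usage in usages:
        totals["output_tokens"] += getattr(usage, "output_tokens", 0) or 0
        details = getattr(usage, "output_tokens_details", None)
        if details is None:
            continue
        fields = details if isinstance(details, dict) else vars(details)
        for name, value in fields.items():
            # bool is a subclass of int, so exclude it along with non-integer values.
            if isinstance(value, int) and not isinstance(value, bool):
                totals[f"details.{name}"] += value
    return totals


# Stand-in records. "example_subfield" is a placeholder, NOT the real schema.
records = [
    SimpleNamespace(output_tokens=120,
                    output_tokens_details=SimpleNamespace(example_subfield=80)),
    SimpleNamespace(output_tokens=40),  # e.g. a quick call with no details object
]
totals = aggregate_output_details(records)
print(dict(totals))
```

Dividing any details.* total by output_tokens gives the per-type share of output volume for the batch.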

Configuring File Upload Size Caps

PR #1825 adds a configuration option to override the SDK's default maximum file size for uploads. The cap is scoped per client instance (set it when you instantiate anthropic.Anthropic(), not globally), so multiple client objects in the same process can carry different limits without interfering with each other.

Two distinct use cases drive this feature. First: large document pipelines. Legal, research, and engineering workflows that upload multi-hundred-page PDFs or dense data exports previously required a pre-chunking or external splitting step before sending files to the API. A higher per-client cap lets those pipelines simplify their preprocessing layer. Second, the inverse: rate-sensitive or resource-constrained environments—edge deployments, sandbox runners, or high-frequency queue workers—where you want to enforce a lower-than-default cap to prevent accidental large uploads from consuming quota or adding unexpected latency spikes.

The per-instance scoping is the key architectural detail. Applications that maintain multiple Anthropic client objects—common in multi-tenant services or parallel pipeline architectures—can assign different upload caps per client without process isolation or custom upload middleware. One client handles large document ingestion; another, running in the same process, enforces a strict cap for high-frequency smaller calls. The exact constructor parameter name is in the PR #1825 merge notes in the SDK CHANGELOG; running help(anthropic.Anthropic) after upgrading will surface it immediately.
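As a scripted alternative to reading help() output, the constructor signature can be filtered for likely parameter names. This is a sketch under stated assumptions: find_constructor_params is a hypothetical helper, and FakeClient with its max_upload_bytes argument is a stand-in invented for the demo, not the real SDK class or parameter name.

```python
import inspect
from typing import List, Sequence


def find_constructor_params(
    cls: type, needles: Sequence[str] = ("file", "size", "upload")
) -> List[str]:
    """List constructor parameter names containing any of the given substrings."""
    params = inspect.signature(cls.__init__).parameters
    return [
        name
        for name in params
        if name != "self" and any(needle in name.lower() for needle in needles)
    ]


class FakeClient:
    # Hypothetical stand-in; "max_upload_bytes" is NOT the real SDK parameter name.
    def __init__(self, api_key=None, timeout=None, max_upload_bytes=None):
        self.max_upload_bytes = max_upload_bytes


print(find_constructor_params(FakeClient))
```

After upgrading, calling find_constructor_params(anthropic.Anthropic) should narrow the list to the candidates worth checking against the CHANGELOG.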

Mid-Task Instruction Injection in Practice

Mid-conversation system blocks let you append a system-role entry inside the messages array at any position, not just as the top-level system field at conversation start. The feature targets long-running agentic pipelines where the model's operating constraints need to evolve after the conversation is already underway, without disrupting the prompt cache or routing the update through a fabricated user turn.

The prior workaround was injecting updated instructions through a user-role turn, treating the user role as a side channel for system-level constraints. That approach had two concrete problems. It polluted conversation history with turns that were not genuine user messages, making replays, audits, and fine-tuning data preparation more complex. It also risked confusing the model about the semantic boundary between user requests and system-level constraints: a system instruction injected via the user channel is the wrong abstraction at both the API level and the reasoning level.

The canonical use case is an agentic pipeline that runs a tool call and receives new environmental state in the result. A code execution agent, for example, might receive a runtime error revealing the target environment's Python version. Previously, communicating "adjust for Python 3.11" meant routing it through a user turn or prepending it to the next message's content. With mid-conversation system injection, you append it as a proper system entry after the tool result:

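import anthropic

client = anthropic.Anthropic()
tool_result = "..."  # output captured from the tool run

messages = [
    {"role": "user", "content": "Analyze this codebase and suggest refactors."},
    # Tool calls are content blocks on the assistant turn, not a separate key
    {"role": "assistant", "content": [
        {"type": "tool_use", "id": "...", "name": "...", "input": {}},
    ]},
    # Tool results are returned as tool_result blocks inside a user turn
    {"role": "user", "content": [
        {"type": "tool_result", "tool_use_id": "...", "content": tool_result},
    ]},
    # Inject updated constraint after learning the runtime environment
    {"role": "system", "content": "Target runtime is Python 3.11. Avoid 3.12-only syntax."},
]
response = client.messages.create(
    model="claude-opus-4-8",
    messages=messages,
    max_tokens=2048,
)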

The open question for production use is prompt cache interaction. The feature is designed so that inserting a system entry mid-conversation does not invalidate cache hits on earlier message blocks—cache eligibility in the Anthropic API depends on prefix matching, and the intent is that only the new system entry and subsequent messages fall outside the cached prefix. This interaction is not formally documented in the available v0.105.0 release notes, however. Before relying on cache preservation in a cost-sensitive pipeline, verify the behavior in staging: log usage.input_tokens_details.cache_read_input_tokens on representative conversations before and after adding a mid-turn system entry to confirm earlier blocks continue receiving cache hits.
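The staging check described above can be reduced to a comparison of cache-read counts between a baseline run and a run with the injected system entry. This is a sketch, not an official verification procedure: cache_read_tokens and cache_preserved are hypothetical helpers, the 5% tolerance is an arbitrary assumption, and the fallback to a top-level usage.cache_read_input_tokens attribute is there in case the counter is not nested in the response you receive.

```python
from types import SimpleNamespace
from typing import Any


def cache_read_tokens(usage: Any) -> int:
    """Read the cache-read count from the nested details object, else from usage itself."""
    details = getattr(usage, "input_tokens_details", None)
    nested = getattr(details, "cache_read_input_tokens", None)
    if nested is not None:
        return nested
    return getattr(usage, "cache_read_input_tokens", 0) or 0


def cache_preserved(baseline: Any, injected: Any, tolerance: float = 0.05) -> bool:
    """True if the injected run kept at least (1 - tolerance) of baseline cache reads."""
    base = cache_read_tokens(baseline)
    if base == 0:
        return False  # nothing was cached in the baseline, so the comparison is meaningless
    return cache_read_tokens(injected) >= base * (1 - tolerance)


# Stand-in usage objects with made-up numbers, for illustration only.
baseline = SimpleNamespace(
    input_tokens_details=SimpleNamespace(cache_read_input_tokens=1000))
kept = SimpleNamespace(
    input_tokens_details=SimpleNamespace(cache_read_input_tokens=1000))
lost = SimpleNamespace(
    input_tokens_details=SimpleNamespace(cache_read_input_tokens=0))

print(cache_preserved(baseline, kept))
print(cache_preserved(baseline, lost))
```

In practice both usage objects would come from real responses on the same conversation prefix, with the second request sent while the first is still cached.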

Three Versions Over Two Days: 0.105.0 to 0.105.2

The 0.105.x line moved through three versions across two calendar days: 0.105.0 on May 28, 2026 and both 0.105.1 and 0.105.2 on May 29, 2026. This cadence is typical after a major model launch: the initial drop is timed to model availability, and narrow fixes accumulate in the hours following as the release meets production traffic. No breaking changes were flagged across all three versions.

"Claude Opus 4.8 is available today across the Claude API, Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry." — Anthropic, Claude Opus 4.8 launch post

For teams that need deterministic builds, the practical pin recommendation is anthropic>=0.105.1,<0.106. The >=0.105.1 lower bound excludes 0.105.0, which predates the Trusted Publishing switch. The <0.106 upper bound prevents automatic uptake of the next minor version, which may carry model string additions or API surface changes you haven't evaluated against your test suite. Once Anthropic publishes 0.105.2 release notes and you've run your integration tests against it, advance or relax the pin accordingly.
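A startup or CI check can confirm that the installed version falls inside the recommended range. The sketch below uses only tuple comparison on plain X.Y.Z strings; parse_version and within_pin are hypothetical helper names, and pre-release strings such as release candidates are out of scope and would raise ValueError.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Parse a plain 'X.Y.Z' release string into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def within_pin(version: str) -> bool:
    """Check the anthropic>=0.105.1,<0.106 range for plain numeric versions."""
    return (0, 105, 1) <= parse_version(version) < (0, 106)


for candidate in ("0.105.0", "0.105.1", "0.105.2", "0.106.0"):
    print(candidate, within_pin(candidate))
```

To check the live environment, pass importlib.metadata.version("anthropic") to within_pin.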

The one action item that the patch cadence does not handle automatically is the sandbox script rename. Upgrading the package with pip install --upgrade anthropic will not rename script paths referenced in your CI configs, shell scripts, or infrastructure-as-code. Grep for private-sandbox-worker explicitly as part of your upgrade procedure.
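Where a plain grep is not convenient, for example inside an existing Python upgrade script, the same search can be sketched with the standard library. find_references is a hypothetical helper, and the skipped directory names are assumptions about a typical repository layout.

```python
import os
import tempfile
from typing import List, Tuple

OLD_NAME = "private-sandbox-worker"
SKIP_DIRS = {".git", ".venv", "node_modules"}  # assumed layout; adjust per repo


def find_references(root: str, needle: str = OLD_NAME) -> List[Tuple[str, int]]:
    """Return (path, line_number) pairs for every line under root containing needle."""
    hits: List[Tuple[str, int]] = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune directories in place so os.walk does not descend into them.
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for filename in sorted(filenames):
            path = os.path.join(dirpath, filename)
            try:
                with open(path, encoding="utf-8", errors="ignore") as handle:
                    for number, line in enumerate(handle, start=1):
                        if needle in line:
                            hits.append((path, number))
            except OSError:
                continue  # unreadable file; skip rather than abort the scan
    return hits


# Demo against a throwaway directory containing one stale reference.
with tempfile.TemporaryDirectory() as demo_root:
    makefile = os.path.join(demo_root, "Makefile")
    with open(makefile, "w", encoding="utf-8") as handle:
        handle.write("run:\n\tpython examples/private-sandbox-worker.py\n")
    for path, number in find_references(demo_root):
        print(f"{os.path.basename(path)}:{number}")
```

A non-empty result is a natural condition for failing the upgrade job until the references are renamed.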

Trusted Publishing and the PyPI Deployment Shift

v0.105.1's sole change is switching the anthropic PyPI package deployment from long-lived API tokens to Trusted Publishing—an OIDC-based mechanism where the GitHub Actions workflow itself issues a short-lived identity token for each release. Consumer impact: zero. pip install anthropic==0.105.1 behaves identically to any prior release. The change is entirely on Anthropic's publisher side.

Trusted Publishing eliminates the static PyPI API token that previously had to live in GitHub Actions secrets. Under the old model, a leaked secret would allow an attacker to publish a malicious package version. Under OIDC-based publishing, each release uses a token that is valid for a single workflow run and scoped to a specific repository and workflow file—there is no static credential to steal, rotate, or accidentally commit. The token is minted by the CI environment at release time and cannot be reused.

PyPI Trusted Publishing has been available since 2023 and is now standard practice for major Python packages. Anthropic's adoption in 0.105.1 aligns the SDK with current supply-chain best practice. If you run Software Bill of Materials (SBOM) audits or dependency provenance checks, the 0.105.1 build attestations generated by the GitHub Actions workflow are now accessible via PyPI's provenance API—worth wiring into your supply-chain tooling if you track that level of detail for third-party dependencies.

Upgrading to 0.105.x: Install and Verify

All API changes in the 0.105.x line are additive, so existing calls keep working after the upgrade; the sandbox script rename is the one change that needs a manual follow-up. The steps below cover installation, a targeted smoke test for the headline features, and the checklist of non-code changes to handle in your repository.

Install:

# Latest patch in the 0.105.x line
pip install --upgrade anthropic

# Pin for reproducible environments
pip install "anthropic>=0.105.1,<0.106"

Smoke test — new model and output_tokens_details:

import anthropic

client = anthropic.Anthropic()
response = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=64,
    messages=[{"role": "user", "content": "Ping"}],
)

print(response.usage)
details = getattr(response.usage, "output_tokens_details", None)
if details:
    print("output_tokens_details:", details)
else:
    print("output_tokens_details not present for this response type")

Upgrade checklist:

Run pip install "anthropic>=0.105.1,<0.106" and confirm the installed version with pip show anthropic.
Search the repo for private-sandbox-worker; replace with self-hosted-sandbox-worker in all scripts, Makefiles, and CI configs.
If you use extended thinking, log response.usage.output_tokens_details in a test run and inspect the sub-fields in the raw object.
If you need a custom file upload cap, inspect help(anthropic.Anthropic) for the new constructor parameter and configure it per client instance.
If you currently inject system instructions via fabricated user-role turns, refactor those to use the proper system-role entry in the messages array.
Monitor the SDK CHANGELOG for v0.105.2 release notes and the formal output_tokens_details schema documentation.

Frequently Asked Questions

What does `usage.output_tokens_details` actually contain?

usage.output_tokens_details is a nested object providing per-type attribution for output token consumption, most likely separating reasoning (thinking) tokens, generated during extended thinking, from standard text output tokens. It mirrors how usage.input_tokens_details breaks down cache-read and cache-write input token counts. The exact sub-field names are not yet published in the official API reference. To inspect the live schema, send a request with extended thinking enabled, capture the raw response.usage object, and print it. Field presence may vary by response type. Use a defensive getattr(response.usage, "output_tokens_details", None) access pattern in production code until the schema is formally documented.

How do I call claude-opus-4-8 in Python after upgrading?

After upgrading to anthropic>=0.105.0, pass model="claude-opus-4-8" in your client.messages.create() call. No other code or endpoint URL changes are required for the hosted Claude API. Claude Opus 4.8 is also available via Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry. Standard mode pricing is $5 per million input tokens and $25 per million output tokens; fast mode is $10 per million input and $50 per million output. For full model capabilities and benchmark data, see the Claude Opus model page.
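The per-million rates above translate into a per-request estimate with simple arithmetic. The sketch below uses only the four prices quoted in this answer; estimate_cost is a hypothetical helper, and it deliberately ignores prompt-cache pricing and any other adjustments, which are not covered here.

```python
# USD per million tokens, as quoted above.
PRICES_PER_MILLION = {
    "standard": {"input": 5.00, "output": 25.00},
    "fast": {"input": 10.00, "output": 50.00},
}


def estimate_cost(input_tokens: int, output_tokens: int, mode: str = "standard") -> float:
    """Estimate request cost in USD from token counts and the quoted list prices."""
    rates = PRICES_PER_MILLION[mode]
    return (input_tokens * rates["input"] + output_tokens * rates["output"]) / 1_000_000


# 200k input tokens and 40k output tokens in each mode.
print(estimate_cost(200_000, 40_000))               # 2.0
print(estimate_cost(200_000, 40_000, mode="fast"))  # 4.0
```

Feeding response.usage.input_tokens and response.usage.output_tokens into the function gives a running cost figure for a logging hook.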

What changed in v0.105.1 and v0.105.2?

v0.105.1, released May 29, 2026, switched PyPI deployment to Trusted Publishing, a supply-chain improvement that replaces long-lived API tokens with OIDC-based identity. There are no API surface changes; the package is functionally identical to 0.105.0. v0.105.2, also released May 29, 2026, has no published changelog as of this writing. It is safe to upgrade to 0.105.2, since no breaking changes were flagged, but treat it as an unverified internal fix until Anthropic publishes release notes for it.

Does injecting a system entry mid-conversation break prompt cache hits?

The feature is designed so that inserting a new system-role entry into the messages array does not invalidate prompt cache hits on earlier conversation blocks. Cache eligibility in the Anthropic API is determined by exact prefix matching, and the design intent is that only the new system entry and subsequent messages fall outside the cached prefix. This interaction is not formally documented in the v0.105.0 release notes, however. Before deploying this pattern in a cost-sensitive pipeline, verify in staging: log usage.input_tokens_details.cache_read_input_tokens on identical conversations with and without a mid-turn system entry. Confirm that earlier blocks continue receiving cache hits before relying on cache preservation in production.

How do I set a custom file upload size limit per client?

Pass the size cap as a constructor argument when instantiating anthropic.Anthropic()—the exact parameter name is in the PR #1825 merge notes in the SDK CHANGELOG. Because the cap is scoped per client instance, multiple client objects in the same Python process can carry different limits without interfering with each other—a high-cap client for large document ingestion and a low-cap client for a rate-constrained queue worker can coexist cleanly in one application. The limit is a client-level configuration, not a per-call parameter, so it applies uniformly to every upload made through that client instance.

What to Track as 0.106.x Approaches

The 0.105.x release sets up several threads worth monitoring. The usage.output_tokens_details schema is the most time-sensitive: once Anthropic publishes the sub-field names in official docs, teams should update logging and cost-attribution pipelines. The Trusted Publishing switch in 0.105.1 is a one-time event, but it means the SDK's build attestations are now verifiable via the PyPI provenance API—a useful signal for any team running SBOM or supply-chain audits on their Python dependency graph.

On the model side, Claude Opus 4.8 posts notable benchmark gains: SWE-bench Pro at 69.2% (up from 64.3% in Opus 4.7), SWE-bench Verified at 88.6%, and USAMO 2026 math at 96.7%. The long-context GraphWalks F1 at 1M tokens reaches 68.1%, up from 40.3% in the prior version, a gain that pairs directly with the mid-task instruction injection feature for extended autonomous runs. For teams on Opus 4.7, the upgrade path is a single model string change; the SDK handles the rest.

The mid-conversation system block feature is likely to see growing adoption as agentic patterns mature. Teams building tool-use workflows benefit from defining an injection strategy early: which pipeline stages are authorized to append system entries, what format constraints apply, and how to verify cache behavior in staging. Establishing that convention before the pattern proliferates across a codebase is substantially easier than retrofitting it later.
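One possible shape for such a convention is a single helper that every pipeline stage must call to add a system entry. Everything in this sketch is an assumption made for illustration: the stage names, the 500-character limit, and the append_system_entry function are hypothetical, and the entry format simply follows the role/content shape used earlier in this article.

```python
from typing import Dict, List, Set

# Hypothetical stage names; a real pipeline would define its own.
ALLOWED_STAGES: Set[str] = {"tool_result_handler", "environment_probe"}


def append_system_entry(
    messages: List[Dict], text: str, stage: str, max_chars: int = 500
) -> List[Dict]:
    """Return a new messages list with a system entry appended, if the stage may inject."""
    if stage not in ALLOWED_STAGES:
        raise PermissionError(f"stage {stage!r} may not append system entries")
    text = text.strip()
    if not text or len(text) > max_chars:
        raise ValueError("system entry must be non-empty and within the length limit")
    # Build a new list so the caller's history is never mutated in place.
    return messages + [{"role": "system", "content": text}]


history = [{"role": "user", "content": "Analyze this codebase and suggest refactors."}]
updated = append_system_entry(
    history, "Target runtime is Python 3.11.", stage="environment_probe"
)
print(len(history), len(updated))
print(updated[-1])
```

Routing all injections through one function also gives a single place to log them, which helps when comparing cache behavior in staging.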

Last updated: 2026-05-29. Written against the v0.105.0, v0.105.1, and v0.105.2 release notes and the Claude Opus 4.8 launch announcement. Sub-field schema for usage.output_tokens_details and the v0.105.2 changelog are pending official documentation from Anthropic.