What use_responses_api Does in ChatPerplexity
use_responses_api is a new constructor parameter on the ChatPerplexity class, shipped in langchain-perplexity 1.3.0 via PR #37359 . When set, it routes calls through Perplexity's Agent API — canonical endpoint /v1/agent, aliased at /v1/responses — rather than the standard Chat Completions endpoint at /v1/chat/completions. The pattern mirrors the existing use_responses_api parameter on ChatOpenAI, making it immediately recognizable to LangChain practitioners. The default value is None, which activates auto-detection; existing ChatPerplexity(model="sonar") callers see zero behavior change without opting in.
use_responses_api on ChatPerplexity routes LangChain calls to Perplexity's Agent API (/v1/agent) instead of Chat Completions. It defaults to None (auto-detect) so existing code is unaffected. Set to True to unlock four built-in tools, stateful multi-turn fields, cross-provider model access, and richer response metadata — all under a single Perplexity billing account.
Under the hood, 1.3.0 introduces two internal conversion layers to bridge LangChain's message format with the Agent API schema. _to_responses_payload renames messages → input and max_tokens → max_output_tokens to match what the Agent API expects. The paired _convert_responses_to_chat_result wraps the Agent API response object back into an AIMessage, preserving usage metadata and citations. A third helper, _convert_responses_stream_event_to_chunk, handles streaming. Downstream steps in a LangChain chain — tools, memory, output parsers — receive a standard AIMessage and are unaware of which endpoint served the request.
from langchain_perplexity import ChatPerplexity
# Opt in explicitly — always uses Agent API
llm = ChatPerplexity(model="sonar-pro", use_responses_api=True)
response = llm.invoke("What are the latest changes to the Federal Reserve's rate policy?")
# Agent API metadata available on additional_kwargs
citations = response.additional_kwargs.get("citations", [])
search_results = response.additional_kwargs.get("search_results", [])
One set of constraints to understand before switching an existing chain: the Agent API does not accept Chat-Completions-only sampling controls. When a call is routed through the Agent API, temperature, top_p, top_k, stop, and metadata are silently dropped. Passing tool_choice raises a ValueError. Structured outputs via .with_structured_output() are available only for tier 3+ Perplexity users. If your production chain relies on any of these, test routing behavior before upgrading.
Routing Behavior: Auto-Detect, Force-On, Force-Off
The three-mode design of use_responses_api gives developers precise control over which Perplexity endpoint handles each call. According to the LangChain ChatPerplexity integration docs , setting use_responses_api=None (the default) enables payload inspection on each request: the runtime checks for the presence of a built-in tool or any Agent-only field. If detected, the call goes to the Agent API; otherwise it falls through to Chat Completions. This means most existing chains — those using Sonar models with no agentic features — continue to hit Chat Completions without any change.
use_responses_api value |
Routing behavior | When to use |
|---|---|---|
None (default) |
Inspects each request payload. Routes to Agent API if a built-in tool or any of previous_response_id, instructions, input, or include is present; falls through to Chat Completions otherwise |
Most cases — lets individual calls self-select the correct endpoint without changing the constructor or caller code |
True |
Always routes to the Agent API, regardless of payload content | When all calls need Agent API features (e.g., web_search or a third-party model) and you want to eliminate per-request auto-detect logic |
False |
Always routes to Chat Completions, regardless of payload content | When you explicitly want to exclude Agent API features — useful for cost control, latency comparison, or testing Chat Completions behavior in isolation |
The auto-detect logic treats the Agent-only fields as a set: previous_response_id, instructions, input, and include. Presence of any one of them is sufficient to trigger Agent API routing — no explicit constructor flag needed. This means a developer can opt in to stateful behavior simply by passing previous_response_id in their payload, and routing resolves correctly without touching the ChatPerplexity instantiation.
Debug logging records the routing decision on each call. If a chain is hitting the Agent API when you expected Chat Completions (or vice versa), enabling debug-level logging surfaces the branch taken without requiring additional instrumentation. This is the fastest diagnostic path when routing behavior is unexpected — for instance, when an unintended Agent-only field is present in a structured payload that your chain constructs dynamically.
If you need different parameter sets for Agent API calls and Chat Completions calls in the same application, the safest approach is to instantiate two separate ChatPerplexity objects — one with use_responses_api=True and one with use_responses_api=False — rather than relying on auto-detect to split traffic between them. Auto-detect is reliable for homogeneous chains; for mixed chains, explicit flags eliminate ambiguity.
Built-in Tools: web_search, fetch_url, finance_search, people_search
The Agent API exposes four built-in tools that now flow natively through ChatPerplexity when routed to /v1/agent. These tools are defined server-side by Perplexity and require no custom tool wrappers, no external HTTP clients, and no additional API keys. The Agent API itself reached general availability in February 2026 . The fourth tool, finance_search, reached GA in May 2026 — the same month as the 1.3.0 release, making both capabilities available simultaneously.
| Tool | What it returns | GA date | Typical use in a chain |
|---|---|---|---|
web_search |
Structured web results with citations, ranked by relevance; real-time queries supported | February 2026 | Grounded Q&A chains, RAG augmentation, news monitoring agents |
fetch_url |
Parsed content of an arbitrary URL; HTML stripping handled server-side | February 2026 | Link summarization, content extraction steps in multi-step agents |
finance_search |
Structured financial data: quotes, earnings, analyst estimates, ETF constituents, segment KPIs for public companies | May 2026 | Financial research agents, earnings monitors, portfolio analysis tools |
people_search |
Person-focused structured results aggregating professional and public data | February 2026 | Lead research, due diligence chains, executive background lookups |
To use a built-in tool, pass the tool descriptor in the standard tools argument when constructing or invoking ChatPerplexity. With use_responses_api=None, the presence of a tool in the payload is itself sufficient to trigger Agent API routing — no separate flag needed. Structured result objects for each tool are returned in response.additional_kwargs on the resulting AIMessage, where they can be parsed and passed to downstream steps. See the langchain-perplexity tools reference for the full tool schema definitions.
web_search is the most broadly applicable of the four. It returns structured result objects rather than raw text, which makes it suitable for citation display, follow-up queries, and downstream filtering without string parsing. For agents that need to retrieve content from a specific URL as part of their workflow, fetch_url removes the need for an external HTTP client or a separate retriever — Perplexity's infrastructure handles the fetch and returns parsed content.
finance_search is the most specialized. It returns machine-readable structured data — analyst estimates, earnings history, ETF holdings, per-segment KPIs — for public companies. This is a category of data that typically requires either a dedicated financial data API subscription or brittle scraping. For teams building financial research agents with LangChain, the simultaneous GA of finance_search and the 1.3.0 release means structured financial retrieval is available immediately at the Perplexity API tier, without an additional data provider contract.
from langchain_perplexity import ChatPerplexity
# finance_search triggers Agent API routing via auto-detect
llm = ChatPerplexity(model="sonar-pro")
response = llm.invoke(
"What are the latest earnings estimates for NVIDIA?",
tools=[{"type": "finance_search"}]
)
# Structured financial data in additional_kwargs
print(response.additional_kwargs.get("search_results", []))
Response Enrichment: Citations, Images, and Search Results
Agent API responses carry significantly richer metadata than Chat Completions responses. All enrichment fields are accessible via response.additional_kwargs on the returned AIMessage, which means downstream steps in a LangChain chain can access citations, search results, images, and reasoning traces without an additional API call. The full set of available fields is: citations, images, related_questions, search_results, videos, and reasoning_steps.
For a developer building a citation display layer, additional_kwargs["citations"] provides source URLs and titles derived from the model's web retrieval, without requiring text parsing to extract sources. For applications that surface search evidence alongside generated answers, search_results gives the ranked results that informed the response. related_questions can drive follow-up suggestion UI with no additional prompting — the Agent API generates these as a byproduct of the search process.
response = llm.invoke("Explain the latest LangChain release")
# Access enrichment fields
citations = response.additional_kwargs.get("citations", [])
images = response.additional_kwargs.get("images", [])
reasoning = response.additional_kwargs.get("reasoning_steps", [])
related = response.additional_kwargs.get("related_questions", [])
Returning specific fields on demand is controlled by the include parameter — an Agent-only field that accepts an explicit list of response fields to embed in the returned object. Passing include=["citations", "search_results"] limits the response payload to those two fields, which reduces payload size for latency-sensitive applications. As noted in the routing section, passing include in a payload is itself sufficient to trigger auto-detect routing to the Agent API when use_responses_api=None.
reasoning_steps is particularly useful for debugging and evaluation. It exposes the intermediate steps the model used to arrive at its answer, providing a trace without additional instrumentation. For teams building evals or wanting transparency into how the Agent API resolves multi-step queries, this field offers a structured audit trail. It is only present when the call was routed through the Agent API — Chat Completions responses do not include it.
Cross-Provider LLMs Through a Single Billing Account
Beyond Perplexity's own Sonar models, the Agent API supports third-party foundation models from Anthropic, OpenAI, Google, NVIDIA, and xAI. With use_responses_api=True, a developer can pass a provider-prefixed model string to ChatPerplexity and have Perplexity's infrastructure inject real-time web search into that model's responses — all billed under a single Perplexity API account, with no separate API keys per provider required.
As of the 1.3.0 release, the Agent API supports the following cross-provider models: OpenAI's openai/gpt-5.4 and openai/gpt-5.5, added April 2026 ; Anthropic's anthropic/claude-sonnet-4-6 and anthropic/claude-opus-4-7; Google's google/gemini-3-1-pro; NVIDIA Nemotron; and Grok 4.20 Reasoning. The model name format is Perplexity-specific routing syntax — the prefix (e.g., anthropic/) tells the Agent API which provider to call. Treat these strings as potentially version-pinned; as model releases evolve, the specific suffix identifiers may change.
from langchain_perplexity import ChatPerplexity
# Claude Sonnet 4.6 via Perplexity, with live web search injected
# No Anthropic API key required — single Perplexity billing account
llm = ChatPerplexity(
model="anthropic/claude-sonnet-4-6",
use_responses_api=True
)
response = llm.invoke("What happened in AI tooling this week?")
citations = response.additional_kwargs.get("citations", [])
A concrete scenario: a team that has already standardized on Perplexity for search and wants to evaluate different foundation models on the same grounded-response task can do so by swapping the model string on a single ChatPerplexity instance, without managing Anthropic, OpenAI, or Google API keys separately. For teams running A/B evaluations across model providers with live search as a shared baseline, this reduces key management overhead to a single credential.
The primary trade-off is that you are bound to Perplexity's integration cadence for third-party models. When Anthropic or OpenAI releases a new model version, availability through the Agent API depends on Perplexity's timeline, not the provider's release date. Teams with strict model version requirements or who need same-day access to provider releases may prefer direct provider APIs. Teams that want Perplexity's search injection and are flexible on model version will find the consolidated billing a practical convenience.
Stateful Conversations via previous_response_id
Three Agent-only fields enable stateful and scoped behavior that has no equivalent in Chat Completions: previous_response_id, instructions, and include. Each serves a distinct purpose, and passing any one of them is sufficient to trigger auto-detect routing when use_responses_api=None — no constructor flag required.
previous_response_id is the mechanism for multi-turn memory across separate API requests without re-transmitting the full conversation history. Rather than accumulating all prior messages in the input array on each turn, the caller stores the ID from the last response and passes it in the next request. The Agent API retrieves prior context server-side. This reduces payload size and simplifies client-side state management — the client tracks a single ID rather than a growing message list. For agents handling extended conversations, this is a practical optimization for both bandwidth and token costs.
llm = ChatPerplexity(model="sonar-pro") # use_responses_api=None (auto-detect)
# First turn
response_1 = llm.invoke("What is LangChain's latest major version?")
response_id = response_1.additional_kwargs.get("id")
# Second turn — passing previous_response_id triggers Agent API routing automatically
response_2 = llm.invoke(
"What breaking changes did it introduce?",
previous_response_id=response_id
)
instructions is a system-level behavior prompt scoped to the Agent API call, defined as a dedicated field rather than a typed message in the input array. It is functionally similar to a system message, but the separation makes it more explicit and avoids the ambiguity of embedding a system-role message object inside a messages list that gets renamed to input during conversion.
include accepts an explicit list of response fields to embed in the returned object — for example, ["citations", "search_results"]. This is useful for applications that need specific metadata and want to keep response payloads predictable in size. All three fields are meaningful only when the call is routed through the Agent API; they have no effect on and may cause errors against the Chat Completions endpoint.
Dependency Bumps: 1.3.0 and 1.3.1
Both versions were released on May 27, 2026 . Version 1.3.0 shipped at 00:22 UTC with the use_responses_api feature and four dependency floor bumps: langsmith 0.8.0 → 0.8.5, idna 3.10 → 3.15, urllib3 2.6.3 → 2.7.0, and langchain-core ≥ 1.3.3. Version 1.3.1 followed at 20:45 UTC as a pure dependency patch — no new public API surface — bumping the perplexityai Python SDK from 0.34.0 to 0.34.1 via PRs #37710 and #37720.
The specific changes in perplexityai 0.34.1 are not surfaced in the LangChain changelog. Given that both releases shipped the same calendar day, there is no reason to pin to 1.3.0 specifically — pin to 1.3.1 to get the SDK fix alongside the new flag.
If you are upgrading from an earlier version, check your pinned versions of langchain-core and langsmith. The 1.3.0 floor bumps may require coordinated upgrades if you have tight version constraints elsewhere in your dependency tree. The idna and urllib3 bumps are security and maintenance updates with no API surface changes.
Frequently Asked Questions
Does upgrading to langchain-perplexity 1.3.0 break existing ChatPerplexity usage?
No. The use_responses_api parameter defaults to None, which activates auto-detection that falls through to Chat Completions when no Agent-only fields (previous_response_id, instructions, input, include) or built-in tools are present in the request payload. Existing ChatPerplexity(model="sonar") callers continue to hit the Chat Completions endpoint without any code changes. The 1.3.0 release is fully backwards-compatible; no opt-out is required.
What is the difference between the Chat Completions and Agent API endpoints in ChatPerplexity?
Chat Completions (/v1/chat/completions) handles standard augmented chat using Perplexity's Sonar model family. The Agent API (/v1/agent, aliased at /v1/responses) adds four built-in tools (web_search, fetch_url, finance_search, people_search), stateful fields (previous_response_id, instructions, include), support for third-party provider models (Anthropic, OpenAI, Google), and richer response metadata including citations, search_results, images, and reasoning_steps. The Agent API reached general availability in February 2026 ; Chat Completions predates it. Sampling parameters like temperature and top_p are available on Chat Completions but silently dropped on the Agent API.
How does finance_search differ from web_search in the Perplexity Agent API?
finance_search returns structured financial data — stock quotes, earnings history, analyst estimates, ETF constituents, and per-segment KPIs — for public companies. web_search returns general web results ranked by relevance with citations. The distinction matters for financial applications: finance_search produces machine-readable structured output that does not require parsing model text, while web_search returns general web content. finance_search reached general availability in May 2026 ; web_search has been available since the Agent API GA in February 2026.
Can I run Claude or GPT-5 through ChatPerplexity and get live web search injected?
Yes. Pass a provider-prefixed model name — for example, model="anthropic/claude-sonnet-4-6" or model="openai/gpt-5.4" — to ChatPerplexity with use_responses_api=True. Perplexity's Agent API routes the call to the specified provider and injects real-time web search context into the response. All of this runs under a single Perplexity billing account; no separate Anthropic or OpenAI API keys are required. Note that these model name strings are Perplexity-specific routing syntax and should be treated as potentially subject to change as model versions evolve.
What does the _to_responses_payload conversion layer do in 1.3.0?
_to_responses_payload is an internal method in ChatPerplexity that translates the standard LangChain messages payload to the Agent API's expected schema. Concretely, it renames the messages key to input and max_tokens to max_output_tokens. The paired method _convert_responses_to_chat_result performs the reverse: it wraps the Agent API response object into an AIMessage, preserving usage metadata and citations so the rest of a LangChain chain receives a standard message type and remains unaware of which endpoint was used. A third method, _convert_responses_stream_event_to_chunk, handles streaming event conversion.
Upgrade Checklist and What to Watch
The 1.3.0 release makes ChatPerplexity a more complete option for LangChain agents that need live data retrieval. The four built-in tools, cross-provider model support, and stateful fields together close gaps that previously required custom tool wrappers or separate API integrations. For teams building financial data agents, the coincident GA of finance_search makes this a practical moment to evaluate whether Perplexity's structured financial retrieval meets their requirements without an additional data provider contract.
Before committing the Agent API to a production chain, run through the following:
- Pin to 1.3.1, not 1.3.0 — it includes the
perplexityaiSDK 0.34.1 patch shipped the same day. - Check for sampling params — if your chain passes
temperature,top_p,top_k,stop, ormetadata, these will be silently dropped when routed to the Agent API. Remove them or gate behinduse_responses_api=False. - Verify
tool_choiceusage — passingtool_choiceraises aValueErroron the Agent API. Remove it before opting in. - Check your Perplexity tier — structured outputs via
.with_structured_output()require tier 3+. Confirm your tier before building against it. - Update
langchain-coreandlangsmith— 1.3.0 sets floors atlangchain-core≥ 1.3.3 andlangsmith≥ 0.8.5. - Treat cross-provider model strings as version-pinned — strings like
"anthropic/claude-sonnet-4-6"are Perplexity-specific routing syntax. Monitor Perplexity's changelog for updates as model versions evolve.
For Agent API pricing relative to Chat Completions, and for the precise definition of the Perplexity tier 3+ requirement, check Perplexity's pricing documentation directly — neither is specified in the LangChain release notes. The ChatPerplexity class reference and Perplexity's LangChain integration guide are the authoritative sources for current parameter details and supported model lists.
Last updated: 2026-05-30. Based on langchain-perplexity 1.3.1 PyPI release notes, the LangChain GitHub release log, and Perplexity Agent API documentation as of May 2026.



