Agentforce · 2026-03-05

Spring '26 Agentforce Production Guide: From the New Builder to Agent Observability

What Actually Changed in Spring '26 for Agentforce

From Winter '26 to Spring '26, Agentforce improvements fall into three categories: a rebuilt authoring experience (new Agentforce Builder + Agent Script), upgraded testing tools (Testing Center for batch testing + Agentforce Grid for interactive validation), and, at last, production observability (Agent Analytics / Optimization / Health Monitoring). None of these individually qualifies as groundbreaking, but together they give Agentforce a complete toolchain from development through operations.

This article isn't a Release Notes recap. It's organized around one question: what does it take to move an Agentforce agent from Sandbox to Production, and what new capabilities does Spring '26 provide to get there?

The New Agentforce Builder: A Turning Point in the Authoring Experience

The new Agentforce Builder (Beta) in Spring '26 is a ground-up rewrite of the old builder, not a cosmetic refresh. The core change is a code-editor-inspired layout: an Explorer panel on the left managing all agent assets (Topics, Actions, Variables, Connections), an editing area in the center, and an Agentforce Assistant on the right—an AI that helps you build the agent itself.

In more detail, the Explorer organizes the agent's structure into Agent Definition, Settings (Agent Details / System / Language), Topics (Topic Selector / Escalation / Off Topic, etc.), Connections, and Variables. The center canvas is where the agent is assembled, and the Agentforce Assistant accepts natural language commands to create and modify agent components.

Figure: New Agentforce Builder interface, with the Explorer panel on the left, canvas in the center, and Agentforce Assistant chat panel on the right.

The editing area offers two switchable views:

  • Canvas View: Abstracts Agent Script logic into visual blocks for quick assembly and review. Type / to add conditional expressions, @ to reference Topics, Actions, or Variables.
  • Script View: Write Agent Script code directly with syntax highlighting, autocompletion, and real-time validation. Changes sync between views instantly.

The built-in Preview panel is where this gets practical. Enter a user utterance, and the right panel renders a complete reasoning trace: Input → Reasoning → Topic Selector → Transition to Topic → Output Evaluation, plus an AI-generated Summary explaining why the agent made that decision.

Figure: Agentforce Builder Preview panel, with the conversation window on the left and Interaction Details (reasoning trace and Summary) on the right.

This Preview isn't a substitute for production testing, but during development it quickly surfaces Topic routing errors or Action parameter mapping issues—saving repeated Sandbox deployments.

Agent Script: Controlling Agent Reasoning with Code

Agent Script is a Beta feature introduced in Spring '26 and serves as the underlying language of the new Builder. Its purpose is explicit: let natural language handle conversations, let code handle business logic. The traditional approach of embedding all rules in a single prompt breaks down once conditional branches exceed three layers—LLM execution becomes unreliable. Agent Script extracts conditional logic and variable operations out of the prompt into deterministic expressions.

Agent Script mixes natural language instructions with deterministic code. In the example, the condition @variables.is_late == None triggers the @actions.check_if_late action, which queries the order status and writes the result to a variable. A second condition, @variables.is_late == "true", then routes into an apology response. The branching is entirely code-controlled; only the final reply is generated by the LLM.

Figure: Agent Script code snippet showing a conditional check for late delivery that calls the check_if_late action and branches on the result.
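The same control flow can be mimicked in plain Python to show where the boundary sits. This is not Agent Script syntax. It is a minimal sketch, assuming a hypothetical check_if_late lookup and a stand-in llm_reply function, in which ordinary code picks the branch and a model would only phrase the reply.

```python
from typing import Callable


def check_if_late(order_id: str) -> str:
    """Hypothetical stand-in for the check_if_late action (a real one would query order data)."""
    late_orders = {"A-100"}  # illustrative data only
    return "true" if order_id in late_orders else "false"


def llm_reply(instruction: str) -> str:
    """Hypothetical stand-in for the LLM: it phrases the reply, it never picks the branch."""
    return f"[LLM writes a reply following: {instruction}]"


def handle_turn(variables: dict, order_id: str,
                generate: Callable[[str], str] = llm_reply) -> str:
    # Deterministic step 1: call the action only if the variable is still unset.
    if variables.get("is_late") is None:
        variables["is_late"] = check_if_late(order_id)
    # Deterministic step 2: a plain comparison decides the branch.
    if variables["is_late"] == "true":
        return generate("Apologize for the delay and share the new delivery estimate.")
    return generate("Confirm the order is on schedule.")
```

Because the branch is an ordinary comparison, the same variable state always takes the same path, whatever wording the model produces.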

Key design decisions in Agent Script:

  • Expression-driven branching: if/else conditions, variable assignment, and loops are executed by the expression engine, never by the LLM. Results are deterministic.
  • Variables as state containers: @variables pass context between Topics. Action outputs are written back to variables via @outputs.
  • Pipe prefix for natural language: Content after | is processed by the LLM; everything else is handled by the expression engine. The boundary is unambiguous.
  • Deep integration with Agentforce DX: Agent Script files can be edited in VS Code (Agentforce DX Extension v1.6.0+) with syntax highlighting, validation, and local Preview (both Simulated and Live modes), then deployed to the org via Salesforce CLI.

For existing agent projects, there's no need to migrate everything to Agent Script at once. The recommended approach: rewrite your most error-prone conditional branches in Script first—returns and refunds logic, permission checks, multi-step workflow state machines. Keep conversational Topics on natural language Instructions.

Topic / Instruction / Action Design: Stop Leading with Prompts

Better tooling doesn't fix bad design. Agentforce's three-layer structure appears repeatedly in the documentation, but in practice the most common anti-pattern is building it backwards—writing a massive prompt that tries to cover everything, bolting on a few Actions, and only then thinking about Topic boundaries. This produces blurry intent boundaries, low match rates, and fragile behavior where fixing one thing breaks three others.

The correct design sequence:

  • Topics first: One Topic covers one clear business intent. Mixing "presales inquiry + support ticket + billing explanation" into a single Topic is a recipe for chaos. Topics are audit units and permission boundaries—which user groups can trigger which Topic gets decided here.
  • Instructions set the rules: Instructions under each Topic answer "what constraints apply when handling this type of request." Tone requirements, sensitive information filters, and output format restrictions belong here, not scattered across a System Prompt.
  • Actions execute: Actions expose capabilities via Apex Class, Flow, or Prompt Template. Action descriptions should use business semantics, not technical labels—"Get latest fulfillment status with human-readable summary" is far more effective for LLM intent matching than "invokeOrderApi."
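The point about Action descriptions lends itself to a quick review script. The following is a rough heuristic sketch, not a Salesforce tool: the function names and the "fewer than three words or identifier-like" rule are assumptions chosen for illustration.

```python
import re
from typing import Dict, List


def looks_like_technical_label(description: str) -> bool:
    """Heuristic: very short text or a single identifier-like token reads as a technical label."""
    text = description.strip()
    too_short = len(text.split()) < 3
    identifier_like = re.fullmatch(r"[A-Za-z0-9_.]+", text) is not None
    return too_short or identifier_like


def review_actions(actions: Dict[str, str]) -> List[str]:
    """Return the names of actions whose descriptions should be rewritten in business terms."""
    return sorted(name for name, desc in actions.items()
                  if looks_like_technical_label(desc))
```

Run against an inventory such as {"OrderStatus": "Get latest fulfillment status with human-readable summary", "OrderApi": "invokeOrderApi"}, only the second entry is flagged.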

The new Builder in Spring '26 offers prebuilt Topics and Actions to accelerate prototyping. The recommended strategy is "borrow first, narrow fast":

| Phase | Approach | Output |
|---|---|---|
| Bootstrap | Use prebuilt Topics / Actions to prove critical journeys | Minimum Viable Agent (MVP) |
| Narrowing | Remove low-hit or high-risk intents, add org-specific rules | Lean Topic inventory + Instruction baseline |
| Operations | Iterate from failed transcripts and action errors | Versioned improvement log |

Engineering the Action Layer: The Real Stability Divide

Many teams blame agent instability on "that's just how AI works," but actual debugging reveals that much of the unreliability originates from incomplete contracts in the Action layer. Treat each Action like a public API, and many problems disappear:

  • Strong input validation: Required fields, enum values, format constraints. LLM-extracted parameters aren't always correct—Actions need defensive logic internally.
  • Distinguishable error codes: Separate "permission denied" from "record not found" from "system error." A generic "operation failed" message makes the agent's response completely uninformative.
  • Traceable side effects: Log who triggered which Action under what context to audit trails. Spring '26 supports persisting Flow execution logs to Data 360 (formerly Data Cloud) via the ssot__FlowRun__dlm DMO, linkable to specific agent sessions.
  • Idempotent by default: Agents may retry Actions due to timeouts. Non-idempotent Actions will produce duplicate business outcomes.
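The four contract rules above can be sketched in a few lines. Real Actions would be written in Apex or Flow; Python is used here only to show the shape of the contract. The refund scenario, the reason values, the in-memory stores, and every name below are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional


class ErrorCode(Enum):
    INVALID_INPUT = "invalid_input"
    PERMISSION_DENIED = "permission_denied"
    NOT_FOUND = "not_found"
    SYSTEM_ERROR = "system_error"  # reserved for unexpected failures; unused in this sketch


@dataclass
class ActionResult:
    ok: bool
    code: Optional[ErrorCode] = None
    message: str = ""


ALLOWED_REASONS = {"damaged", "late", "wrong_item"}  # hypothetical enum values
ORDERS = {"A-100": {"owner": "u1"}}                  # illustrative in-memory data
_completed: Dict[str, ActionResult] = {}             # idempotency key -> stored success
audit_log: List[dict] = []                           # who triggered what, in which session


def request_refund(user_id: str, session_id: str, order_id: str,
                   reason: str, idempotency_key: str) -> ActionResult:
    # Idempotency: a retry after a success returns the stored result, with no new side effect.
    if idempotency_key in _completed:
        return _completed[idempotency_key]
    # Input validation: LLM-extracted parameters are not trusted.
    if not order_id or reason not in ALLOWED_REASONS:
        return ActionResult(False, ErrorCode.INVALID_INPUT, "order_id and a known reason are required")
    order = ORDERS.get(order_id)
    # Distinguishable errors: "not found" and "permission denied" stay separate.
    if order is None:
        return ActionResult(False, ErrorCode.NOT_FOUND, f"order {order_id} does not exist")
    if order["owner"] != user_id:
        return ActionResult(False, ErrorCode.PERMISSION_DENIED, "order belongs to another user")
    # Traceable side effect: record who did what, in which session.
    audit_log.append({"user": user_id, "session": session_id,
                      "action": "request_refund", "order": order_id})
    result = ActionResult(True, message=f"refund requested for {order_id}")
    _completed[idempotency_key] = result
    return result
```

Only successful results are stored against the idempotency key here, so a retry after a validation failure is evaluated again. Whether failures should also be cached is a design choice this sketch leaves open.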

For Apex-based Actions, Spring '26 delivers a practical improvement: Apex Cursors GA. For Actions that process large datasets (such as bulk order queries), Cursors enable server-side pagination beyond the traditional 2,000 record limit—up to 100 million rows within a 24-hour period.

Testing Center: From "A Few Happy Paths" to Systematic Regression

Manually chatting a few rounds before deployment is dangerously inadequate for agents—one untested edge-case utterance can cause an agent to take bizarre actions in production. Agentforce Testing Center provides batch testing capabilities, transforming testing from ad-hoc conversations into repeatable, trackable engineering practice.

The Testing Center results overview displays five core metrics: Status, Duration, Topic Pass %, Action Pass %, and Response Pass %. These three percentages are your go/no-go gate for production deployment.

Figure: Testing Center results overview showing Status Complete, Topic Pass 100%, Action Pass 60%, Response Pass 25%.

Drilling into individual test cases, each record shows: Utterance (test input), Expected Topic vs. Actual Topic, Expected Actions vs. Actual Actions, and Topic Test Result (Pass/Fail).

Figure: Testing Center per-case results showing Utterance, Expected/Actual Topic, Topic Test Result, and Expected Actions.
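To make the go/no-go idea concrete, here is a small sketch that turns per-case results into the three percentages and compares them with release thresholds. It assumes each percentage is simply the share of passing cases, which may not match how Testing Center computes it, and the threshold values are placeholders to be set by your own team.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class CaseResult:
    utterance: str
    topic_pass: bool
    action_pass: bool
    response_pass: bool


def pass_rates(results: List[CaseResult]) -> Dict[str, float]:
    """Share of passing cases per dimension, in percent."""
    n = len(results)
    if n == 0:
        raise ValueError("no test results to evaluate")
    return {
        "topic": 100 * sum(r.topic_pass for r in results) / n,
        "action": 100 * sum(r.action_pass for r in results) / n,
        "response": 100 * sum(r.response_pass for r in results) / n,
    }


# Placeholder thresholds: assumptions for illustration, not Salesforce recommendations.
THRESHOLDS = {"topic": 95.0, "action": 90.0, "response": 85.0}


def release_gate(results: List[CaseResult],
                 thresholds: Dict[str, float] = THRESHOLDS) -> Tuple[bool, List[str]]:
    """Return (go, failing_dimensions) for a batch run."""
    rates = pass_rates(results)
    failures = [name for name, minimum in thresholds.items() if rates[name] < minimum]
    return (not failures, failures)
```

A run like the one in the results overview (Topic 100%, Action 60%, Response 25%) would fail this gate on the action and response dimensions even though routing is perfect.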

In practice, structure your tests in three layers:

  1. Intent robustness tests: Synonyms, colloquial phrasing, mixed languages, typos. The goal is ensuring Topic routing remains stable.
  2. Action correctness tests: Are parameters extracted and mapped correctly? Are error inputs handled gracefully? Are results properly assembled into responses?
  3. Security boundary tests: Cross-user data access attempts, sensitive field exposure, prompt injection attacks.
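For the first layer, utterance variants can be generated mechanically before hand-written colloquial and mixed-language cases are added. The sketch below is one possible approach, with hypothetical field names (utterance, expected_topic) rather than the Testing Center upload format.

```python
import random
from typing import List


def swap_typo(text: str, rng: random.Random) -> str:
    """Swap two adjacent, different letters to imitate a common typing slip."""
    positions = [i for i in range(len(text) - 1)
                 if text[i].isalpha() and text[i + 1].isalpha() and text[i] != text[i + 1]]
    if not positions:
        return text
    i = rng.choice(positions)
    return text[:i] + text[i + 1] + text[i] + text[i + 2:]


def robustness_variants(utterance: str, expected_topic: str, seed: int = 0) -> List[dict]:
    """Build test cases that should all route to the same Topic."""
    rng = random.Random(seed)  # seeded so the generated suite is repeatable
    candidates = [
        utterance,
        utterance.lower(),
        utterance.rstrip("?.!"),
        swap_typo(utterance, rng),
    ]
    unique = list(dict.fromkeys(candidates))  # drop duplicates, keep order
    return [{"utterance": u, "expected_topic": expected_topic} for u in unique]
```

Every variant carries the same expected Topic, so any routing difference between them points at fragile intent matching rather than at a genuinely different request.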

One current limitation: Testing Center only supports single-turn interactions. Multi-turn conversation context can't be validated in batch tests yet. For multi-turn scenarios, you'll still need Preview or Agentforce Grid for manual testing.

Agentforce Grid is another Spring '26 testing tool worth tracking (Beta). It provides a spreadsheet-like interface that chains CRM data, Prompts, Actions, and Agents for bulk execution, with support for A/B testing different Prompt variants and comparing results. For scenarios requiring large-scale agent validation with real data, Grid is more flexible than Testing Center.

Agent Observability: What to Watch After Go-Live

Deployment is just the beginning. Without observability, agent systems typically degrade into black boxes within three months—too risky to improve, too costly to retire. Salesforce integrated three layers of observability tools into Agentforce Studio:

Agent Analytics

Agent Analytics is a Tableau-powered KPI dashboard under the Observe navigation in Agentforce Studio. Under Performance Overview, the panel displays five key metric cards, each with a trend line chart: Engagement Rate, Escalation Rate, Deflection Rate, Abandonment Rate, and Avg. Time to Deflection. In the example screenshot, the four rates read 31.36%, 19.86%, 34.49%, and 45.64% respectively. Filters above the cards support slicing by Agent, Timeframe, Channel, Topic, and Action.

Figure: Agent Analytics dashboard showing Engagement Rate, Escalation Rate, Deflection Rate, and Abandonment Rate metric cards with trend charts.

Beyond Overview, the Agent Effectiveness and Quality sub-tabs focus on task completion efficiency and response quality respectively. The value of this data isn't in individual numbers—it's in trend changes. A three-day decline in Deflection Rate might indicate a recent Instruction update introduced side effects.
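Watching for a sustained decline is easy to automate once daily values are exported. This is a minimal sketch, assuming a plain list of daily Deflection Rate values (oldest first). The export mechanism and the three-day default are assumptions.

```python
from typing import List


def consecutive_declines(values: List[float]) -> int:
    """Length of the run of day-over-day decreases that ends at the latest value."""
    run = 0
    # Walk backwards through (previous, current) pairs, starting from the newest day.
    for prev, cur in zip(reversed(values[:-1]), reversed(values[1:])):
        if cur < prev:
            run += 1
        else:
            break
    return run


def needs_review(values: List[float], days: int = 3) -> bool:
    """True when the metric has fallen for at least `days` consecutive days."""
    return consecutive_declines(values) >= days
```

For example, needs_review([36.0, 35.1, 34.4, 33.0]) is true, which would be a prompt to check what changed in Instructions or Actions over that period.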

Agent Optimization

If Analytics answers "how is it performing," Optimization answers "why is it underperforming." It records the full reasoning chain for every interaction: user inputs, LLM calls, tool invocations, guardrail evaluations, and response latency. More critically, it automatically clusters real production user intents—you designed 10 Topics, but users may be asking an 11th type of question you never anticipated. The system applies "LLM-as-judge" quality scoring (high / medium / low / very low) with explanations to both individual sessions and intent clusters.
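Once sessions carry a cluster and a quality label, ranking clusters worst-first is a simple aggregation. The sketch below assumes session data is available as (intent_cluster, quality_label) pairs. The numeric mapping of the four labels is an assumption made here for sorting, not something the product defines.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple

# Assumed numeric mapping of the four quality labels, used only for ranking.
SCORE = {"high": 3, "medium": 2, "low": 1, "very low": 0}


def rank_clusters(sessions: Iterable[Tuple[str, str]]) -> List[Tuple[str, float]]:
    """Average the quality score per intent cluster and return clusters worst-first."""
    scores = defaultdict(list)
    for cluster, label in sessions:
        scores[cluster].append(SCORE[label])
    averages = {cluster: sum(s) / len(s) for cluster, s in scores.items()}
    # Sort by average score, then by name so ties are ordered predictably.
    return sorted(averages.items(), key=lambda item: (item[1], item[0]))
```

A cluster that appears near the top of this list but maps to no existing Topic is a candidate for the "11th type of question" mentioned above.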

Operations teams can navigate from a KPI anomaly in Analytics, drill into low-quality intent clusters in Optimization, pinpoint a specific Instruction or Action issue, and ship a fix the same day. This is what Salesforce calls the "Agentic SDLC"—production data driving development iterations.

Agent Health Monitoring (Spring '26 GA)

This is the final piece Spring '26 fills in: near-real-time health monitoring. The Agent Health Monitoring panel under Observe > Monitoring in Agentforce Studio shows an Alerts list at the top (e.g., "Error Rate Threshold 2," "Escalation Rate Service") and three real-time metric charts at the bottom: Agent Error Rate, Avg. Latency, and Escalation Rate, filterable by Agent ID, Agent Type, and Channel. In the example screenshot these read 13%, 34 ms, and 14%.

Figure: Agent Health Monitoring showing the Alerts list and the Agent Error Rate / Avg. Latency / Escalation Rate real-time charts.

Alerts can be configured with thresholds and linked to Scale Center for deeper investigation. For example, setting "Error Rate exceeds 10% for 5 minutes" as a trigger lets operations teams respond immediately.
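The "exceeds 10% for 5 minutes" rule is a sustained-threshold check. How Health Monitoring evaluates it internally is not described here, so the following is only a sketch of one reasonable interpretation: every sample in the window must exceed the threshold, and the history must reach back at least as far as the window start. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class AlertRule:
    threshold: float      # e.g. 0.10 for 10%
    window_seconds: int   # e.g. 300 for 5 minutes


def should_fire(rule: AlertRule, samples: List[Tuple[float, float]], now: float) -> bool:
    """samples: (timestamp_seconds, error_rate) pairs sorted by ascending timestamp."""
    window_start = now - rule.window_seconds
    # Without history reaching back to the window start, "sustained" cannot be claimed.
    if not samples or samples[0][0] > window_start:
        return False
    recent = [rate for ts, rate in samples if window_start <= ts <= now]
    # Fire only if every sample inside the window is above the threshold.
    return bool(recent) and all(rate > rule.threshold for rate in recent)
```

Requiring the whole window to breach the threshold trades a few minutes of detection delay for fewer alerts on momentary spikes.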

MCP Governance and Security Hardening

Spring '26 formally adopted Anthropic's Model Context Protocol (MCP) standard for governing how agents access external tools and data sources. MCP introduces governance controls at two levels:

  • Integration layer: API Catalog serves as the control plane for MCP Servers. Through MuleSoft sync, administrators can explicitly activate or deactivate specific MCP Servers, governing which enterprise data sources agents can reach.
  • Build layer: The new Salesforce DX MCP Server enables developers to interact with org metadata via natural language in their IDE. An important security decision here: Agentforce Vibes (the web-based IDE) is now restricted to sandboxes only—production access has been disabled.

On the security front, Spring '26 introduces several notable changes:

  • Connected App creation blocked by default: Neither the UI nor the Metadata API allows creating new Connected Apps (except via package installation). Re-enabling requires Salesforce Support approval. The replacement is External Client Apps (ECAs), which enforce modern OAuth flows and are secure by default.
  • Legacy authentication phase-out: Platform SOAP API login() is disabled by default in new orgs, with complete removal planned for Summer '27.
  • Shorter certificate lifecycles: New CA-signed certificates are limited to 6.5 months (200 days), decreasing to 100 days in 2027.

The Einstein Trust Layer underpins all of this. It acts as a security intermediary between agents and LLMs, providing configurable Data Masking, Zero Data Retention, input/output Toxicity Detection, and Secure Data Retriever. Supervisory LLMs function as runtime monitors, scanning prompts and responses for policy violations before they reach users.

A Reusable Production Readiness Checklist

Everything above compresses into a checklist you can apply directly:

| Phase | Checklist Item | Spring '26 Tool Support |
|---|---|---|
| Design | Each Topic covers one business intent with a clear owner | New Builder Explorer panel |
| | Each Action has I/O contracts, error semantics, and audit fields | Flow Persistent Logging → Data 360 |
| | High-frequency conditional branches use Agent Script, not pure prompts | Agent Script (Beta) + VS Code Extension |
| Testing | Batch tests covering at least 20 real user utterances | Testing Center batch testing |
| | Security boundary tests: cross-user access, sensitive fields, prompt injection | Testing Center + Preview Interaction Details |
| | A/B comparison of multiple Prompt variants | Agentforce Grid (Beta) |
| Operations | Continuous monitoring of Deflection Rate / Escalation Rate / Error Rate | Agent Analytics + Health Monitoring |
| | Low-quality intent clusters → Instruction fixes → regression testing weekly | Agent Optimization → Testing Center |
| | Version every release: what changed, why, and impact scope | Agentforce DX + Commit Version |

Default high-risk capabilities (data modification, refund initiation, permission changes) to human fallback. Only open the gate to full automation once Deflection Rate and Action Success Rate stabilize within acceptable ranges.
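The fallback rule can be expressed as a small gate in front of action execution. This is a sketch under stated assumptions: the action names are hypothetical, and "stabilized" is interpreted as the last few readings staying within a narrow band with the success rate above a floor. Your own definition of acceptable ranges should replace these defaults.

```python
from typing import List

# Hypothetical names for the high-risk capabilities listed above.
HIGH_RISK = {"modify_data", "initiate_refund", "change_permissions"}


def is_stable(history: List[float], max_swing: float, min_points: int) -> bool:
    """True when the last `min_points` readings exist and vary by at most `max_swing`."""
    recent = history[-min_points:]
    return len(recent) == min_points and max(recent) - min(recent) <= max_swing


def execution_mode(action: str, deflection: List[float], success: List[float],
                   min_success: float = 0.98, max_swing: float = 0.05,
                   min_points: int = 4) -> str:
    """Return 'automated' or 'human_fallback' for a requested action."""
    if action not in HIGH_RISK:
        return "automated"
    stable = (is_stable(deflection, max_swing, min_points)
              and is_stable(success, max_swing, min_points))
    healthy = stable and min(success[-min_points:]) >= min_success
    # High-risk actions stay with a human until both metrics are stable and healthy.
    return "automated" if healthy else "human_fallback"
```

The default answer for a high-risk action is the human path; automation has to be earned by the metrics rather than switched on by configuration alone.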
