Test in the sandbox

Fibric separates proposing from acting, which makes testing unusually direct: replay a recorded event, read the exact ExecutionPlan the operator proposes, and assert on it, with no side effect reaching a real system. This guide covers the two sandbox layers, setting up the sandbox connector, the fixture format, recording and replaying fixtures, asserting on proposed plans, and promoting a tested operator and its connection to production.

Two layers of sandbox

Two distinct things are commonly called "the sandbox", and they compose rather than compete:

Layer	What it is	When to use it
The local kernel (`fibric dev`)	An in-process event bus, an in-memory executor, and a file-backed secret stub, running entirely on your machine as tenant `t_local`. It watches your source, reloads on change, and prints every envelope, plan, and disposition. Side-effecting handlers are stubbed by default; because the stub replaces only the handler call, dev receipts are structurally identical to production receipts.	Inner-loop development and automated tests. Nothing leaves your machine unless you explicitly create a sandbox connection.
Sandbox connections	A workspace connection whose credentials point at a vendor's test environment, or at a simulator connector such as `sandbox-orders`. Sandbox and live credentials are separate secrets in separate connections, so a test can never silently pick up the live key. See Tools & auth.	Integration testing in a real workspace: exercising the full deployed path, including router, policy, and receipts, against a system that is safe to touch.

A typical progression uses both: develop and assert against the local kernel, then deploy to a workspace bound to sandbox connections, then promote. Each step keeps the operator and policy unchanged; only the environment underneath them moves.

Set up the sandbox connector

sandbox-orders is the simulated order system the quickstart uses. It speaks the same capability interface a real order system would, emits realistic order events, and accepts holds and notifications without touching anything real. Bind it to the orders role, the same role a production connector will later occupy:

bash

# add the simulator and bind it to the orders role
fibric connectors add sandbox-orders --as orders

# confirm the capabilities it exposes
fibric capabilities ls

$ fibric capabilities ls
CAPABILITY      CONNECTOR        STATUS
orders.read     sandbox-orders   ready
orders.hold     sandbox-orders   ready
orders.notify   sandbox-orders   ready

Because operators ask for capabilities, not vendors, everything you build against sandbox-orders carries over when the binding changes. That indirection is the entire promotion story, covered below.

For a real vendor with a test environment, the equivalent is a second connection on the same connector, following the pattern in Tools & auth:

bash

# one connector, two connections: sandbox and live credentials never mix
fibric connectors add cn-brightdesk --connection brightdesk-sandbox
fibric connectors add cn-brightdesk --connection brightdesk-live

# exercise one tool against the sandbox connection.
# side-effecting tools dry-run by default: validation and trust
# evaluation execute, the handler does not.
fibric connectors test cn-brightdesk conversation.read \
  --connection brightdesk-sandbox \
  --args '{"conversation_id":"cnv_3021"}'

Fixture format

A fixture is one or more recorded event envelopes, stored as JSONL under ./fixtures. The shape is exactly the kernel's EventEnvelope, with no test-only fields, which is why anything that flowed through production can become a fixture. The example below is wrapped for readability; in the file, each envelope occupies a single line:

fixtures/order-created.jsonl

{"event_id":"6a1e0c2f-6c1a-4f0e-9f31-b7d02f6e8c11",
 "reseller_id":null,
 "tenant_id":"t_local",
 "workspace_id":null,
 "source":"sandbox-orders",
 "event_type":"order.created",
 "correlation_id":"9f31c4d8-2b7e-4a55-8d20-1f0a9e3b6c77",
 "payload":{"order_id":"SO-10884","promise_date":"2026-07-04","total":412.50},
 "agent_id":null,
 "session_id":null}

Field	Type	Notes for fixtures
`event_id`	`string`	Any UUID. Replay does not dedupe on it; the executor dedupes on action idempotency keys.
`reseller_id`	`string \| null`	`null` for Fibric-direct workspaces. Present on every envelope, fixtures included.
`tenant_id`	`string`	Use `t_local` in dev. The harness refuses fixtures whose tenant does not match the local tenant, for the same reason production does.
`workspace_id`	`string \| null`	`null` in local fixtures; exports rewrite it alongside `tenant_id`.
`source`	`string`	The emitting connector or system, half of what the router matches operator triggers against.
`event_type`	`string`	The event name, matched against operator triggers with glob semantics, for example `order.*`.
`correlation_id`	`string`	Ties the envelope, the proposed plan, and the receipts together in harness output.
`payload`	`Record<string, unknown>`	The event body the connector emitted, or a hand-written equivalent.
`agent_id`	`string \| null`	Set when an operator emitted the event; `null` for system-sourced fixtures.
`session_id`	`string \| null`	Set for events inside an interactive session; usually `null` in fixtures.
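
The field table above can be written down as a TypeScript type. This is a sketch reconstructed from the field notes, not the SDK's published definition; `LOCAL_TENANT` and `parseFixtureLine` are hypothetical names used only for illustration, and the tenant check merely mirrors the harness rule described in the `tenant_id` row.

```typescript
// Sketch of the envelope shape described in the field table (not the SDK's own type).
interface EventEnvelope {
  event_id: string;
  reseller_id: string | null;
  tenant_id: string;
  workspace_id: string | null;
  source: string;
  event_type: string;
  correlation_id: string;
  payload: Record<string, unknown>;
  agent_id: string | null;
  session_id: string | null;
}

// Hypothetical constant: the tenant the local kernel runs as.
const LOCAL_TENANT = "t_local";

// Hypothetical helper: parse one JSONL line, check the required string fields,
// and reject envelopes whose tenant is not the local tenant.
function parseFixtureLine(line: string): EventEnvelope {
  const parsed = JSON.parse(line) as Partial<EventEnvelope>;
  const required = ["event_id", "tenant_id", "source", "event_type", "correlation_id"] as const;
  for (const field of required) {
    if (typeof parsed[field] !== "string") {
      throw new Error(`fixture is missing string field: ${field}`);
    }
  }
  if (parsed.tenant_id !== LOCAL_TENANT) {
    throw new Error(`fixture tenant ${parsed.tenant_id} does not match ${LOCAL_TENANT}`);
  }
  return parsed as EventEnvelope;
}
```

A check like this is useful mainly for hand-written fixtures; recorded fixtures already arrive in the right shape.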

Recording fixtures

The best fixtures are recorded, not written. Any envelope that has flowed through a workspace, whether sandbox or production, can be exported as replayable JSONL, with payloads intact and tenant identifiers rewritten to t_local so the harness accepts them:

bash

# export the last week of triggering envelopes as fixtures
fibric receipts export --envelopes --since 7d > fixtures/last-week.jsonl

$ fibric receipts export --envelopes --since 7d > fixtures/last-week.jsonl
exported 142 envelopes · tenant ids rewritten to t_local · payloads intact

Recorded fixtures earn their keep twice: they carry the odd shapes real systems produce that hand-written fixtures never do, and they make regressions replayable. When an operator misbehaves in production, export the envelope that triggered it and the failure becomes a test. The webhook ingestion guide shows the same export used to replay a delivery storm.
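
To make "tenant identifiers rewritten, payloads intact" concrete, here is a minimal sketch of the transformation the export performs. The export command already does this for you; `toLocalFixture` and `toJsonl` are hypothetical helpers, and setting `workspace_id` to `null` is an assumption based on the fixture field notes.

```typescript
// Only the fields the rewrite touches are typed; everything else passes through untouched.
type Envelope = {
  tenant_id: string;
  workspace_id: string | null;
  [key: string]: unknown;
};

// Hypothetical helper: rewrite tenant identifiers for local replay.
// The payload and all other fields are copied as-is.
function toLocalFixture(envelope: Envelope): Envelope {
  return { ...envelope, tenant_id: "t_local", workspace_id: null };
}

// Serialize as JSONL: one envelope per line, with a trailing newline.
function toJsonl(envelopes: Envelope[]): string {
  return envelopes.map((e) => JSON.stringify(toLocalFixture(e))).join("\n") + "\n";
}
```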

Replaying fixtures

Replay pushes fixture envelopes through the local router, which matches them against operator triggers exactly as production would, glob semantics included:

bash

# replay one fixture file through whatever operators match
fibric dev replay fixtures/order-created.jsonl

# replay a whole directory, propose-only: nothing disposed
fibric dev replay fixtures/ --propose-only

$ fibric dev replay fixtures/order-created.jsonl
1 envelope · matched: ship-risk (trigger order.*)
plan proposed: 2 actions · written to .fibric/dev/plans/9f31.json
dispose orders.hold    ALLOW  key=ship-risk:SO-10884:hold    handler stubbed
dispose orders.notify  ALLOW  key=ship-risk:SO-10884:notify  handler stubbed
receipt 2 written → .fibric/dev/receipts.jsonl
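
The trigger match in that output (`order.*` against `order.created`) can be approximated in a few lines. This is a sketch only: it assumes `*` matches any run of characters, including dots, which this guide does not specify for the real router. `globToRegExp` and `matchesTrigger` are hypothetical names.

```typescript
// Hypothetical matcher: "*" matches any run of characters; everything else is literal.
function globToRegExp(glob: string): RegExp {
  const escaped = glob
    .split("*")
    // escape regex metacharacters in the literal parts
    .map((part) => part.replace(/[.+?^${}()|[\]\\]/g, "\\$&"))
    .join(".*");
  return new RegExp(`^${escaped}$`);
}

// True when the event type satisfies the trigger pattern.
function matchesTrigger(trigger: string, eventType: string): boolean {
  return globToRegExp(trigger).test(eventType);
}
```

Under these assumptions `matchesTrigger("order.*", "order.created")` is true, while an event type of `orders.created` or `invoice.created` does not match.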

Two modes matter. The default runs the full pipeline (validation, policy, single-flight, and dedup) with side-effecting handlers stubbed, and writes receipts you can assert on. With --propose-only the run stops at the plan: nothing is evaluated against policy, nothing acquires a lock, and the plan lands on disk as JSON for your test to read. Propose-only is the mode most unit tests want.

Dev receipts are real receipts

Because the stub replaces only the handler call, a dev-mode receipt records the same proposal, evaluation, disposition, and keys a production receipt would. Assertions you write against .fibric/dev/receipts.jsonl hold in production unchanged. See Receipts & audit.

Asserting on proposed plans

A proposed plan is plain data in the kernel's ExecutionPlan shape: an optional reasoning string and an actions array of PlannedAction entries, each carrying connector, tool, args, an optional value, an entity_key, and an idempotency_key. Because it is data, you assert on it with your ordinary test runner. The SDK exposes the harness programmatically for exactly this:

tests/ship-risk.test.ts

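import { readFileSync } from 'node:fs';
import { test, expect } from 'vitest';
import { devKernel } from '@fibric/connector-sdk/testing';
import shipRisk from '../operators/ship-risk';

// the fixture is JSONL (one envelope per line), so parse it line by line
// instead of importing the file as a JSON module
const fixturePath = new URL('../fixtures/order-created.jsonl', import.meta.url);
const [fixture] = readFileSync(fixturePath, 'utf8')
  .split('\n')
  .filter(Boolean)
  .map((line) => JSON.parse(line));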

test('holds an at-risk order, exactly once, with stable keys', async () => {
  const kernel = devKernel({ operators: [shipRisk] });

  // propose only: the plan is returned, nothing is disposed
  const plan = await kernel.propose(fixture);

  expect(plan.actions).toHaveLength(2);

  const [hold, notify] = plan.actions;
  expect(hold.tool).toBe('orders.hold');
  expect(hold.args.order_id).toBe('SO-10884');

  // idempotency and single-flight keys are part of the contract:
  // pin them, so a refactor cannot silently change dedup behavior
  expect(hold.entity_key).toBe('order:SO-10884');
  expect(hold.idempotency_key).toBe('ship-risk:SO-10884:hold');
  expect(notify.idempotency_key).toBe('ship-risk:SO-10884:notify');
});

Disposition is testable the same way, without mocks, because the kernel's evaluate() is a pure function from policies, action, and envelope to a decision. Run the same fixture twice through the harness and the second run's disposition comes back DEDUP with the handler untouched, which is the exact production behavior your retries rely on. Testing connectors covers trust-tier simulation and a full CI recipe built on these pieces.
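
The ALLOW-then-DEDUP behavior is easy to picture with a toy model. The sketch below is not the kernel's executor; `ToyExecutor` is a hypothetical stand-in that keys purely on `idempotency_key` and counts how often the (stubbed) handler would have run.

```typescript
type Disposition = "ALLOW" | "DEDUP";

// Only the fields this sketch needs from a planned action.
interface PlannedAction {
  tool: string;
  idempotency_key: string;
}

// Toy in-memory executor: remembers idempotency keys it has already disposed.
class ToyExecutor {
  private seen = new Set<string>();
  handlerCalls = 0;

  dispose(action: PlannedAction): Disposition {
    if (this.seen.has(action.idempotency_key)) {
      return "DEDUP"; // key already disposed: the handler is not called again
    }
    this.seen.add(action.idempotency_key);
    this.handlerCalls += 1; // stands in for the stubbed handler call
    return "ALLOW";
  }
}
```

Disposing the same `ship-risk:SO-10884:hold` action twice yields `ALLOW` and then `DEDUP`, with `handlerCalls` staying at 1.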

Treat keys as API

The entity_key and idempotency_key an operator emits are behavioral contracts: change them and yesterday's dedup no longer applies to today's retries. Pin them in tests the way you would pin a wire format. See Single-flight & idempotency.
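
One way to keep keys stable is to derive them only from data that is identical on every retry, such as the operator name and the order id, and never from timestamps or random values. The builders below are hypothetical; they simply reproduce the key shapes pinned in the test above.

```typescript
// Hypothetical key builders; the formats mirror the keys asserted in the ship-risk test.
function entityKey(orderId: string): string {
  return `order:${orderId}`;
}

function idempotencyKey(operator: string, orderId: string, step: "hold" | "notify"): string {
  return `${operator}:${orderId}:${step}`;
}
```

Because both functions are pure, replaying the same envelope always produces the same keys, which is what lets dedup recognize a retry.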

Promoting to production

Promotion is deliberately small. The operator does not change, because it names capabilities, not vendors. The policy does not change, because it names capabilities too. The only thing that moves is the connection bound to the role.
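
The indirection can be modeled as a lookup from role to connection. This is an illustrative sketch, not Fibric's resolver: `Bindings` and `resolve` are hypothetical, and treating the capability prefix (`orders` in `orders.hold`) as the role name is an assumption drawn from the examples in this guide.

```typescript
// Hypothetical model: each role is bound to exactly one connection.
type Bindings = Record<string, string>; // role -> connection

// Resolve a capability such as "orders.hold" to the connection serving its role.
function resolve(bindings: Bindings, capability: string): string {
  const [role = ""] = capability.split("."); // "orders.hold" -> "orders"
  const connection = bindings[role];
  if (connection === undefined) {
    throw new Error(`no connection bound to role: ${role}`);
  }
  return connection;
}

// Promotion changes only the binding; the capability the operator names stays the same.
const sandboxBindings: Bindings = { orders: "sandbox-orders" };
const liveBindings: Bindings = { orders: "magento-live" };
```

Here `resolve(sandboxBindings, "orders.hold")` and `resolve(liveBindings, "orders.hold")` answer the same request from two different connections, with no change on the operator side.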

Swap the connection binding

Rebind the role from the sandbox connection to the live one. Credentials never mix: the live connection holds its own secret, entered at connection time, and nothing from the sandbox connection carries over.

bash

# the live connector takes over the orders role
fibric connectors add cn-magento --connection magento-live --as orders

# confirm every capability the operator needs is served by the live binding
fibric capabilities ls
fibric policy validate ship-risk-guardrails --against ship-risk

$ fibric capabilities ls
CAPABILITY      CONNECTOR    STATUS
orders.read     cn-magento   ready
orders.hold     cn-magento   ready
orders.notify   cn-magento   ready
binding changed: orders → magento-live (was sandbox-orders)

Run the first live pass supervised

Even with a tested operator and an unchanged policy, take the first production contact in two steps. First a dry run, which senses real data and evaluates the real plan but executes nothing; then a single supervised run with receipts tailing in another terminal.

bash

# 1. real data, real evaluation, zero side effects
fibric operators run ship-risk --dry-run

# 2. one real run, watched live
# in a second terminal, start tailing receipts before the run:
fibric receipts tail --operator ship-risk
# then, in the first terminal:
fibric operators run ship-risk --once

$ fibric operators run ship-risk --dry-run
sensed    217 open orders (live: cn-magento)
reasoned  proposed 3 holds (dry run: nothing will execute)
would-be dispositions:
  orders.hold  SO-30112  ALLOW  key=ship-risk:SO-30112:hold
  orders.hold  SO-30140  ALLOW  key=ship-risk:SO-30140:hold
  orders.hold  SO-30158  ALLOW  key=ship-risk:SO-30158:hold
✓ dry run complete · 0 side effects

If the dry run proposes anything surprising, nothing has happened yet; fix the operator or tighten the policy and repeat. Once the supervised run behaves, hand the operator its schedule or live trigger. Ongoing observation, alerting on BLOCK rates, and receipt-based review are covered in Monitor operators in production; moving a whole workload between systems this way is the subject of Migrate a connector without downtime.

Proven in production

This sandbox-to-live path is how BearScope, Fibric's flagship product, runs in production today: operators developed against fixtures and sandbox connections, promoted by rebinding roles, governed by the same policies in both environments.

Keep going

Write guardrail policies: the policy documents these tests exercise.
Testing connectors: trust-tier simulation and the full CI recipe.
Tools & auth: sandbox vs live connections, and why dev-mode stubs are safe by default.
The event envelope: the full field reference behind the fixture format.
CLI reference: fibric dev, replay, receipts export, and exit codes for CI.
Build an order-risk watcher: a complete operator to take through this workflow.
