Monitor operators in production

An operator in production is a worker you supervise, not a script you forget. This guide covers the signals that matter once an operator is live: the receipts stream as the primary record of behavior, connector and cadence health, the veto rate as an early drift indicator, alerting on stalls, and exporting receipts into your own analysis stack.

Before you begin

This guide assumes a deployed operator and a bound connector. If you do not have one yet, the quickstart gets you there against a sandbox in about ten minutes, and Your first operator covers the full walkthrough. Everything below works identically against the sandbox and against live systems, because monitoring reads the same ledger in both cases.

The receipt as the unit of monitoring

Everything an operator does, and everything it was refused, lands in one place: the receipt ledger. Each receipt records the proposal exactly as the operator made it, the policy evaluation that disposed of it, the disposition itself, and the idempotency_key the action carried. Because BLOCK and DEDUP produce receipts too, the ledger is a complete account of intent, not only of effect.

This changes what monitoring means. In a conventional automation you watch logs and infer behavior. With an operator you watch receipts and read behavior directly: what was sensed is in the envelope reference, what the model wanted to do is in the action field, and what the deterministic executor decided is in decision and policy. Every question in this guide reduces to a query over receipts, plus one class of health signal (connector probes) that comes from the connectors themselves.

Three properties make receipts a sound monitoring substrate:

Complete. Every disposed action leaves a receipt, including refusals and deduplications. There is no unlogged path from a proposal to a side effect.
Immutable. Receipts are never edited or thinned. A trend you compute over last month's receipts is computed over exactly what happened last month.
Attributed. Every receipt names its operator, its triggering envelope, and its correlation_id. A spike in any metric can be walked back to the individual actions, and from each action back to the observation that caused it.

Tailing and filtering the stream

The day-to-day surface is fibric receipts tail, which follows receipts live, like tail -f on what your operators are doing right now. Filters compose: --operator narrows to one worker, --decision selects a disposition, and --json emits one object per line for piping into your own tooling. The full flag set is in the CLI reference.

bash

# everything one operator is doing, live
fibric receipts tail --operator order-risk

# a live view of what policy is refusing, across all operators
fibric receipts tail --decision BLOCK

# machine-readable, for piping into jq or your own collector
fibric receipts tail --operator order-risk --json

$ fibric receipts tail --operator order-risk
receipt rc_7a02  order SO-11290  orders.hold    applied       policy=allow idem=ok
receipt rc_7a03  order SO-11290  notify.send    applied       policy=allow idem=ok
receipt rc_7a09  order SO-11290  orders.hold    deduplicated  idem=order-risk:SO-11290:hold
receipt rc_7a14  order SO-11307  orders.refund  refused       policy=maxValue 500 exceeded (820)

A single record is retrieved in full with fibric receipts show rc_7a14 --json, which includes the envelope reference, the verbatim PlannedAction, the rule that matched, and the three timestamps of the action's life. The same data is available over REST at the Receipts API with the same filters as query parameters.
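
When jq is not enough, the --json stream can be consumed line by line in any language. The following is a minimal Python sketch, assuming each line is one receipt object with its decision at verdict.decision (the path the jq example in the veto rate section uses). The function name and the id field in the sample data are illustrative, not part of the Fibric API.

```python
import json
from typing import Iterable, Iterator


def filter_by_decision(lines: Iterable[str], decision: str) -> Iterator[dict]:
    """Yield receipts with the given decision from a JSON-lines stream."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        receipt = json.loads(line)
        # Assumed shape: {"verdict": {"decision": "BLOCK", ...}, ...}
        if receipt.get("verdict", {}).get("decision") == decision:
            yield receipt


# Illustrative input; real lines would come from `fibric receipts tail --json`.
sample = [
    '{"id": "rc_1", "verdict": {"decision": "ALLOW"}}',
    '{"id": "rc_2", "verdict": {"decision": "BLOCK"}}',
]
for receipt in filter_by_decision(sample, "BLOCK"):
    print(receipt["id"])
```

In a collector, the same function would be fed sys.stdin instead of the sample list, with the CLI output piped into the script.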

Reading dispositions as signals

Each receipt carries one of four dispositions. Individually they describe one action; in aggregate, a shift in the mix is the earliest signal that something upstream, in the model, or in your policy has changed.

Disposition	What one receipt means	What a shift in the rate means
`applied`	The proposal matched policy with decision `ALLOW`, passed every constraint, and the connector call completed.	A rising count with steady inputs means the operator is finding more to act on: verify the upstream volume before assuming the operator changed. A falling count with steady inputs points at a stall or at proposals shifting into another disposition.
`alerted`	The proposal matched a policy carrying the `ALERT` decision and was raised to the human approval queue. See Trust tiers.	A growing queue means humans are the bottleneck: either staff the queue or, if the alert history shows consistent approvals, promote the action toward `ALLOW` within limits, per the guardrails guide.
`blocked`	Policy refused the action: no rule matched (fail-closed default), a `maxValue` was exceeded, or a `predicate` returned false. Nothing reached the connector.	The most important trend on this page. A rising block rate means the model is proposing things it did not previously propose (model drift), upstream data changed shape or scale, or a policy edit tightened the gate. See the veto rate below.
`dedup`	The executor had already applied an action with this `idempotency_key`; the repeat was acknowledged, skipped, and receipted. See Single-flight & idempotency.	Occasional dedups are the system working: retries and overlapping runs collapsing to one side effect. A sustained rise means something is re-proposing the same work: a duplicate trigger, a retry storm at the source, or an operator scheduled more often than its data changes.
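
A shift in the mix is easiest to see as a change in each disposition's share between two windows. The sketch below is one way to compute that in Python, assuming you have already extracted the decision string from each receipt in a baseline window and a current window. The function names are illustrative.

```python
from collections import Counter
from typing import Iterable


def disposition_mix(decisions: Iterable[str]) -> dict[str, float]:
    """Return each decision's share of the total, as fractions summing to 1."""
    counts = Counter(decisions)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {decision: n / total for decision, n in counts.items()}


def mix_shift(baseline: dict[str, float], current: dict[str, float]) -> dict[str, float]:
    """Change in share per decision, in percentage points (current minus baseline)."""
    keys = sorted(set(baseline) | set(current))
    return {k: round(100 * (current.get(k, 0.0) - baseline.get(k, 0.0)), 1) for k in keys}


# Made-up windows: blocks grow from 4% to 14% of proposals.
last_month = disposition_mix(["ALLOW"] * 96 + ["BLOCK"] * 4)
this_week = disposition_mix(["ALLOW"] * 86 + ["BLOCK"] * 14)
print(mix_shift(last_month, this_week))  # {'ALLOW': -10.0, 'BLOCK': 10.0}
```

Comparing shares rather than raw counts keeps the signal stable when upstream volume changes but the operator's behavior does not.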

Refusals are signal, not noise

Most systems log what they did. Fibric also logs what it declined to do. A steady trickle of blocked receipts from the fail-closed default is normal early on, and it is the evidence base for deciding which actions deserve an explicit policy. A trickle that becomes a stream is a drift alarm. This is the veto trail described on the receipts page.

Health signals

Receipts tell you what operators are doing. Three health signals tell you whether the machinery around them is in a state where receipts can be trusted to keep flowing: connector probe status, last-event age, and operator run cadence.

Connector probe status

Every connector may declare a probe in its definition: a health check the platform calls on a schedule, returning a short status string and, optionally, one labelled metric. fibric connectors list surfaces the latest probe result per connection, and the connection itself carries a lifecycle status of connected, pending_auth, or error, as documented in the Connectors API.

$ fibric connectors list
ID                 VERSION  CATEGORY  CONNECTIONS  PROBE
cn-kustomer        2.1.0    comms     1            ok (open conversations: 214)
cn-magento         1.8.2    commerce  1            ok (open orders: 1,182)
cn-amazon-connect  1.3.0    voice     1            error (credential expired)

Signal	Meaning	Operational response
Probe `ok`	The connector reached its source with valid credentials on the last scheduled probe. The optional metric (for example `open orders: 1,182`) is a sanity check on the data behind it.	None. Watch the metric for implausible values: an order connector suddenly reporting zero open orders is often an upstream problem the probe itself cannot see.
Connection `error`	The last probe or connection test failed. The connector keeps its configuration; operators depending on its capabilities will have proposals fail at the connector call.	Fix credentials or reachability, then re-run `fibric connectors test`. A failing test reports which tool failed and why.
Connection `pending_auth`	The connection was created but its auth flow has not completed. No events flow and no tools execute.	Complete the auth flow. A connection stuck in `pending_auth` after setup usually means an OAuth grant was abandoned partway.

Last-event age

A connector can probe healthy and still be silent: credentials valid, endpoint reachable, and no events arriving because a webhook subscription lapsed or an upstream export stopped. The signal for this is the age of the newest envelope per source. Because every event travels as an EventEnvelope with a source field, the question is answerable from the export surface directly:

bash

# envelope count per source over the last day; a source missing from the
# output produced nothing in the window
fibric receipts export --envelopes --since 1d \
  | jq -r '.source' | sort | uniq -c

Decide a freshness budget per source and treat silence beyond it as an incident. A help desk that normally produces an event every few minutes and has produced none for an hour is down for your purposes, whatever its own status page says. Stall alerting on this signal is covered below.
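
A freshness check over exported envelopes can be sketched in a few lines of Python. This assumes each envelope carries a source field (as described above) and an ISO 8601 timestamp. The field name occurred_at is a placeholder, so substitute whatever the envelope schema actually calls it. The budgets are yours to choose.

```python
from datetime import datetime, timedelta, timezone


def newest_per_source(envelopes):
    """Map each source to the timestamp of its newest envelope."""
    newest = {}
    for env in envelopes:
        # `occurred_at` is an assumed field name holding an ISO 8601 timestamp.
        ts = datetime.fromisoformat(env["occurred_at"])
        src = env["source"]
        if src not in newest or ts > newest[src]:
            newest[src] = ts
    return newest


def stale_sources(envelopes, budgets, now):
    """Return sources whose newest envelope is older than their freshness budget.

    A source listed in `budgets` but absent from the export is reported too,
    because it produced nothing in the exported window.
    """
    newest = newest_per_source(envelopes)
    stale = []
    for src, budget in budgets.items():
        last = newest.get(src)
        if last is None or now - last > budget:
            stale.append(src)
    return sorted(stale)


# Made-up envelopes and budgets for illustration.
envelopes = [
    {"source": "kustomer", "occurred_at": "2026-06-01T11:58:00+00:00"},
    {"source": "magento", "occurred_at": "2026-06-01T10:30:00+00:00"},
]
budgets = {"kustomer": timedelta(minutes=15), "magento": timedelta(hours=1)}
now = datetime(2026, 6, 1, 12, 0, tzinfo=timezone.utc)
print(stale_sources(envelopes, budgets, now))  # ['magento']
```

Listing every expected source in the budgets mapping matters: a source that has gone completely silent never appears in the export, so it can only be caught by checking for its absence.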

Operator run cadence

An operator that runs on a schedule or trigger should leave a rhythm in the ledger: runs at the expected interval, each producing zero or more receipts under one correlation_id. Two deviations matter:

Missing runs. The operator stopped being invoked: a paused operator (fibric operators list shows state), a broken trigger, or a source that stopped emitting the events it triggers on.
Empty runs. The operator runs but proposes nothing, run after run. Sometimes that is correct (nothing to do); sometimes the sensed data changed shape and the operator's filter now matches nothing. Compare against the upstream volume before deciding which.

To inspect a suspect operator directly, run it once by hand and watch the cycle, or dry-run it to see what it would propose without executing anything:

bash

# one full sense → reason → dispose → act cycle, then exit
fibric operators run order-risk --once

# show the exact ExecutionPlan it would submit; execute nothing
fibric operators run order-risk --dry-run
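
The rhythm itself can be checked from an export. The sketch below groups receipts by correlation_id, takes the earliest proposed_at in each group as the run's start, and reports gaps longer than the expected interval. It assumes both fields are present on every receipt and that timestamps are ISO 8601 strings. The tolerance factor is an arbitrary choice. Because runs that propose nothing leave no receipts, a gap found this way can be either missing runs or empty runs, so it tells you where to look, not which one happened.

```python
from datetime import datetime, timedelta


def run_starts(receipts):
    """Earliest proposed_at per correlation_id, sorted ascending."""
    first = {}
    for r in receipts:
        # Assumed fields: `correlation_id` and an ISO 8601 `proposed_at`.
        ts = datetime.fromisoformat(r["proposed_at"])
        cid = r["correlation_id"]
        if cid not in first or ts < first[cid]:
            first[cid] = ts
    return sorted(first.values())


def cadence_gaps(receipts, expected, tolerance=1.5):
    """Return (previous_run, next_run) pairs further apart than expected * tolerance."""
    starts = run_starts(receipts)
    limit = expected * tolerance
    return [(a, b) for a, b in zip(starts, starts[1:]) if b - a > limit]


# Made-up receipts: an hourly operator with no receipts between 01:00 and 04:00.
receipts = [
    {"correlation_id": "run-1", "proposed_at": "2026-06-01T00:00:05+00:00"},
    {"correlation_id": "run-1", "proposed_at": "2026-06-01T00:00:01+00:00"},
    {"correlation_id": "run-2", "proposed_at": "2026-06-01T01:00:02+00:00"},
    {"correlation_id": "run-3", "proposed_at": "2026-06-01T04:00:00+00:00"},
]
for before, after in cadence_gaps(receipts, expected=timedelta(hours=1)):
    print(f"no receipts between {before.isoformat()} and {after.isoformat()}")
```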

The veto rate

The veto rate is the share of proposals disposed as BLOCK or ALERT over a window:

veto rate = (blocked + alerted) / total proposals

It measures the distance between what the model wants to do and what your policy permits. A stable veto rate means the operator and the policy agree about the shape of the work. A rising rate means one of three things has moved, and the receipts tell you which:

Model drift. The model is proposing actions, arguments, or values it did not previously propose. The action field on blocked receipts shows what changed: a new tool appearing under the fail-closed default, or familiar tools with values that now trip maxValue.
Upstream data change. The source changed shape or scale (an order field switched currency units, say, or a status enum gained a value), and proposals derived from it now fail constraints. The envelope reference on each receipt walks you back to the triggering data.
Policy drift. Someone tightened a rule. Correlate the inflection point with your policy history; a step change on a deploy date is policy, a slow ramp is not.

Compute it from an export. The window should be long enough to smooth run-to-run noise; seven days is a reasonable default:

bash

fibric receipts export --since 7d --format jsonl > receipts-7d.jsonl

jq -s 'group_by(.verdict.decision)
       | map({decision: .[0].verdict.decision, n: length})' receipts-7d.jsonl

$ jq -s '…' receipts-7d.jsonl
[
  { "decision": "ALERT", "n": 98 },
  { "decision": "ALLOW", "n": 1061 },
  { "decision": "BLOCK", "n": 74 },
  { "decision": "DEDUP", "n": 7 }
]

veto rate = (74 + 98) / 1240 ≈ 13.9%
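
The same arithmetic as a small Python sketch, for when the computation lives in a scheduled job instead of a shell. It assumes the JSONL layout the jq command reads (the decision at verdict.decision). The function names are illustrative.

```python
import json
from collections import Counter


def count_decisions(lines):
    """Count receipts per decision from an iterable of JSONL lines."""
    return Counter(
        json.loads(line)["verdict"]["decision"] for line in lines if line.strip()
    )


def veto_rate(counts):
    """(BLOCK + ALERT) / total proposals, given a mapping of decision -> count."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return (counts.get("BLOCK", 0) + counts.get("ALERT", 0)) / total


# The counts from the seven-day export above.
counts = {"ALLOW": 1061, "ALERT": 98, "BLOCK": 74, "DEDUP": 7}
print(f"{veto_rate(counts):.1%}")  # 13.9%
```

To run it against a real export, open receipts-7d.jsonl and pass the file object to count_decisions, then pass the result to veto_rate.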

Worked example: this operator's veto rate had held near 4% for a month, then climbed to 13.9% over a week with no policy change. Filtering the blocked receipts showed the pattern in one command:

$ fibric receipts tail --operator order-risk --decision BLOCK
receipt rc_8c11  order SO-11402  orders.hold  refused  policy=maxValue 500 exceeded (51200)
receipt rc_8c12  order SO-11407  orders.hold  refused  policy=maxValue 500 exceeded (48900)
receipt rc_8c15  order SO-11413  orders.hold  refused  policy=maxValue 500 exceeded (63075)

Order values a hundred times their usual range: the upstream system had started reporting amounts in cents rather than dollars. The policy was right to refuse, the model was right to propose, and the veto rate surfaced an upstream regression before it produced a single wrong action. That is the designed behavior: fail-closed policy converts drift into visible refusals instead of silent damage.

Do not fix a rising veto rate by loosening policy

The rate is a symptom. Widening maxValue or adding an allow rule to make the number go down treats the alarm, not the cause. Diagnose from the blocked receipts first; only change policy once you can say which of the three drift sources moved and why the new behavior is correct. Governance & trust covers the reasoning behind default-closed evaluation.

Alerting on stalls

Drift is an operator doing the wrong thing. A stall is an operator doing nothing, which is quieter and often costlier. Two standing conditions cover the failure class:

No proposals in N hours. An operator that normally proposes several times a day has left no receipts in its usual window. Either its trigger stopped firing, it was paused, or its sensed data went silent.
No events from a source in N minutes. The envelope stream from one source has gone quiet beyond its freshness budget. Every operator triggered by that source is now stalled together, whatever their own status shows.

Set N per operator and per source from observed behavior, not from hope: an operator that runs nightly gets a 26-hour budget, a webhook-fed source that averages an event a minute gets a 15-minute one.
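
One way to turn observed behavior into a number is to take the largest gap between consecutive receipts (or envelopes) over a representative period and add headroom. The sketch below does that in Python. The heuristic, the safety factor, and the floor are all assumptions of this example, not platform defaults, so tune them against your own history.

```python
from datetime import datetime, timedelta


def stall_budget(timestamps, safety_factor=1.5, floor=timedelta(minutes=5)):
    """Suggest a stall window: the largest observed gap times a safety factor.

    `floor` keeps very chatty sources from getting a budget so tight that
    ordinary jitter pages someone.
    """
    ordered = sorted(timestamps)
    if len(ordered) < 2:
        raise ValueError("need at least two timestamps to observe a gap")
    largest = max(b - a for a, b in zip(ordered, ordered[1:]))
    return max(largest * safety_factor, floor)


# Made-up history for a nightly operator whose start time wanders a little.
nightly = [
    datetime(2026, 6, 1, 2, 0),
    datetime(2026, 6, 2, 2, 10),
    datetime(2026, 6, 3, 1, 55),
]
print(stall_budget(nightly, safety_factor=1.1))
```

A budget derived this way is only as good as the period it was computed over. Recompute it after seasonal changes in volume, and prefer a window that includes your quietest normal day.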

Alert rules are configured through the platform API and route through any comms-capable connector, the same capability indirection operators use, so a stall alert can land in the channel your team already watches. Alert-rule endpoints are part of the v0.9 preview API surface; the shapes below are stable in intent but may change field names before the v1 freeze, in the same way the API overview describes for webhook management.

bash

# preview API: alert when an operator leaves no receipts for 6 hours
curl -X POST https://api.fibric.io/v1/alert-rules \
  -H "Authorization: Bearer $FIBRIC_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "order-risk stalled",
    "condition": { "type": "no_receipts", "operator": "order-risk", "window": "6h" },
    "notify": { "capability": "notify.send", "target": "#ops-fibric" }
  }'

# preview API: alert when a source stops producing envelopes
curl -X POST https://api.fibric.io/v1/alert-rules \
  -H "Authorization: Bearer $FIBRIC_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "kustomer feed silent",
    "condition": { "type": "no_events", "source": "kustomer", "window": "15m" },
    "notify": { "capability": "notify.send", "target": "#ops-fibric" }
  }'

The notification is itself a governed action

An alert delivered through notify.send goes through the same executor as any other action: policy-checked, idempotency-keyed, receipted. A flapping alert condition cannot flood a channel, because repeats within the same incident collapse to DEDUP. That property exists because of the 657-message flood that shaped the kernel. See Single-flight & idempotency.

Exporting receipts for external analysis

The tail is for watching; trends live in your warehouse. fibric receipts export emits the ledger as JSONL (default) or CSV, bounded by --since and --until, with every field described on the receipts page. JSONL preserves the nested action and envelope objects, which is what you want for loading into a warehouse column as structured data; CSV flattens the verdict into separate columns.

bash

# a month of receipts, one JSON object per line
fibric receipts export --since 2026-06-01 --until 2026-07-01 --format jsonl > receipts-june.jsonl

# the triggering envelopes instead, replayable as test fixtures
fibric receipts export --envelopes --since 7d > fixtures/last-week.jsonl

For asynchronous or larger exports, POST /v1/receipts/export on the Receipts API creates an export job you poll for a download URL. Scheduled recurring drops to your own S3 bucket are an Enterprise feature.

For push rather than pull, webhook endpoints receive action dispositions as they happen, each delivery signed with an HMAC in the Fibric-Signature header so your receiver can verify origin before trusting the payload. Delivery payloads are the same receipt objects documented on the API pages. Setup, signature verification, and retry semantics are covered in the webhooks guide.
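
Verifying an HMAC signature on the receiving side generally looks like the Python sketch below. The hash algorithm and encoding are assumptions here (hex-encoded HMAC-SHA256 over the raw request body). The webhooks guide is the authority on the actual scheme, including any timestamp or prefix inside the header.

```python
import hashlib
import hmac


def verify_signature(secret: bytes, body: bytes, signature_header: str) -> bool:
    """Return True if the header matches the HMAC of the raw request body."""
    # Assumption: hex-encoded HMAC-SHA256 of the exact bytes received.
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    # compare_digest runs in constant time, so a mismatch leaks no position.
    return hmac.compare_digest(expected.encode(), signature_header.strip().encode())


# Illustrative values; the secret would come from your endpoint configuration.
secret = b"whsec_example"
body = b'{"id": "rc_7a14", "verdict": {"decision": "BLOCK"}}'
header = hmac.new(secret, body, hashlib.sha256).hexdigest()
print(verify_signature(secret, body, header))  # True
```

Compute the HMAC over the bytes exactly as received, before any JSON parsing or re-serialization, since even whitespace changes produce a different digest.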

Useful standing metrics once receipts are in your warehouse, whether that is a BI stack or a product dashboard (BearScope, Fibric's flagship product, runs in production today on exactly this ledger):

Veto rate per operator per day, with an alert threshold at two to three times its trailing baseline.
Dedup count per idempotency-key prefix, which localizes duplicate-work sources to one operator and action type.
Latency percentiles from proposed_at to completed_at, split by connector, which separates slow models from slow systems.
ALERT queue depth and time-to-decision, the evidence base for promoting actions to unattended ALLOW.
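
The first metric, a threshold at a multiple of the trailing baseline, can be sketched as follows. It assumes you already have one veto rate per day for an operator, oldest first. The seven-day baseline and the default multiplier are illustrative choices within the range suggested above.

```python
from statistics import mean


def veto_alerts(daily_rates, baseline_days=7, multiplier=2.0):
    """Return indexes of days whose veto rate exceeds multiplier x the trailing mean."""
    flagged = []
    for i in range(baseline_days, len(daily_rates)):
        baseline = mean(daily_rates[i - baseline_days:i])
        # A zero baseline would flag any nonzero day; skip it and review by hand.
        if baseline > 0 and daily_rates[i] > multiplier * baseline:
            flagged.append(i)
    return flagged


# Made-up series: a week near 4%, a small wobble, then a jump to 13.9%.
rates = [0.04] * 7 + [0.05, 0.139]
print(veto_alerts(rates))  # [8]
```

Because the baseline trails, a sustained rise eventually becomes the new normal and stops alerting. Treat the first flagged day as the moment to diagnose from the blocked receipts.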

Keep going

Receipts & audit: every field on the record this guide is built on.
Receipts API: the REST surface behind the CLI, with filters and export jobs.
Trust tiers: how ALLOW, ALERT, and BLOCK are decided, and what escalation means.
Design guardrails that hold: turning veto-trail evidence into better policy.
Receive webhooks: dispositions pushed to your infrastructure, signed.
Migrate from scripts and point automations: the cutover ladder that ends with an operator you monitor this way.
