Fibric Docs · v0.9 · preview

Troubleshooting

This page is organized by symptom: what you observe, what is usually causing it, and how to resolve it, with the relevant error codes linked into the error reference. A useful prior for all of it: Fibric fails closed and receipts everything, so most "failures" are the platform refusing to do something unsafe, and the receipt or error body tells you which rule refused it.

First moves for any incident

```bash
# who am I acting as, and is the key valid?
fibric auth whoami

# are the connectors healthy? probes catch credential problems first
fibric connectors list

# what actually happened? the receipt ledger is the ground truth
fibric receipts tail
```

Every API error body carries a stable error.code and a request id; include the request id when contacting support. Retryability by status code is tabulated in Errors.

Connector auth expired or rejected

Symptom: a connector's status shows error in fibric connectors list or the console; tool calls against it fail; receipts show actions failing with upstream authentication errors; a connector test returns auth_failed.

| Cause | Resolution |
| --- | --- |
| The credential expired or was revoked at the source system: a rotated API key, a revoked OAuth grant, or a lapsed certificate. | Issue a new credential at the source and run the zero-downtime rotation sequence in Secrets and credentials: update the installation, run fibric connectors test <role>, then revoke the old credential. |
| The credential is live but under-scoped: reads work, but a specific tool fails. | Compare the scopes or permissions granted against what the connector's tools declare (see Tools & auth) and re-grant with the missing scope. |
| For aws_iam connectors, the role's trust policy or permissions changed. | Restore the assume-role trust relationship and the actions the connector's tools require, then re-run the test. |

Error codes: auth_failed on install or test; connector_upstream_error when the source rejects calls mid-operation. Note the distinction from key_invalid/key_revoked, which are about your API key, not the connector's credential; see Errors.

Events stalled

Symptom: operators stop proposing, dashboards flatline, or a stream goes silent while everything reports healthy.

Work from the source inward; the failure is almost always at the first hop that has no new data.

| Cause | How to confirm | Resolution |
| --- | --- | --- |
| The source stopped sending: a webhook registration was removed or the source system is down. | GET /v1/events?source=<source> shows no recent events at all. | Re-register the webhook at the source (re-running the connector install does this), or wait out the source outage. Deliveries retried by the source dedupe on ingest, so nothing double-counts. |
| A polling connector's credential broke, so polls fail silently from the source's perspective. | fibric connectors list shows the probe failing. | Treat as connector auth above. |
| Events arrive but your operator's trigger does not match them. | Events are visible in GET /v1/events, but there are no plans in receipts. | Check the trigger glob: * matches a single dot-delimited segment, so order.* does not match order.refund.issued. Widen the trigger or add one per segment depth, then replay the missed window. |
| The operator is paused. | fibric operators list shows its status. | Run fibric operators resume <id>. Events that arrived while paused are in the log and can be replayed. |
| Only your stream is silent; ingest is fine. | The list API shows new events your stream never delivered. | Your consumer likely stopped reading and was disconnected, or it holds a cursor that does not match its filters (invalid_cursor). Reconnect with the last checkpointed resume token; see Streaming events. |
| Ingest is being rate limited. | Your producer receives 429. | Back off per Retry-After and retry with the same idempotency keys; nothing is lost. See quotas below. |

Error codes: rate_limited, invalid_cursor, connector_upstream_error.
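
The trigger-glob rule in the table above is easy to misread, so here is a minimal sketch of segment-wise matching. It assumes only what this page states (* matches exactly one dot-delimited segment); the function name trigger_matches and the event type order.created are illustrative, and the real matcher may support syntax this sketch does not.

```python
def trigger_matches(pattern: str, event_type: str) -> bool:
    """Segment-wise glob: '*' stands for exactly one dot-delimited segment.

    Hypothetical helper for reasoning about triggers locally; it is not
    part of any Fibric SDK.
    """
    pattern_parts = pattern.split(".")
    event_parts = event_type.split(".")
    # A single '*' never spans several segments, so the depths must agree.
    if len(pattern_parts) != len(event_parts):
        return False
    return all(p == "*" or p == e for p, e in zip(pattern_parts, event_parts))


# One trigger per segment depth covers both two- and three-segment events.
triggers = ["order.*", "order.*.*"]
for event_type in ("order.created", "order.refund.issued"):
    matched = [t for t in triggers if trigger_matches(t, event_type)]
    print(event_type, "->", matched)
```

Running a check like this against the event types listed by GET /v1/events is a quick way to see which depth a trigger is missing before replaying the window.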

Action vetoed or blocked

Symptom: an operator proposed something and it did not happen. The receipt shows a disposition of BLOCK, or a plan sat in approval and a human vetoed it.

First, read the receipt: it records which policy refused the action, or who vetoed and why. Then distinguish the three cases, because their remedies differ:

| Case | What the receipt shows | Resolution |
| --- | --- | --- |
| No policy matched the action | BLOCK with no matching policy, the fail-closed default. | If the action should be possible, add a policy for that connector and tool to the operator's guardrails via the Operators API. Start it at ALERT; promote to ALLOW with receipt evidence. See Trust tiers. |
| A constraint failed | BLOCK from a matching policy: value above maxValue, or a predicate returned false. | Working as designed. Raise the ceiling only deliberately. By design, a human cannot approve an over-ceiling action; the policy itself must change. |
| A human vetoed | An ALERT disposition followed by a veto, with identity and reason. | Address the reason. If the same class of action is repeatedly approved instead, that history is the evidence for promoting the tool to ALLOW. |

An API request whose actions were all refused fails with policy_blocked, and the body carries the per-action verdicts. If the operator's capability is not fulfilled by any installed connector at all, you get capability_unbound instead: install and bind a connector for that capability (Capabilities).
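
To make the three cases concrete, the following is a toy model of how a disposition could be derived. It is a sketch built only from the behavior described on this page (fail-closed default, a maxValue ceiling, ALERT and ALLOW tiers), not Fibric's implementation: the Policy class, the evaluate function, and the issue_refund and cancel_order tool names are all hypothetical, and predicates are left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Policy:
    """Hypothetical guardrail entry for one connector and tool."""
    connector: str
    tool: str
    tier: str                          # "ALERT" (human approves) or "ALLOW"
    max_value: Optional[float] = None  # ceiling; None means no ceiling


def evaluate(policies, connector, tool, value):
    """Return (disposition, reason) for one proposed action."""
    for policy in policies:
        if (policy.connector, policy.tool) != (connector, tool):
            continue
        # A failed constraint blocks outright; approval cannot override it.
        if policy.max_value is not None and value > policy.max_value:
            return "BLOCK", "constraint failed: value above maxValue"
        return policy.tier, "matched policy"
    # Nothing matched: refuse by default rather than guess.
    return "BLOCK", "no matching policy (fail-closed default)"


guardrails = [Policy("magento", "issue_refund", "ALERT", max_value=100.0)]
print(evaluate(guardrails, "magento", "issue_refund", 40.0))
print(evaluate(guardrails, "magento", "issue_refund", 400.0))
print(evaluate(guardrails, "magento", "cancel_order", 1.0))
```

The point of the model is the ordering: a missing policy and a failed constraint both end in BLOCK before any human is asked, which is why only the third case is resolved by addressing a veto reason.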

A block is the system working

A BLOCK disposition is not an error state to page on; it is the guardrail doing its job and leaving evidence. Alert on unexpected blocks rather than on blocks per se: a previously ALLOWed tool that suddenly blocks suggests a guardrail edit.

Duplicate suppressed

Symptom: a receipt shows DEDUP, or an API write returns 409 idempotency_conflict, or "the same" event you sent twice appears once.

| Observation | Meaning | Action needed |
| --- | --- | --- |
| Receipt with disposition DEDUP | The action's idempotency_key was already consumed; the side effect did not run twice. ok: true; the intended state holds. | None. This is the retry-safety contract working; the receipt exists precisely so you can see the replay happened. |
| Ingest returns the same event for repeated POSTs | At-least-once delivery collapsing on your Idempotency-Key. | None. This is correct producer behavior. |
| 409 idempotency_conflict | An idempotency key was replayed with a different request body. Two different requests claimed to be the same one; neither the stored result nor the new body was acted on. | Fix the client: use a new key for a genuinely new request, or replay the original body unchanged. See Errors. |
| Distinct events collapsing that should not | Your key derivation is too coarse, for example keying on the order id alone rather than order id plus version. | Include the changing component in the key: magento:SO-10884:v7, not magento:SO-10884. Keys are retained for 24 hours, per tenant and route. |

Key-design guidance is in Single-flight & idempotency and Reliability.
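
The rows above can be reproduced with a small in-memory model. This is a sketch for reasoning about client behavior, not the server's implementation: event_key and IdempotencyStore are hypothetical names, the APPLIED and CONFLICT labels are illustrative (only DEDUP and idempotency_conflict appear on this page), and the 24-hour retention is not modeled.

```python
import hashlib
import json


def event_key(source: str, entity_id: str, version: int) -> str:
    """Build a key that changes whenever the entity changes."""
    return f"{source}:{entity_id}:v{version}"


class IdempotencyStore:
    """In-memory stand-in for server-side key handling (no expiry)."""

    def __init__(self):
        self._seen = {}  # key -> (body fingerprint, stored result)

    def submit(self, key: str, body: dict):
        # Fingerprint the body so a replay can be told apart from a conflict.
        fingerprint = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode("utf-8")
        ).hexdigest()
        if key not in self._seen:
            result = {"ok": True, "body": body}
            self._seen[key] = (fingerprint, result)
            return "APPLIED", result
        stored_fingerprint, stored_result = self._seen[key]
        if stored_fingerprint == fingerprint:
            return "DEDUP", stored_result  # same request replayed
        return "CONFLICT", None            # same key, different body


store = IdempotencyStore()
key_v7 = event_key("magento", "SO-10884", 7)
print(store.submit(key_v7, {"status": "paid"})[0])      # first delivery
print(store.submit(key_v7, {"status": "paid"})[0])      # retry, same body
print(store.submit(key_v7, {"status": "refunded"})[0])  # key reused for new data
key_v8 = event_key("magento", "SO-10884", 8)
print(store.submit(key_v8, {"status": "refunded"})[0])  # new version, new key
```

Keying on the order id alone would have made the third call unavoidable for every update to that order; adding the version moves the update to its own key.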

Requests failing with entity_locked

Symptom: writes intermittently return 409 entity_locked with a Retry-After header, usually under bursts touching the same order, conversation, or asset.

Cause: the single-flight lock for an entity_key your request needs is held by other in-flight work. This is transient by design: the lock releases when that work disposes.

Resolution: wait for the interval given in Retry-After, then retry the same request unchanged, with the same idempotency key, so the retry is dedup-safe. If one entity is persistently locked, slow work is holding it: check fibric receipts tail for a long-running action on that entity, and check whether your entity keys are broader than the real entity (for example, keying a whole store rather than one order serializes everything behind one lock). See Errors and Single-flight & idempotency.
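
A retry loop for this case might look like the sketch below. It assumes Retry-After is expressed in seconds, and it takes the transport as a caller-supplied send function returning (status, error_code, retry_after_seconds); that tuple shape, the function name retry_while_locked, and the request dict are illustrative assumptions, not a Fibric client API.

```python
import time


def retry_while_locked(send, request, max_attempts=5, sleep=time.sleep):
    """Resend the same request while the response is 409 entity_locked.

    The request object is never modified, so every attempt carries the
    same idempotency key and body.
    Returns (status, error_code, attempts_made).
    """
    for attempt in range(1, max_attempts + 1):
        status, error_code, retry_after = send(request)
        if not (status == 409 and error_code == "entity_locked"):
            return status, error_code, attempt
        if attempt < max_attempts:
            sleep(retry_after)  # honor the server's hint before retrying
    # Still locked after the last attempt: surface it to the caller.
    return status, error_code, max_attempts


# Usage with a fake transport that is locked twice, then succeeds.
responses = iter([(409, "entity_locked", 2), (409, "entity_locked", 1), (200, None, 0)])
waits = []
outcome = retry_while_locked(
    lambda req: next(responses),
    {"idempotency_key": "magento:SO-10884:v7", "body": {"status": "paid"}},
    sleep=waits.append,  # record waits instead of sleeping in this demo
)
print(outcome, waits)
```

Capping the attempts matters here: a lock that outlives several Retry-After intervals is the persistent case described above and deserves investigation rather than more retries.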

Quota or rate limit exhausted

Symptom: requests return 429. Two different codes, two different problems:

| Code | What is exhausted | Resolution |
| --- | --- | --- |
| rate_limited | The per-minute request budget for a route class. Limits are per tenant, so splitting traffic across API keys does not help. | Back off per Retry-After and watch X-RateLimit-Remaining. Smooth bursts at the producer; for consumers, prefer one stream over tight polling loops. |
| quota_exceeded | A standing quota: the monthly action allowance under a hard cap, or a concurrency cap (export jobs, active operators, installed connectors, API keys). | For actions with a hard cap set: plans hold in proposed, and nothing is lost; raise the cap in the console and approve the held plans, or enable overage ($0.01/action). For concurrency caps: finish or remove existing resources, or ask for a raise during onboarding. Budgets are tabulated in Rate limits & quotas. |
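
Because both codes arrive as 429, a client should branch on error.code rather than on the status alone. The sketch below shows one way to do that; the function name plan_for_429 and the shape of the returned dict are illustrative assumptions, and Retry-After is assumed to be in seconds.

```python
def plan_for_429(error_code: str, retry_after: float = 0.0) -> dict:
    """Decide what a client should do with a 429, based on error.code."""
    if error_code == "rate_limited":
        # Transient: the per-minute budget refills, so waiting is enough.
        return {
            "retry": True,
            "wait_seconds": retry_after,
            "action": "back off, then resend with the same idempotency key",
        }
    if error_code == "quota_exceeded":
        # Standing limit: retrying cannot help until someone changes it.
        return {
            "retry": False,
            "wait_seconds": None,
            "action": "raise the cap, enable overage, or free resources",
        }
    # Unknown code: do not loop blindly; consult the error reference.
    return {"retry": False, "wait_seconds": None, "action": "check the error reference"}


print(plan_for_429("rate_limited", retry_after=3))
print(plan_for_429("quota_exceeded"))
```

Treating quota_exceeded as retryable is the usual mistake: the request keeps failing on schedule while the held plans wait for a human decision.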

Other frequent symptoms

| Symptom | Cause and resolution |
| --- | --- |
| All requests return 401 | The API key is malformed, deleted, or revoked (key_invalid, key_revoked). Run fibric auth whoami; issue a new key in the console if needed. Keys are shown once, so a lost key is replaced, not recovered. |
| Cannot uninstall a connector | connector_in_use: it is the only fulfiller of a capability an active operator depends on. Pause the dependent operators or bind another connector to the capability first. |
| Undo fails | action_not_undoable: the tool declares no inverse, the action never applied, or it was already undone. Check the action's undoable and undone fields in the receipt first. |
| Lifecycle operation refused | state_conflict: for example, approving an already-executed plan, resuming a draft operator, or testing a pending_auth connector. Fetch the resource and act on the state it is actually in. |
| Marketplace install refused | listing_early_access: the listing is early access rather than live. Request access; early-access listings are provisioned with your team during onboarding. |

If a symptom here does not match yours, the error reference covers every code the API returns, with retryability guidance, and Reliability and delivery semantics defines what the platform guarantees under failure. For anything that looks like a security issue, go straight to responsible disclosure.