Incident playbooks
If one of these alerts fires, follow the playbook. Do not improvise.
Kill switch fires automatically
Trigger: Drawdown breach, reject-rate spike, feed loss, or wallet-funding short.
- Confirm trigger source from KillSwitchEvent.trigger field.
- Page the on-call risk engineer.
- Confirm no orders are in-flight; check Order Lifecycle Manager.
- If trigger was drawdown: publish a positions snapshot to the incident channel.
- Run replay-last-hour against the simulator to confirm KillSwitch fired correctly.
- Reset only after the trigger condition has cleared and a second approver signs off.
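The reset rule in the final step lends itself to a mechanical check. The sketch below is illustrative only: ResetRequest, reset_blockers, and the trigger strings are hypothetical names, not the real KillSwitchEvent schema. It also assumes that "second approver" means someone other than the person requesting the reset.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed spellings for the four triggers named in this playbook.
KNOWN_TRIGGERS = {"drawdown", "reject_rate", "feed_loss", "wallet_funding"}


@dataclass(frozen=True)
class ResetRequest:
    trigger: str                      # value read from KillSwitchEvent.trigger
    trigger_cleared: bool             # has the underlying condition cleared?
    requested_by: str
    approved_by: Optional[str] = None


def reset_blockers(req: ResetRequest) -> list[str]:
    """Return the reasons a reset must not proceed; an empty list means go."""
    blockers = []
    if req.trigger not in KNOWN_TRIGGERS:
        blockers.append(f"unrecognised trigger: {req.trigger!r}")
    if not req.trigger_cleared:
        blockers.append("trigger condition has not cleared")
    if not req.approved_by:
        blockers.append("no second approver")
    elif req.approved_by == req.requested_by:
        # Assumption: the approver must be a different person.
        blockers.append("approver must differ from requester")
    return blockers
```

Returning the list of blockers, rather than a bare boolean, gives the incident channel a record of why a reset was refused.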
Polymarket API degraded
Trigger: API Degradation Monitor emits GOV_API_DEGRADED at warning or hard.
- Switch all Strategy bots from enforced to advisory.
- Cancel resting orders on markets whose freshness we cannot read.
- Page the trading-ops on-call.
- Watch the rolling error rate; promote back to enforced only after 10 minutes of green metrics.
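The "10 minutes of green metrics" rule can be sketched as a small gate that restarts its clock on any red sample. PromotionGate and error_rate_threshold are hypothetical names. The playbook does not define what counts as green, so the threshold here is an assumption supplied by the caller.

```python
from dataclasses import dataclass
from typing import Optional

GREEN_WINDOW_SECONDS = 10 * 60  # "10 minutes of green metrics"


@dataclass
class PromotionGate:
    """Tracks how long the rolling error rate has been continuously green."""
    error_rate_threshold: float          # assumed: rates at or below this are green
    green_since: Optional[float] = None  # start of the current green streak

    def observe(self, timestamp: float, error_rate: float) -> bool:
        """Record one sample; return True once promotion to enforced is allowed."""
        if error_rate > self.error_rate_threshold:
            self.green_since = None      # any red sample restarts the clock
            return False
        if self.green_since is None:
            self.green_since = timestamp
        return timestamp - self.green_since >= GREEN_WINDOW_SECONDS
```

Feeding the gate one sample per metrics scrape is enough. It holds no history beyond the start of the current green streak.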
On-chain reconcile mismatch
Trigger: Reconciler emits GOV_RECONCILE_MISMATCH.
- Halt all Trade-authority bots immediately (auto-handled by KillSwitch).
- Capture the exact divergence: internal state, on-chain state, last-seen block, last-applied event.
- Page the security on-call.
- Do not re-enable trading until the divergence is explained and the divergent state is reconciled by hand.
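One way to make "capture the exact divergence" repeatable is to record the four items in a single structure. This is a minimal sketch, assuming both states can be represented as flat dictionaries keyed by strings. DivergenceRecord and its field names are hypothetical and do not reflect the Reconciler's actual output.

```python
import json
from dataclasses import dataclass, asdict

_MISSING = object()  # distinguishes "key absent" from "value is None"


@dataclass(frozen=True)
class DivergenceRecord:
    internal_state: dict
    onchain_state: dict
    last_seen_block: int
    last_applied_event: str

    def differing_keys(self) -> list[str]:
        """Keys whose values differ, or that exist on only one side."""
        keys = set(self.internal_state) | set(self.onchain_state)
        return sorted(
            k for k in keys
            if self.internal_state.get(k, _MISSING)
            != self.onchain_state.get(k, _MISSING)
        )

    def to_json(self) -> str:
        """Serialise the record for pasting into the incident channel."""
        return json.dumps(asdict(self), sort_keys=True, indent=2)
```

Keeping the raw states alongside the computed difference lets the security on-call re-derive the divergence instead of trusting a summary.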
Config drift detected
Trigger: Config Drift Detector emits GOV_CONFIG_DRIFT.
- Identify the drifted bot and field.
- If the drift is unauthorised: revert to the approved config and audit who changed it.
- If the drift is intentional but not yet approved: hold the operator override active for the documented time bound, and open an approval PR before it expires.
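Identifying the drifted field amounts to diffing the approved config against the running one. The sketch below assumes flat, dictionary-shaped configs. drifted_fields is a hypothetical helper, not part of the Config Drift Detector.

```python
from typing import Any

_MISSING = object()  # marks a field that is absent on one side


def drifted_fields(
    approved: dict[str, Any], running: dict[str, Any]
) -> dict[str, tuple[Any, Any]]:
    """Map each drifted field to (approved_value, running_value).

    A field absent on one side is reported with None in that position.
    """
    drift = {}
    for name in sorted(set(approved) | set(running)):
        a = approved.get(name, _MISSING)
        r = running.get(name, _MISSING)
        if a != r:
            drift[name] = (
                None if a is _MISSING else a,
                None if r is _MISSING else r,
            )
    return drift
```

For an unauthorised drift, the approved-side values in the result are what a revert would need to restore.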
Universal first three steps
- Confirm the alert is real (not a flapping metric).
- Capture state before doing anything: positions, open orders, last 100 ReportEnvelopes, on-chain block height.
- Open the incident channel and assign an incident commander before taking any remedial action.
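The state-capture step can be sketched as a single snapshot function. IncidentSnapshot and capture_snapshot are hypothetical names, and the shapes of positions, orders, and ReportEnvelopes are assumptions. Only the four captured items and the limit of 100 come from the step above.

```python
from collections import deque
from dataclasses import dataclass
from typing import Any, Iterable

ENVELOPE_LIMIT = 100  # "last 100 ReportEnvelopes"


@dataclass(frozen=True)
class IncidentSnapshot:
    positions: dict[str, float]
    open_orders: list[dict[str, Any]]
    report_envelopes: list[Any]
    block_height: int


def capture_snapshot(
    positions: dict[str, float],
    open_orders: Iterable[dict[str, Any]],
    envelopes: Iterable[Any],
    block_height: int,
) -> IncidentSnapshot:
    """Copy the current state so later remediation cannot mutate the record."""
    # deque(maxlen=N) retains only the N most recent items of the stream.
    recent = deque(envelopes, maxlen=ENVELOPE_LIMIT)
    return IncidentSnapshot(
        positions=dict(positions),
        open_orders=[dict(order) for order in open_orders],
        report_envelopes=list(recent),
        block_height=block_height,
    )
```

Copying the inputs means the snapshot still reflects pre-remediation state if the live objects change afterwards.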