Polytraders Dev Guide
internal

2.11 ExchangeStatusMonitor

Execution · Execution Utility · Reshape · PLANNED · Spec started · capital · Direct P5 · Execution rails pending stub

ExchangeStatusMonitor treats Polymarket itself as a degradable dependency. It polls CLOB V2 endpoint health, watches for reject-rate spikes, and parses public maintenance signals. When degradation is confirmed, it emits ObservationReports that trigger pause or de-risk actions across the exec layer.

v3 readiness

Docs: 27/27 (done)
Impl: 0/15 (pending)
Backtest: 0/4 (pending)
Runtime: 0/8 (pending)

A bot is done when all four scores are done.

1. Bot Identity

Layer: Execution
Bot class: Execution Utility
Authority: Reshape
Status: PLANNED
Readiness: Spec started
Runs before: All exec bots that submit orders
Runs after: Continuous background process; does not depend on order flow
Applies to: All active trading while Polymarket CLOB V2 is the execution venue
Default mode: shadow_only
User-visible: summary-only
Developer owner: Polytraders core (Execution pod)

Operational profile

Modes supported: quarantine

2. Purpose

ExchangeStatusMonitor treats Polymarket itself as a degradable dependency. It polls CLOB V2 endpoint health, watches for reject-rate spikes, and parses public maintenance signals. When degradation is confirmed, it emits ObservationReports that trigger pause or de-risk actions across the exec layer.

3. Why This Bot Matters

  • Exchange degradation not detected

    Orders continue to be submitted to a degraded CLOB, accumulating 429 errors, failed acks, and stale fills.

  • Maintenance window missed

    Orders submitted during a scheduled maintenance window are rejected without useful error context, causing unnecessary retries.

  • Resume too early after incident

    Resuming order submission before the CLOB has fully recovered causes a second wave of errors and potentially double-fills on retry.

No worked examples on this bot yet. Worked examples are optional but strongly recommended — they turn an abstract failure mode into something a developer can verify in a fixture.

4. Required Polymarket Inputs

| Input | Source | Required? | Use |
| --- | --- | --- | --- |
| CLOB V2 health/ping endpoint | clob_public | Yes | Detect CLOB REST endpoint availability and error rates. |
| Polymarket public status page | internal HTTP poller | No | Parse maintenance-window announcements and incident notices. |

5. Required Internal Inputs

| Input | Source | Required? | Use |
| --- | --- | --- | --- |
| Reject-rate metrics from OrderLifecycleManager | exec.orderlifecyclemanager | Yes | Detect reject-rate spikes (429 or 503 responses) as an exchange degradation signal. |
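
The reject-rate input can be pictured as a sliding 60-second window over order responses. The sketch below is illustrative only: the class name `RejectRateWindow` and its methods are hypothetical, and the real metric is published by OrderLifecycleManager, whose implementation this guide does not show.

```python
from collections import deque
from typing import Deque, Tuple


class RejectRateWindow:
    """Hypothetical helper: share of rejected responses over a sliding window."""

    def __init__(self, window_s: float = 60.0) -> None:
        self.window_s = window_s
        # Each entry is (timestamp_s, was_rejected).
        self._events: Deque[Tuple[float, bool]] = deque()

    def record(self, ts_s: float, status_code: int) -> None:
        # 429 and 503 are the reject signals named in the input table.
        self._events.append((ts_s, status_code in (429, 503)))

    def rate(self, now_s: float) -> float:
        # Drop events older than the window, then return rejected / total.
        while self._events and self._events[0][0] < now_s - self.window_s:
            self._events.popleft()
        if not self._events:
            return 0.0
        rejected = sum(1 for _, was_rejected in self._events if was_rejected)
        return rejected / len(self._events)
```

An empty window returns 0.0 rather than raising, so a quiet period with no order flow does not look like a degradation signal.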

6. Parameter Guide

| Parameter | Default | Warning | Hard | What it controls |
| --- | --- | --- | --- | --- |
| pause_on_status | ['degraded', 'maintenance'] | | | List of exchange status codes on which to emit PAUSE ObservationReport to suspend order submission. |
| flatten_on_status | ['outage'] | | | List of exchange status codes on which to emit FLATTEN ObservationReport, requesting all open orders to be cancelled. |
| poll_interval_s | 15 | 30 | 60 | How often to poll the CLOB health endpoint and status page. |
| resume_quarantine_min | 5 | 2 | 1 | Minutes to wait after exchange status returns to healthy before lifting the pause, to prevent false resumption. |

7. Detailed Parameter Instructions

pause_on_status

What it means

List of exchange status codes on which to emit PAUSE ObservationReport to suspend order submission.

Default

{ "pause_on_status": ["degraded", "maintenance"] }

Why this default matters

Pausing on 'degraded' and 'maintenance' covers the two most common exchange unavailability scenarios.

Threshold logic

| Condition | Action |
| --- | --- |
| status NOT IN pause_on_status | No pause; continue order submission |
| status IN pause_on_status | Emit EXCHANGE_STATUS_PAUSE ObservationReport |

Developer check

if status in params.pause_on_status: emit(EXCHANGE_STATUS_PAUSE)

User-facing English

Trading has been paused because the exchange is temporarily unavailable.

flatten_on_status

What it means

List of exchange status codes on which to emit FLATTEN ObservationReport, requesting all open orders to be cancelled.

Default

{ "flatten_on_status": ["outage"] }

Why this default matters

A full outage means orders will not be filled or cancelled by the exchange; the safest response is to cancel all open orders.

Threshold logic

| Condition | Action |
| --- | --- |
| status NOT IN flatten_on_status | No flatten |
| status IN flatten_on_status | Emit EXCHANGE_STATUS_FLATTEN ObservationReport |

Developer check

if status in params.flatten_on_status: emit(EXCHANGE_STATUS_FLATTEN)

User-facing English

Your open orders have been cancelled because the exchange is experiencing an outage.

poll_interval_s

What it means

How often to poll the CLOB health endpoint and status page.

Default

{ "poll_interval_s": 15 }

Why this default matters

At 15s, three consecutive failed polls confirm degradation within about 45s; polling faster adds request load to an exchange that may already be stressed.

Threshold logic

| Condition | Action |
| --- | --- |
| interval <= 30s | Normal polling (default 15s) |
| 30s < interval <= 60s (warning) | WARN: degradation detection latency increased |
| interval > 60s (hard) | Reject config |

Developer check

assert params.poll_interval_s <= locked.poll_interval_s.max  # hard limit: 60

User-facing English

Exchange availability is checked regularly.

resume_quarantine_min

What it means

Minutes to wait after exchange status returns to healthy before lifting the pause, to prevent false resumption.

Default

{ "resume_quarantine_min": 5 }

Why this default matters

A 5-minute quarantine absorbs intermittent recovery signals; CLOB incidents often have brief healthy periods before full recovery.

Threshold logic

| Condition | Action |
| --- | --- |
| quarantine_min >= 2 | Normal; wait for full recovery (default 5) |
| 1 <= quarantine_min < 2 (warning) | WARN: may resume too early |
| quarantine_min < 1 (hard) | Reject config; minimum 1 min quarantine required |

Developer check

if params.resume_quarantine_min < locked.resume_quarantine_min.min: raise ConfigError  # hard limit: 1

User-facing English

Trading will resume shortly after the exchange confirms it is fully operational.

8. Default Configuration

{
  "bot_id": "exec.exchangestatusmonitor",
  "version": "0.1.0",
  "mode": "shadow_only",
  "defaults": {
    "pause_on_status": [
      "degraded",
      "maintenance"
    ],
    "flatten_on_status": [
      "outage"
    ],
    "poll_interval_s": 15,
    "resume_quarantine_min": 5
  },
  "locked": {
    "poll_interval_s": {
      "max": 60
    },
    "resume_quarantine_min": {
      "min": 1
    }
  }
}
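
A config loader could enforce the warning and hard bounds from the Parameter Guide along these lines. This is a minimal sketch: `validate_params` and `ConfigError` are hypothetical names, and the bounds are copied from the tables above rather than read from a real config service.

```python
HARD_MAX_POLL_S = 60          # locked.poll_interval_s.max
WARN_POLL_S = 30              # Warning column for poll_interval_s
HARD_MIN_QUARANTINE_MIN = 1   # locked.resume_quarantine_min.min
WARN_QUARANTINE_MIN = 2       # Warning column for resume_quarantine_min


class ConfigError(ValueError):
    """Raised when a parameter violates a hard (locked) bound."""


def validate_params(params: dict) -> list:
    """Return a list of warning strings; raise ConfigError on a hard violation."""
    warnings = []
    poll = params["poll_interval_s"]
    quarantine = params["resume_quarantine_min"]

    if poll > HARD_MAX_POLL_S:
        raise ConfigError(f"poll_interval_s={poll} exceeds hard max {HARD_MAX_POLL_S}")
    if poll > WARN_POLL_S:
        warnings.append("poll_interval_s: degradation detection latency increased")

    if quarantine < HARD_MIN_QUARANTINE_MIN:
        raise ConfigError(
            f"resume_quarantine_min={quarantine} below hard min {HARD_MIN_QUARANTINE_MIN}"
        )
    if quarantine < WARN_QUARANTINE_MIN:
        warnings.append("resume_quarantine_min: may resume too early")

    return warnings
```

With the defaults (15 and 5) the function returns an empty list; a 45-second interval with a 1-minute quarantine returns two warnings but is still accepted.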

9. Implementation Flow

  1. Every poll_interval_s: GET clob_public /health; record status_code and latency.
  2. If status_code != 200 or latency > 2000ms: increment consecutive_error_count; otherwise reset it to 0.
  3. After 3 consecutive errors: set exchange_status=degraded.
  4. Optionally: fetch Polymarket public status page; parse for maintenance or outage keywords.
  5. Read reject_rate from OrderLifecycleManager metrics; if reject_rate > 10% over 60s: treat as degraded signal.
  6. Determine composite status: healthy, degraded, maintenance, or outage.
  7. If status in pause_on_status: emit ObservationReport(EXCHANGE_STATUS_PAUSE).
  8. If status in flatten_on_status: emit ObservationReport(EXCHANGE_STATUS_FLATTEN).
  9. When status returns to healthy after a pause or flatten: start resume_quarantine_min timer; emit EXCHANGE_STATUS_RESUMING.
  10. After quarantine completes: emit EXCHANGE_STATUS_HEALTHY; clear pause state.
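
Steps 2 to 6 can be sketched as two pure functions, which keeps the signal-folding logic easy to test in a fixture. The function names are hypothetical; the thresholds (3 errors, 2000 ms, 10%) come from the steps above, and the status page is modelled as plain text that may be absent.

```python
from typing import Optional

ERROR_THRESHOLD = 3        # step 3: consecutive errors before "degraded"
LATENCY_LIMIT_MS = 2000    # step 2: slow responses count as errors
REJECT_RATE_LIMIT = 0.10   # step 5: reject-rate fraction over 60s


def next_error_count(prev: int, status_code: Optional[int], latency_ms: float) -> int:
    """Steps 2-3: count consecutive failed polls; a clean poll resets the count."""
    # status_code is None when the request timed out or the host was unreachable.
    failed = status_code != 200 or latency_ms > LATENCY_LIMIT_MS
    return prev + 1 if failed else 0


def composite_status(error_count: int, reject_rate: float, page_text: Optional[str]) -> str:
    """Steps 3-6: fold health, reject-rate, and status-page signals into one status."""
    page = (page_text or "").lower()
    degraded = error_count >= ERROR_THRESHOLD or reject_rate > REJECT_RATE_LIMIT
    if degraded:
        # The status page can only upgrade a degraded signal to an outage.
        return "outage" if "outage" in page else "degraded"
    if "maintenance" in page:
        return "maintenance"
    return "healthy"
```

Keeping the status page optional matches step 4: when the page is unavailable, `page_text` is `None` and the result depends on the health endpoint and reject rate alone.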

10. Reference Implementation

Pseudocode is language-agnostic. FETCH = read input. EMIT = produce output. IF/THEN/ELSE = decision. Translate directly to TypeScript, Python, Go, or Rust.

STATE: consecutiveErrors = 0, exchangeStatus = 'healthy',
       paused = FALSE, quarantineStartMs = NULL

FUNCTION pollExchange():
  // Health probe: a timeout, a non-200 response, or a slow response counts as an error
  t0 = now_ms()
  resp = clob_public.GET('/health', timeout=2000)
  latency = now_ms() - t0

  IF resp IS NULL OR resp.status_code != 200 OR latency > 2000:
    consecutiveErrors += 1
  ELSE:
    consecutiveErrors = 0

  // Reject-rate check: a spike counts as reaching the error threshold
  rejectRate = FETCH metrics.reject_rate_60s()
  IF rejectRate > 0.10:
    consecutiveErrors = max(consecutiveErrors, 3)

  // Optional status page; NULL if the fetch fails, leaving only the health signal
  statusPage = FETCH statusPage.fetch('https://status.polymarket.com')

  // Status determination
  IF consecutiveErrors >= 3:
    IF statusPage IS NOT NULL AND statusPage.contains('outage'):
      exchangeStatus = 'outage'
    ELSE:
      exchangeStatus = 'degraded'
  ELIF statusPage IS NOT NULL AND statusPage.contains('maintenance'):
    exchangeStatus = 'maintenance'
  ELSE:
    exchangeStatus = 'healthy'

  // Emit ObservationReport
  // paused tracks whether a pause is in force, so a healthy exchange that was
  // never paused does not enter quarantine; any relapse resets the timer
  IF exchangeStatus IN params.flatten_on_status:
    paused = TRUE; quarantineStartMs = NULL
    EMIT ObservationReport(EXCHANGE_STATUS_FLATTEN)
  ELIF exchangeStatus IN params.pause_on_status:
    paused = TRUE; quarantineStartMs = NULL
    EMIT ObservationReport(EXCHANGE_STATUS_PAUSE)
  ELIF exchangeStatus == 'healthy' AND paused AND quarantineStartMs IS NULL:
    quarantineStartMs = now_ms()
    EMIT ObservationReport(EXCHANGE_STATUS_RESUMING)
  ELIF exchangeStatus == 'healthy' AND paused:
    IF now_ms() - quarantineStartMs >= params.resume_quarantine_min * 60000:
      paused = FALSE; quarantineStartMs = NULL
      EMIT ObservationReport(EXCHANGE_STATUS_HEALTHY)

SCHEDULE pollExchange EVERY params.poll_interval_s SECONDS

SDK calls used

  • clob_public.GET('/health')
  • statusPage.fetch('https://status.polymarket.com')

Complexity: O(1) per poll cycle
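
The emit-and-quarantine branch is the part most prone to subtle state bugs, so it may help to isolate it behind an injectable clock. The sketch below is one possible Python translation, not the shipped implementation: `ResumeGate` is a hypothetical name, and the explicit `paused` flag is an assumption introduced so that a healthy exchange that was never paused does not start a quarantine.

```python
from typing import Iterable, Optional


class ResumeGate:
    """Hypothetical pause / quarantine / resume state machine."""

    def __init__(self, pause_on: Iterable[str], flatten_on: Iterable[str],
                 quarantine_min: float) -> None:
        self.pause_on = set(pause_on)
        self.flatten_on = set(flatten_on)
        self.quarantine_ms = quarantine_min * 60_000
        self.paused = False
        self.quarantine_start_ms: Optional[int] = None

    def step(self, status: str, now_ms: int) -> Optional[str]:
        """Return the verdict to emit for this poll, or None if nothing to emit."""
        if status in self.flatten_on or status in self.pause_on:
            self.paused = True
            self.quarantine_start_ms = None  # any relapse restarts the quarantine
            if status in self.flatten_on:
                return "EXCHANGE_STATUS_FLATTEN"
            return "EXCHANGE_STATUS_PAUSE"
        if status != "healthy" or not self.paused:
            return None  # nothing to resume from
        if self.quarantine_start_ms is None:
            self.quarantine_start_ms = now_ms
            return "EXCHANGE_STATUS_RESUMING"
        if now_ms - self.quarantine_start_ms >= self.quarantine_ms:
            self.paused = False
            self.quarantine_start_ms = None
            return "EXCHANGE_STATUS_HEALTHY"
        return None  # still inside the quarantine window
```

Passing `now_ms` in as an argument, rather than reading the clock inside the method, lets a test walk through a 5-minute quarantine without sleeping.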

11. Wire Examples

Input — what arrives on the wire

Poll trigger + health response (source: internal scheduler + clob_public)

{
  "poll_ts_ms": 1746770400000,
  "health_status_code": 503,
  "latency_ms": 2100,
  "consecutive_errors": 4
}

Output — what the bot emits

ObservationReport — EXCHANGE_STATUS_PAUSE

{
  "report_id": "rep_6f7a8b9c0d1e2f3a",
  "bot_id": "exec.exchangestatusmonitor",
  "exchange_status": "degraded",
  "verdict": "EXCHANGE_STATUS_PAUSE",
  "consecutive_errors": 4,
  "measured_at_ms": 1746770400000
}

12. Decision Logic

APPROVE

Exchange healthy; no action — continue normal order submission.

RESHAPE_REQUIRED

Not applicable — ExchangeStatusMonitor is observation-only.

REJECT

Status in flatten_on_status: emit FLATTEN ObservationReport; all open orders should be cancelled.

WARNING_ONLY

Consecutive error count rising but threshold not yet reached; WARN emitted.

13. Standard Decision Output

This bot returns an ObservationReport object. See ObservationReport schema.

{
  "report_id": "rep_6f7a8b9c0d1e2f3a",
  "trace_id": "trc_5e6f7a8b9c0d1e2f",
  "bot_id": "exec.exchangestatusmonitor",
  "exchange_status": "degraded",
  "verdict": "EXCHANGE_STATUS_PAUSE",
  "consecutive_errors": 4,
  "reject_rate_pct": 15.2,
  "measured_at_ms": 1746770400000
}
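
For developers wiring a consumer, the payload above could be modelled as a frozen dataclass. This is only a sketch of the fields shown in the example; the canonical definition lives in the ObservationReport schema, which may carry additional fields.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ObservationReport:
    """Fields mirror the example payload; not the authoritative schema."""
    report_id: str
    trace_id: str
    bot_id: str
    exchange_status: str
    verdict: str
    consecutive_errors: int
    reject_rate_pct: float   # percent, e.g. 15.2 means 15.2%
    measured_at_ms: int

    def to_json(self) -> str:
        # sort_keys gives a stable byte representation for logs and fixtures.
        return json.dumps(asdict(self), sort_keys=True)


report = ObservationReport(
    report_id="rep_6f7a8b9c0d1e2f3a",
    trace_id="trc_5e6f7a8b9c0d1e2f",
    bot_id="exec.exchangestatusmonitor",
    exchange_status="degraded",
    verdict="EXCHANGE_STATUS_PAUSE",
    consecutive_errors=4,
    reject_rate_pct=15.2,
    measured_at_ms=1746770400000,
)
```

Note that `reject_rate_pct` is expressed in percent, while the pseudocode threshold of 0.10 is a fraction; a real implementation would need to convert between the two consistently.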

14. Reason Codes

| Code | Severity | Meaning | Action | User-facing message |
| --- | --- | --- | --- | --- |
| EXCHANGE_STATUS_HEALTHY | INFO | Exchange is healthy; quarantine completed; order submission permitted. | Clear pause state; resume normal operations. | |
| EXCHANGE_STATUS_PAUSE | WARN | Exchange is degraded or in maintenance; order submission paused. | Emit ObservationReport; exec bots suspend new order submissions. | Trading is paused because the exchange is temporarily unavailable. |
| EXCHANGE_STATUS_FLATTEN | HARD_REJECT | Exchange outage confirmed; all open orders should be cancelled. | Emit ObservationReport; trigger mass cancel. | Your orders were cancelled due to an exchange outage. |
| EXCHANGE_STATUS_RESUMING | INFO | Exchange has recovered; quarantine period started. | Start resume_quarantine_min timer; do not resume submissions yet. | The exchange has recovered. Trading will resume shortly. |

15. Metrics & Logs

Metrics emitted

| Metric | Type | Unit | Labels | Meaning |
| --- | --- | --- | --- | --- |
| polytraders_exec_exchangestatusmonitor_status | gauge | enum | status | Current exchange status (healthy=1, degraded=2, maintenance=3, outage=4). |
| polytraders_exec_exchangestatusmonitor_consecutive_errors | gauge | count | | Current consecutive health check error count. |
| polytraders_exec_exchangestatusmonitor_pause_events_total | counter | count | verdict | Total pause/flatten events emitted by verdict. |

Alerts

| Alert | Condition | Severity | Runbook |
| --- | --- | --- | --- |
| ESMExchangePaused | polytraders_exec_exchangestatusmonitor_status > 1 | P1 | #runbook-esm-exchange-paused |
| ESMHighConsecutiveErrors | polytraders_exec_exchangestatusmonitor_consecutive_errors >= 3 | P2 | #runbook-esm-consecutive-errors |
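
The gauge encoding and the two alert conditions can be checked together in a fixture. The sketch below uses plain Python rather than a metrics client; `firing_alerts` is a hypothetical helper, and the numeric mapping is the one given in the metrics table.

```python
# Gauge values from the metrics table.
STATUS_GAUGE = {"healthy": 1, "degraded": 2, "maintenance": 3, "outage": 4}


def firing_alerts(status: str, consecutive_errors: int) -> list:
    """Return the names of alerts whose conditions hold for the given state."""
    alerts = []
    if STATUS_GAUGE[status] > 1:
        alerts.append("ESMExchangePaused")
    if consecutive_errors >= 3:
        alerts.append("ESMHighConsecutiveErrors")
    return alerts
```

Because `maintenance` maps to 3, a scheduled maintenance window also fires the P1 `ESMExchangePaused` alert under the condition as written.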

16. Developer Reporting

{
  "exchange_status": "degraded",
  "consecutive_errors": 4,
  "reject_rate_pct": 15.2,
  "status_page_parsed": true,
  "status_page_result": "No active incident",
  "verdict": "EXCHANGE_STATUS_PAUSE",
  "quarantine_active": false
}

17. Plain-English Reporting

| Situation | User-facing explanation |
| --- | --- |
| Exchange paused | Trading has been temporarily paused because the exchange is experiencing technical issues. |
| Orders cancelled (outage) | Your open orders were cancelled because the exchange had an outage. You can re-enter when trading resumes. |
| Trading resuming after quarantine | The exchange has recovered. Trading will resume shortly after a brief verification period. |

18. Failure-Mode Block

main_failure_mode: Status-page parsing fails silently, causing ExchangeStatusMonitor to miss a scheduled maintenance window and continue submitting orders that will be rejected.
false_positive_risk: Brief network hiccup to the health endpoint counted as exchange degradation, pausing trading unnecessarily.
false_negative_risk: reject_rate threshold too high; exchange is degraded but local reject rate hasn't yet exceeded threshold, delaying pause.
safe_fallback: If health endpoint is unreachable for 3 or more consecutive polls, treat as degraded; emit EXCHANGE_STATUS_PAUSE conservatively.
required_dependencies: clob_public /health endpoint, reject-rate metrics from OrderLifecycleManager, internal scheduler for poll triggers

19. Failure-Injection Recipes

| Scenario | How to inject | Expected behaviour | Recovery |
| --- | --- | --- | --- |
| CLOB_HEALTH_ENDPOINT_DOWN | Block TCP to clob_public /health for 3 poll cycles | | Health endpoint restored; consecutiveErrors=0; quarantine starts; EXCHANGE_STATUS_HEALTHY after resume_quarantine_min |
| REJECT_RATE_SPIKE | Inject 15% reject rate into OrderLifecycleManager metrics for 60s | | Reject rate normalises; quarantine starts |
| STATUS_PAGE_MAINTENANCE_WINDOW | Inject 'scheduled maintenance' keyword into status page mock | | Maintenance keyword removed from status page; status returns healthy after quarantine |

20. State & Persistence

Cold-start recovery

On restart, re-poll health immediately; treat cold start as 0 consecutive errors.

21. Concurrency & Idempotency

| Aspect | Specification |
| --- | --- |
| Execution model | single-instance scheduled poller |
| Max in-flight | 1 |
| Idempotency key | poll_trigger_ts_ms |
| Per-call timeout (ms) | 2000 |
| Backpressure strategy | Drop poll if previous poll still in flight |
| Locking / mutual exclusion | single-writer: only ExchangeStatusMonitor writes to exchangeStatus store |
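
The "max in-flight 1, drop if busy" rule could be enforced with a non-blocking lock acquire. This is a minimal sketch under the assumption that polls are triggered from scheduler threads; `SingleFlightPoller` is a hypothetical name.

```python
import threading
from typing import Callable


class SingleFlightPoller:
    """Runs at most one poll at a time; overlapping triggers are dropped."""

    def __init__(self, poll_fn: Callable[[], None]) -> None:
        self._poll_fn = poll_fn
        self._lock = threading.Lock()
        self.dropped = 0  # count of triggers skipped because a poll was in flight

    def trigger(self) -> bool:
        """Run one poll unless the previous one is still running.

        Returns True if the poll ran, False if it was dropped.
        """
        if not self._lock.acquire(blocking=False):
            self.dropped += 1
            return False
        try:
            self._poll_fn()
            return True
        finally:
            self._lock.release()
```

Dropping rather than queueing keeps a slow exchange from building a backlog of stale polls; the `dropped` counter is an illustrative addition that could feed a metric for missed polling intervals.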

22. Dependencies

Depends on (must run first)

| Bot | Why | Contract |
| --- | --- | --- |
| exec.orderlifecyclemanager | Provides reject-rate metrics as a secondary degradation signal. | reject_rate_60s metric published by OrderLifecycleManager. |

Emits to (downstream consumers)

| Bot | Why | Contract |
| --- | --- | --- |
| exec.orderlifecyclemanager | EXCHANGE_STATUS_PAUSE/FLATTEN ObservationReports consumed to suspend order submission. | All exec bots subscribe to exchange status ObservationReports. |

External services

| Service | Endpoint | SLA assumed | On failure |
| --- | --- | --- | --- |
| CLOB V2 public API | https://clob.polymarket.com/health | best-effort (health endpoint) | Unreachable counts as consecutive error; 3 consecutive = degraded. |
| Polymarket status page | https://status.polymarket.com | best-effort | If unreachable, status_page_parsed=false; rely on health endpoint only. |

23. Security Surfaces

Abuse vectors considered

  • Injecting fake health endpoint responses to trigger spurious exchange-pause events
  • Flooding status-page parser with malformed HTML to suppress maintenance detection

Mitigations

  • Health endpoint responses validated against expected schema; unexpected payloads treated as errors
  • Status-page parsing uses keyword matching with a known-safe whitelist; malformed pages treated as 'no incident'
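
Keyword matching against a fixed list might look like the sketch below. It is an assumption-laden illustration: `parse_status_page` is a hypothetical function, the keyword list holds only the two terms this guide mentions, and the return keys reuse `status_page_parsed` and `status_page_result` from the Developer Reporting payload.

```python
import re
from typing import Optional

# Fixed keyword list, checked in priority order (outage outranks maintenance).
KEYWORDS = ("outage", "maintenance")


def parse_status_page(html: Optional[str]) -> dict:
    """Classify a status page by keyword; never raises on malformed input."""
    if not isinstance(html, str) or not html.strip():
        # Unreachable or empty page: report that nothing was parsed.
        return {"status_page_parsed": False, "status_page_result": None}
    # Strip anything tag-shaped, then match whole words case-insensitively.
    text = re.sub(r"<[^>]*>", " ", html).lower()
    for keyword in KEYWORDS:
        if re.search(rf"\b{keyword}\b", text):
            return {"status_page_parsed": True, "status_page_result": keyword}
    return {"status_page_parsed": True, "status_page_result": "No active incident"}
```

Plain keyword matching has a known weakness: a sentence such as "the outage is resolved" still matches `outage`. Because the status page can only escalate an already-degraded signal to an outage in the reference pseudocode, that false match is limited in effect, but a real parser would likely need stricter rules.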

24. Polymarket V2 Compatibility

| Aspect | Value |
| --- | --- |
| CLOB version | v2 |
| Collateral asset | pUSD |
| EIP-712 Exchange domain version | 2 |
| Aware of builderCode field | no |
| Aware of negative-risk markets | no |
| Multi-chain ready | no |
| SDK used | py-clob-client-v2 |
| Settlement contract | CTFExchangeV2 |
| Notes | ExchangeStatusMonitor polls CLOB V2 public endpoints only; it does not submit orders. It provides exchange health signals to other exec bots to coordinate pause/resume behaviour. |

API surfaces declared

clob_public, internal

Networks supported

polygon

25. Versioning & Migration

| Field | Value |
| --- | --- |
| spec | 2.0.0 |
| implementation | 0.1.0 |
| schema | 2 |
| released | None |
| planned_release | Q4-2026 |

Migration history

| Date | From | To | Reason | Action taken |
| --- | --- | --- | --- | --- |
| 2026-04-28 | n/a | v2-spec | Spec drafted post-CLOB-V2 cutover; bot not yet implemented | Designed against V2 schema (pUSD, builder codes, V2 EIP-712 domain) |

26. Acceptance Tests

Unit Tests

| Test | Setup | Expected result |
| --- | --- | --- |
| Pause emitted after 3 consecutive health check failures | Inject 3 consecutive 503 responses from clob_public | exchange_status=degraded; EXCHANGE_STATUS_PAUSE emitted |
| Flatten emitted on outage status | Set exchange_status=outage | EXCHANGE_STATUS_FLATTEN emitted |
| No pause on single health check failure | Inject 1 503 response; next 2 succeed | No pause; WARN only after 1st failure |

Integration Tests

| Test | Expected result |
| --- | --- |
| Quarantine: status recovers → quarantine timer starts → EXCHANGE_STATUS_HEALTHY after resume_quarantine_min | EXCHANGE_STATUS_RESUMING emitted; EXCHANGE_STATUS_HEALTHY emitted after 5 min |
| reject_rate spike triggers degraded status | reject_rate > 10% for 60s → exchange_status=degraded; EXCHANGE_STATUS_PAUSE emitted |

Property Tests

| Property | Required behaviour |
| --- | --- |
| EXCHANGE_STATUS_FLATTEN only emitted when status in flatten_on_status | Always true |
| After quarantine completes, at least resume_quarantine_min minutes have elapsed since last error | Always true |

27. Operational Runbook

ExchangeStatusMonitor incidents require checking the Polymarket status page and CLOB health endpoint. Never manually lift a pause without confirming exchange health.

On-call actions

| Alert | First step | Diagnosis | Mitigation | Escalate to |
| --- | --- | --- | --- | --- |
| ESMExchangePaused | Check https://status.polymarket.com and CLOB /health directly. If false positive (exchange is healthy), unflag manually after confirming. | | | Exec pod lead + Infra |
| ESMHighConsecutiveErrors | Check network connectivity to clob.polymarket.com; check CLOB latency dashboard. | | | Infra on-call if connectivity issue |

Manual overrides

  • polytraders bot force-resume exec.exchangestatusmonitor — Exchange is confirmed healthy but quarantine has not yet expired; use only with Exec pod lead approval.

Healthcheck

GET /internal/health/exchangestatusmonitor -> 200 if exchange_status=healthy, consecutive_errors=0, quarantine_active=false, polling running. Red: exchange_status in (degraded, outage), consecutive_errors >= 3, polling interval missed.
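
The green/red rule above reduces to a single boolean expression. In the sketch below the function name is hypothetical, and returning 503 for the red state is an assumption, since this guide only specifies the 200 case.

```python
def healthcheck_status(exchange_status: str, consecutive_errors: int,
                       quarantine_active: bool, polling_running: bool) -> int:
    """Return the HTTP status for the internal healthcheck endpoint."""
    green = (
        exchange_status == "healthy"
        and consecutive_errors == 0
        and not quarantine_active
        and polling_running
    )
    # 503 for "red" is an assumed choice; the guide does not name the code.
    return 200 if green else 503
```

Under this rule the endpoint stays red throughout the quarantine window, even though the exchange itself already reports healthy.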

28. Promotion Gates

A bot does not advance to the next readiness state until every gate below is green. Gates are observable from production data — no subjective sign-off.

Promote to Shadow

| Gate | How measured | Threshold |
| --- | --- | --- |
| Pause-on-3-consecutive-errors unit test passes | CI test run | 100% pass |

Promote to Limited live

| Gate | How measured | Threshold |
| --- | --- | --- |
| Zero false-positive pause events over 48h shadow run | Cross-reference pause events with Polymarket incident log | Zero false positives |

Promote to General live

| Gate | How measured | Threshold |
| --- | --- | --- |
| Exchange degradation detected within 3 poll cycles during a real incident | Post-incident review: compare EXCHANGE_STATUS_PAUSE timestamp to CLOB incident start time | Detection within 3 × poll_interval_s |
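
As a worked check of the General live gate: with the default poll_interval_s of 15, the detection budget is 3 × 15 s = 45 s. The helper below is a hypothetical sketch of how a post-incident review script might apply that threshold to two timestamps.

```python
def detected_in_time(incident_start_ms: int, pause_emitted_ms: int,
                     poll_interval_s: int, cycles: int = 3) -> bool:
    """True if the pause was emitted within cycles * poll_interval_s of incident start."""
    budget_ms = cycles * poll_interval_s * 1000
    delay_ms = pause_emitted_ms - incident_start_ms
    # A negative delay (pause before the recorded incident start) is treated as
    # out of range, since it suggests the timestamps need manual review.
    return 0 <= delay_ms <= budget_ms
```

For example, an incident starting at 1746770400000 ms with a pause emitted 40 s later passes the gate at the default interval, while a pause emitted 50 s later does not.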

29. Developer Checklist

Ready-to-ship score: 27/27 sections complete · 100%

| Requirement | Status |
| --- | --- |
| Purpose defined | ✓ done |
| Required inputs listed | ✓ done |
| Parameters defined | ✓ done |
| Defaults defined | ✓ done |
| Warning thresholds defined | ✓ done |
| Hard thresholds defined | ✓ done |
| Safe fallback defined | ✓ done |
| Structured output defined | ✓ done |
| Developer log defined | ✓ done |
| Plain-English explanation | ✓ done |
| Unit tests defined | ✓ done |
| Integration tests defined | ✓ done |
| Property tests defined | ✓ done |
| Failure-mode block complete | ✓ done |
| Reference implementation pseudocode | ✓ done |
| Wire examples (input + output) | ✓ done |
| Reason codes listed | ✓ done |
| Metrics & logs defined | ✓ done |
| State & persistence defined | ✓ done |
| Concurrency & idempotency defined | ✓ done |
| Dependencies declared | ✓ done |
| Security surfaces declared | ✓ done |
| Polymarket V2 compatibility declared | ✓ done |
| Version & migration history declared | ✓ done |
| Operational runbook defined | ✓ done |
| Promotion gates defined | ✓ done |
| Failure-injection recipes defined | ✓ done |