1. Bot Identity
| Field | Value |
|---|
| Layer | Execution |
| Bot class | Execution Utility |
| Authority | Reshape |
| Status | PLANNED |
| Readiness | Spec started |
| Runs before | All exec bots that submit orders |
| Runs after | Continuous background process; does not depend on order flow |
| Applies to | All active trading while Polymarket CLOB V2 is the execution venue |
| Default mode | shadow_only |
| User-visible | summary-only |
| Developer owner | Polytraders core — Execution pod |
Operational profile
| Modes supported | quarantine |
|---|
2. Purpose
ExchangeStatusMonitor treats Polymarket itself as a degradable dependency. It polls CLOB V2 endpoint health, watches for reject-rate spikes, and parses public maintenance signals. When degradation is confirmed, it emits ObservationReports that trigger pause or de-risk actions across the exec layer.
3. Why This Bot Matters
Exchange degradation not detected
Orders continue to be submitted to a degraded CLOB, accumulating 429 errors, failed acks, and stale fills.
Maintenance window missed
Orders submitted during a scheduled maintenance window are rejected without useful error context, causing unnecessary retries.
Resume too early after incident
Resuming order submission before the CLOB has fully recovered causes a second wave of errors and potentially double-fills on retry.
No worked examples on this bot yet. Worked examples are optional but strongly recommended — they turn an abstract failure mode into something a developer can verify in a fixture.
6. Parameter Guide
| Parameter | Default | Warning | Hard | What it controls |
|---|
| pause_on_status | ['degraded', 'maintenance'] | — | — | List of exchange status codes on which to emit PAUSE ObservationReport to suspend order submission. |
| flatten_on_status | ['outage'] | — | — | List of exchange status codes on which to emit FLATTEN ObservationReport, requesting all open orders to be cancelled. |
| poll_interval_s | 15 | 30 | 60 | How often to poll the CLOB health endpoint and status page. |
| resume_quarantine_min | 5 | 2 | 1 | Minutes to wait after exchange status returns to healthy before lifting the pause, to prevent false resumption. |
7. Detailed Parameter Instructions
pause_on_status
What it means
List of exchange status codes on which to emit PAUSE ObservationReport to suspend order submission.
Default
{ "pause_on_status": ["degraded", "maintenance"] }
Why this default matters
Pausing on 'degraded' and 'maintenance' covers the two most common exchange unavailability scenarios.
Threshold logic
| Condition | Action |
|---|
| status NOT IN pause_on_status | No pause — continue order submission |
| status IN pause_on_status | Emit EXCHANGE_STATUS_PAUSE ObservationReport |
Developer check
if status in params.pause_on_status: emit(EXCHANGE_STATUS_PAUSE)
User-facing English
Trading has been paused because the exchange is temporarily unavailable.
flatten_on_status
What it means
List of exchange status codes on which to emit FLATTEN ObservationReport, requesting all open orders to be cancelled.
Default
{ "flatten_on_status": ["outage"] }
Why this default matters
A full outage means orders will not be filled or cancelled by the exchange; the safest response is to cancel all open orders.
Threshold logic
| Condition | Action |
|---|
| status NOT IN flatten_on_status | No flatten |
| status IN flatten_on_status | Emit EXCHANGE_STATUS_FLATTEN ObservationReport |
Developer check
if status in params.flatten_on_status: emit(EXCHANGE_STATUS_FLATTEN)
User-facing English
Your open orders have been cancelled because the exchange is experiencing an outage.
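The two threshold tables for pause_on_status and flatten_on_status can be read as one routing step. The sketch below is a minimal illustration, assuming flatten takes precedence when a status appears in both lists (the order used in the reference pseudocode). `route_status` is a hypothetical helper name, not part of any SDK.

```python
from typing import Optional, Sequence

DEFAULT_PAUSE = ("degraded", "maintenance")
DEFAULT_FLATTEN = ("outage",)


def route_status(
    status: str,
    pause_on_status: Sequence[str] = DEFAULT_PAUSE,
    flatten_on_status: Sequence[str] = DEFAULT_FLATTEN,
) -> Optional[str]:
    """Return the reason code to emit for a status, or None for no action."""
    # Flatten is checked first so an outage is never downgraded to a pause.
    if status in flatten_on_status:
        return "EXCHANGE_STATUS_FLATTEN"
    if status in pause_on_status:
        return "EXCHANGE_STATUS_PAUSE"
    return None
```

With the defaults, `route_status("maintenance")` yields the pause code and `route_status("healthy")` yields no action.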
poll_interval_s
What it means
How often to poll the CLOB health endpoint and status page.
Default
{ "poll_interval_s": 15 }
Why this default matters
15s provides timely detection of degradation; polling faster increases request overhead on an already-stressed exchange.
Threshold logic
| Condition | Action |
|---|
| interval <= 30s | Normal polling |
| 30s < interval <= 60s (warning) | WARN — degradation detection latency increased |
| interval > 60s (hard) | Reject config |
Developer check
assert params.poll_interval_s <= locked.poll_interval_s.max  # hard limit: 60
User-facing English
Exchange availability is checked regularly.
resume_quarantine_min
What it means
Minutes to wait after exchange status returns to healthy before lifting the pause, to prevent false resumption.
Default
{ "resume_quarantine_min": 5 }
Why this default matters
A 5-minute quarantine absorbs intermittent recovery signals; CLOB incidents often have brief healthy periods before full recovery.
Threshold logic
| Condition | Action |
|---|
| quarantine_min >= 2 | Normal; wait for full recovery |
| 1 <= quarantine_min < 2 (warning) | WARN — may resume too early |
| quarantine_min < 1 (hard) | Reject config — minimum 1 min quarantine required |
Developer check
if params.resume_quarantine_min < locked.resume_quarantine_min.min: raise ConfigError  # hard limit: 1
User-facing English
Trading will resume shortly after the exchange confirms it is fully operational.
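The warning and hard thresholds for the two numeric parameters can be checked at config load time. The following is a sketch only: the function names and the `ConfigError` class are hypothetical, and rejecting a non-positive poll interval is an added assumption that the spec does not state.

```python
class ConfigError(ValueError):
    """Raised when a parameter violates a hard limit."""


def check_poll_interval_s(value: float, warn: float = 30, hard: float = 60) -> str:
    """Classify poll_interval_s; larger values delay degradation detection."""
    # Assumption: a zero or negative interval is also invalid.
    if value <= 0 or value > hard:
        raise ConfigError(f"poll_interval_s={value} is outside (0, {hard}]")
    return "WARN" if value > warn else "OK"


def check_resume_quarantine_min(value: float, warn: float = 2, hard: float = 1) -> str:
    """Classify resume_quarantine_min; smaller values risk resuming too early."""
    if value < hard:
        raise ConfigError(f"resume_quarantine_min={value} is below {hard}")
    return "WARN" if value < warn else "OK"
```

Note the directions differ: poll_interval_s has an upper hard limit, while resume_quarantine_min has a lower one.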
8. Default Configuration
{
  "bot_id": "exec.exchangestatusmonitor",
  "version": "0.1.0",
  "mode": "shadow_only",
  "defaults": {
    "pause_on_status": [
      "degraded",
      "maintenance"
    ],
    "flatten_on_status": [
      "outage"
    ],
    "poll_interval_s": 15,
    "resume_quarantine_min": 5
  },
  "locked": {
    "poll_interval_s": {
      "max": 60
    },
    "resume_quarantine_min": {
      "min": 1
    }
  }
}
9. Implementation Flow
- Every poll_interval_s: GET clob_public /health; record status_code and latency.
- If status_code != 200 or latency > 2000ms: increment consecutive_error_count.
- After 3 consecutive errors: set exchange_status=degraded.
- Optionally: fetch Polymarket public status page; parse for maintenance or outage keywords.
- Read reject_rate from OrderLifecycleManager metrics; if reject_rate > 10% over 60s: treat as degraded signal.
- Determine composite status: healthy, degraded, maintenance, or outage.
- If status in pause_on_status: emit ObservationReport(EXCHANGE_STATUS_PAUSE).
- If status in flatten_on_status: emit ObservationReport(EXCHANGE_STATUS_FLATTEN).
- When status returns to healthy: start resume_quarantine_min timer; emit EXCHANGE_STATUS_RESUMING.
- After quarantine completes: emit EXCHANGE_STATUS_HEALTHY; clear pause state.
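The composite-status step in the flow above can be sketched as a pure function over the three signals. This is an illustration under stated assumptions: the function name and argument names are hypothetical, and the plain substring match on the status page mirrors the reference pseudocode rather than a confirmed parser.

```python
from typing import Optional


def composite_status(
    consecutive_errors: int,
    reject_rate: float,
    status_page_text: Optional[str] = None,
    error_threshold: int = 3,
    reject_rate_threshold: float = 0.10,
) -> str:
    """Combine health-probe errors, reject rate, and status-page text."""
    # A missing or unreachable status page is treated as empty text.
    page = (status_page_text or "").lower()
    unhealthy = (
        consecutive_errors >= error_threshold
        or reject_rate > reject_rate_threshold
    )
    if unhealthy:
        # Outage is only declared when the status page corroborates it.
        return "outage" if "outage" in page else "degraded"
    if "maintenance" in page:
        return "maintenance"
    return "healthy"
```

A substring match would also fire on text such as "no outage", so a production parser would likely need stricter matching.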
10. Reference Implementation
Pseudocode is language-agnostic. FETCH = read input. EMIT = produce output. IF/THEN/ELSE = decision. Translate directly to TypeScript, Python, Go, or Rust.
STATE: consecutiveErrors = 0, exchangeStatus = 'healthy',
       paused = FALSE, quarantineStartMs = NULL

FUNCTION pollExchange():
    // Health probe
    t0 = now_ms()
    resp = clob_public.GET('/health', timeout=2000)
    latency = now_ms() - t0
    IF resp IS NULL OR resp.status_code != 200 OR latency > 2000:
        consecutiveErrors += 1
    ELSE:
        consecutiveErrors = 0

    // Reject-rate check: a spike counts as reaching the error threshold
    rejectRate = FETCH metrics.reject_rate_60s()
    IF rejectRate > 0.10:
        consecutiveErrors = max(consecutiveErrors, 3)

    // Status page (optional): an unreachable page is treated as empty
    statusPage = FETCH statusPage.fetch('https://status.polymarket.com')

    // Status determination
    IF consecutiveErrors >= 3:
        IF statusPage.contains('outage'):
            exchangeStatus = 'outage'
        ELSE:
            exchangeStatus = 'degraded'
    ELIF statusPage.contains('maintenance'):
        exchangeStatus = 'maintenance'
    ELSE:
        exchangeStatus = 'healthy'

    // Emit ObservationReport
    IF exchangeStatus IN params.flatten_on_status:
        paused = TRUE
        quarantineStartMs = NULL    // a relapse restarts the quarantine
        EMIT ObservationReport(EXCHANGE_STATUS_FLATTEN)
    ELIF exchangeStatus IN params.pause_on_status:
        paused = TRUE
        quarantineStartMs = NULL    // a relapse restarts the quarantine
        EMIT ObservationReport(EXCHANGE_STATUS_PAUSE)
    ELIF exchangeStatus == 'healthy' AND paused AND quarantineStartMs IS NULL:
        // First healthy poll after a pause: start the quarantine timer
        quarantineStartMs = now_ms()
        EMIT ObservationReport(EXCHANGE_STATUS_RESUMING)
    ELIF exchangeStatus == 'healthy' AND paused:
        IF now_ms() - quarantineStartMs >= params.resume_quarantine_min * 60000:
            paused = FALSE
            quarantineStartMs = NULL
            EMIT ObservationReport(EXCHANGE_STATUS_HEALTHY)

SCHEDULE pollExchange EVERY params.poll_interval_s SECONDS
SDK calls used
clob_public.GET('/health')
statusPage.fetch('https://status.polymarket.com')
Complexity: O(1) per poll cycle
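The pause and quarantine portion of the reference logic can be sketched in Python with the clock passed in, so it can be driven from a fixture. This is a sketch, not the implementation: the class name `QuarantineTracker` and the explicit `paused` flag are assumptions, added so that RESUMING is emitted only after a pause and a relapse during quarantine restarts the timer.

```python
from typing import Optional, Sequence


class QuarantineTracker:
    """Tracks pause state and the resume quarantine timer."""

    def __init__(
        self,
        resume_quarantine_min: float = 5,
        pause_on_status: Sequence[str] = ("degraded", "maintenance"),
        flatten_on_status: Sequence[str] = ("outage",),
    ) -> None:
        self.quarantine_ms = int(resume_quarantine_min * 60_000)
        self.pause_on_status = tuple(pause_on_status)
        self.flatten_on_status = tuple(flatten_on_status)
        self.paused = False
        self.quarantine_start_ms: Optional[int] = None

    def on_poll(self, status: str, now_ms: int) -> Optional[str]:
        """Return the reason code to emit for this poll, or None."""
        if status in self.flatten_on_status or status in self.pause_on_status:
            self.paused = True
            self.quarantine_start_ms = None  # relapse restarts the quarantine
            if status in self.flatten_on_status:
                return "EXCHANGE_STATUS_FLATTEN"
            return "EXCHANGE_STATUS_PAUSE"
        if status != "healthy" or not self.paused:
            return None  # nothing to resume from
        if self.quarantine_start_ms is None:
            self.quarantine_start_ms = now_ms
            return "EXCHANGE_STATUS_RESUMING"
        if now_ms - self.quarantine_start_ms >= self.quarantine_ms:
            self.paused = False
            self.quarantine_start_ms = None
            return "EXCHANGE_STATUS_HEALTHY"
        return None  # still inside the quarantine window
```

Passing `now_ms` as an argument keeps the class deterministic, which suits the property test that at least resume_quarantine_min minutes separate the last error from EXCHANGE_STATUS_HEALTHY.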
11. Wire Examples
Input — what arrives on the wire
Poll trigger + health response — internal scheduler + clob_public
{
  "poll_ts_ms": 1746770400000,
  "health_status_code": 503,
  "latency_ms": 2100,
  "consecutive_errors": 4
}
Output — what the bot emits
ObservationReport — EXCHANGE_STATUS_PAUSE
{
  "report_id": "rep_6f7a8b9c0d1e2f3a",
  "bot_id": "exec.exchangestatusmonitor",
  "exchange_status": "degraded",
  "verdict": "EXCHANGE_STATUS_PAUSE",
  "consecutive_errors": 4,
  "measured_at_ms": 1746770400000
}
12. Decision Logic
APPROVE
Exchange healthy; no action — continue normal order submission.
RESHAPE_REQUIRED
Not applicable — ExchangeStatusMonitor is observation-only.
REJECT
Status in flatten_on_status: emit FLATTEN ObservationReport; all open orders should be cancelled.
WARNING_ONLY
Consecutive error count rising but threshold not yet reached; WARN emitted.
13. Standard Decision Output
This bot returns an ObservationReport object. See ObservationReport schema.
{
  "report_id": "rep_6f7a8b9c0d1e2f3a",
  "trace_id": "trc_5e6f7a8b9c0d1e2f",
  "bot_id": "exec.exchangestatusmonitor",
  "exchange_status": "degraded",
  "verdict": "EXCHANGE_STATUS_PAUSE",
  "consecutive_errors": 4,
  "reject_rate_pct": 15.2,
  "measured_at_ms": 1746770400000
}
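The example payload can be modelled as a small typed record. The sketch below infers the field set from the example alone, since the ObservationReport schema itself is not reproduced here. The ID helper is hypothetical and only imitates the shape of the example IDs (a prefix plus 16 hex characters).

```python
import uuid
from dataclasses import asdict, dataclass, field


def _new_id(prefix: str) -> str:
    # 16 hex characters after the prefix, matching the example IDs' shape.
    return f"{prefix}_{uuid.uuid4().hex[:16]}"


@dataclass
class ObservationReport:
    exchange_status: str
    verdict: str
    consecutive_errors: int
    reject_rate_pct: float
    measured_at_ms: int
    bot_id: str = "exec.exchangestatusmonitor"
    report_id: str = field(default_factory=lambda: _new_id("rep"))
    trace_id: str = field(default_factory=lambda: _new_id("trc"))


report = ObservationReport(
    exchange_status="degraded",
    verdict="EXCHANGE_STATUS_PAUSE",
    consecutive_errors=4,
    reject_rate_pct=15.2,
    measured_at_ms=1746770400000,
)
payload = asdict(report)  # plain dict, ready for JSON serialisation
```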
14. Reason Codes
| Code | Severity | Meaning | Action | User-facing message |
|---|
| EXCHANGE_STATUS_HEALTHY | INFO | Exchange is healthy; quarantine completed; order submission permitted. | Clear pause state; resume normal operations. | |
| EXCHANGE_STATUS_PAUSE | WARN | Exchange is degraded or in maintenance; order submission paused. | Emit ObservationReport; exec bots suspend new order submissions. | Trading is paused because the exchange is temporarily unavailable. |
| EXCHANGE_STATUS_FLATTEN | HARD_REJECT | Exchange outage confirmed; all open orders should be cancelled. | Emit ObservationReport; trigger mass cancel. | Your orders were cancelled due to an exchange outage. |
| EXCHANGE_STATUS_RESUMING | INFO | Exchange has recovered; quarantine period started. | Start resume_quarantine_min timer; do not resume submissions yet. | The exchange has recovered. Trading will resume shortly. |
15. Metrics & Logs
Metrics emitted
| Metric | Type | Unit | Labels | Meaning |
|---|
| polytraders_exec_exchangestatusmonitor_status | gauge | enum | status | Current exchange status (healthy=1, degraded=2, maintenance=3, outage=4). |
| polytraders_exec_exchangestatusmonitor_consecutive_errors | gauge | count | | Current consecutive health check error count. |
| polytraders_exec_exchangestatusmonitor_pause_events_total | counter | count | verdict | Total pause/flatten events emitted by verdict. |
Alerts
| Alert | Condition | Severity | Runbook |
|---|
| ESMExchangePaused | polytraders_exec_exchangestatusmonitor_status > 1 | P1 | #runbook-esm-exchange-paused |
| ESMHighConsecutiveErrors | polytraders_exec_exchangestatusmonitor_consecutive_errors >= 3 | P2 | #runbook-esm-consecutive-errors |
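The status gauge encoding and the two alert conditions can be checked together in a fixture. This is a minimal sketch: `STATUS_GAUGE` and `firing_alerts` are hypothetical names, and real alert evaluation would happen in the metrics backend rather than in bot code.

```python
from typing import List

# Gauge encoding as listed in the metrics table.
STATUS_GAUGE = {"healthy": 1, "degraded": 2, "maintenance": 3, "outage": 4}


def firing_alerts(exchange_status: str, consecutive_errors: int) -> List[str]:
    """Return the alerts whose conditions hold for the given readings."""
    alerts = []
    if STATUS_GAUGE[exchange_status] > 1:
        alerts.append("ESMExchangePaused")
    if consecutive_errors >= 3:
        alerts.append("ESMHighConsecutiveErrors")
    return alerts
```

Because any value above 1 fires ESMExchangePaused, a scheduled maintenance window pages at P1 just as an outage does.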
16. Developer Reporting
{
  "exchange_status": "degraded",
  "consecutive_errors": 4,
  "reject_rate_pct": 15.2,
  "status_page_parsed": true,
  "status_page_result": "No active incident",
  "verdict": "EXCHANGE_STATUS_PAUSE",
  "quarantine_active": false
}
17. Plain-English Reporting
| Situation | User-facing explanation |
|---|
| Exchange paused | Trading has been temporarily paused because the exchange is experiencing technical issues. |
| Orders cancelled — outage | Your open orders were cancelled because the exchange had an outage. You can re-enter when trading resumes. |
| Trading resuming after quarantine | The exchange has recovered. Trading will resume shortly after a brief verification period. |
18. Failure-Mode Block
| Field | Value |
|---|
| main_failure_mode | Status-page parsing fails silently, causing ExchangeStatusMonitor to miss a scheduled maintenance window and continue submitting orders that will be rejected. |
| false_positive_risk | A brief network hiccup to the health endpoint is counted as exchange degradation, pausing trading unnecessarily. |
| false_negative_risk | The reject_rate threshold is too high; the exchange is degraded but the local reject rate has not yet exceeded the threshold, delaying the pause. |
| safe_fallback | If the health endpoint is unreachable for 3 or more consecutive polls, treat the exchange as degraded and emit EXCHANGE_STATUS_PAUSE conservatively. |
| required_dependencies | clob_public /health endpoint, reject-rate metrics from OrderLifecycleManager, internal scheduler for poll triggers |
19. Failure-Injection Recipes
| Scenario | How to inject | Expected behaviour | Recovery |
|---|
| CLOB_HEALTH_ENDPOINT_DOWN | Block TCP to clob_public /health for 3 poll cycles | consecutiveErrors reaches 3; exchange_status=degraded; EXCHANGE_STATUS_PAUSE emitted | Health endpoint restored; consecutiveErrors=0; quarantine starts; EXCHANGE_STATUS_HEALTHY after resume_quarantine_min |
| REJECT_RATE_SPIKE | Inject 15% reject rate into OrderLifecycleManager metrics for 60s | reject_rate > 10% over 60s treated as degraded; EXCHANGE_STATUS_PAUSE emitted | Reject rate normalises; quarantine starts |
| STATUS_PAGE_MAINTENANCE_WINDOW | Inject 'scheduled maintenance' keyword into status page mock | exchange_status=maintenance; EXCHANGE_STATUS_PAUSE emitted (default pause_on_status) | Maintenance keyword removed from status page; status returns healthy after quarantine |
20. State & Persistence
Cold-start recovery
On restart, re-poll health immediately; treat cold start as 0 consecutive errors.
21. Concurrency & Idempotency
| Aspect | Specification |
|---|
| Execution model | single-instance scheduled poller |
| Max in-flight | 1 |
| Idempotency key | poll_trigger_ts_ms |
| Per-call timeout (ms) | 2000 |
| Backpressure strategy | Drop poll if previous poll still in flight |
| Locking / mutual exclusion | single-writer: only ExchangeStatusMonitor writes to exchangeStatus store |
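The "drop poll if previous poll still in flight" strategy with a max in-flight of 1 can be sketched with a non-blocking lock. `SingleFlightPoller` is a hypothetical wrapper name, and the dropped counter is an added convenience for tests rather than a metric the spec defines.

```python
import threading
from typing import Callable


class SingleFlightPoller:
    """Runs poll_fn with at most one call in flight; extra triggers are dropped."""

    def __init__(self, poll_fn: Callable[[], None]) -> None:
        self._poll_fn = poll_fn
        self._lock = threading.Lock()
        self.dropped = 0

    def trigger(self) -> bool:
        """Run one poll; return False if it was dropped."""
        # Non-blocking acquire: if a poll is already running, skip this one
        # instead of queueing behind it.
        if not self._lock.acquire(blocking=False):
            self.dropped += 1
            return False
        try:
            self._poll_fn()
            return True
        finally:
            self._lock.release()
```

Dropping rather than queueing keeps the poller from piling requests onto an exchange that is already slow to respond.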
22. Dependencies
Depends on (must run first)
Emits to (downstream consumers)
| Bot | Why | Contract |
|---|
| exec.orderlifecyclemanager | EXCHANGE_STATUS_PAUSE/FLATTEN ObservationReports consumed to suspend order submission. | All exec bots subscribe to exchange status ObservationReports. |
External services
| Service | Endpoint | SLA assumed | On failure |
|---|
| CLOB V2 public API | https://clob.polymarket.com/health | best-effort (health endpoint) | Unreachable counts as consecutive error; 3 consecutive = degraded. |
| Polymarket status page | https://status.polymarket.com | best-effort | If unreachable, status_page_parsed=false; rely on health endpoint only. |
23. Security Surfaces
Abuse vectors considered
- Injecting fake health endpoint responses to trigger spurious exchange-pause events
- Flooding status-page parser with malformed HTML to suppress maintenance detection
Mitigations
- Health endpoint responses validated against expected schema; unexpected payloads treated as errors
- Status-page parsing uses keyword matching with a known-safe whitelist; malformed pages treated as 'no incident'
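The keyword-matching mitigation can be sketched as follows. The keyword list, function name, and return shape are assumptions for illustration. Markup is stripped before matching so that a keyword appearing only in a tag attribute does not count, and an unusable page is reported as not parsed with no incident keyword.

```python
import re
from typing import Optional, Tuple

# Assumed whitelist, checked in this order (outage outranks maintenance).
ALLOWED_KEYWORDS = ("outage", "maintenance")
_TAG = re.compile(r"<[^>]*>")


def parse_status_page(html: Optional[str]) -> Tuple[bool, Optional[str]]:
    """Return (parsed_ok, keyword); keyword is None when no incident is found."""
    if not isinstance(html, str) or not html.strip():
        return (False, None)
    # Drop tags so only visible text is matched.
    text = _TAG.sub(" ", html).lower()
    for keyword in ALLOWED_KEYWORDS:
        if re.search(rf"\b{keyword}\b", text):
            return (True, keyword)
    return (True, None)
```

Returning `parsed_ok` separately lets callers distinguish "no incident" from "page could not be read", which matters for the silent-parse-failure mode described in the failure-mode block.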
24. Polymarket V2 Compatibility
| Aspect | Value |
|---|
| CLOB version | v2 |
| Collateral asset | pUSD |
| EIP-712 Exchange domain version | 2 |
| Aware of builderCode field | no |
| Aware of negative-risk markets | no |
| Multi-chain ready | no |
| SDK used | py-clob-client-v2 |
| Settlement contract | CTFExchangeV2 |
| Notes | ExchangeStatusMonitor polls CLOB V2 public endpoints only; it does not submit orders. It provides exchange health signals to other exec bots to coordinate pause/resume behaviour. |
API surfaces declared
clob_public, internal
Networks supported
polygon
25. Versioning & Migration
| Field | Value |
|---|
| spec | 2.0.0 |
| implementation | 0.1.0 |
| schema | 2 |
| released | None |
| planned_release | Q4-2026 |
Migration history
| Date | From | To | Reason | Action taken |
|---|
| 2026-04-28 | n/a | v2-spec | Spec drafted post-CLOB-V2 cutover; bot not yet implemented | Designed against V2 schema (pUSD, builder codes, V2 EIP-712 domain) |
26. Acceptance Tests
Unit Tests
| Test | Setup | Expected result |
|---|
| Pause emitted after 3 consecutive health check failures | Inject 3 consecutive 503 responses from clob_public | exchange_status=degraded; EXCHANGE_STATUS_PAUSE emitted |
| Flatten emitted on outage status | Set exchange_status=outage | EXCHANGE_STATUS_FLATTEN emitted |
| No pause on single health check failure | Inject one 503 response; the next 2 polls succeed | No pause; WARN only after the 1st failure |
Integration Tests
| Test | Expected result |
|---|
| Quarantine: status recovers → quarantine timer starts → EXCHANGE_STATUS_HEALTHY after resume_quarantine_min | EXCHANGE_STATUS_RESUMING emitted; EXCHANGE_STATUS_HEALTHY emitted after 5 min |
| reject_rate spike triggers degraded status | reject_rate > 10% for 60s → exchange_status=degraded; EXCHANGE_STATUS_PAUSE emitted |
Property Tests
| Property | Required behaviour |
|---|
| EXCHANGE_STATUS_FLATTEN only emitted when status in flatten_on_status | Always true |
| After quarantine completes, at least resume_quarantine_min minutes have elapsed since last error | Always true |
27. Operational Runbook
ExchangeStatusMonitor incidents require checking the Polymarket status page and CLOB health endpoint. Never manually lift a pause without confirming exchange health.
On-call actions
| Alert | First step | Diagnosis | Mitigation | Escalate to |
|---|
| ESMExchangePaused | Check https://status.polymarket.com and CLOB /health directly. If false positive (exchange is healthy), unflag manually after confirming. | | | Exec pod lead + Infra |
| ESMHighConsecutiveErrors | Check network connectivity to clob.polymarket.com; check CLOB latency dashboard. | | | Infra on-call if connectivity issue |
Manual overrides
polytraders bot force-resume exec.exchangestatusmonitor — Exchange is confirmed healthy but quarantine has not yet expired; use only with Exec pod lead approval.
Healthcheck
GET /internal/health/exchangestatusmonitor -> 200 if exchange_status=healthy, consecutive_errors=0, quarantine_active=false, polling running. Red: exchange_status in (degraded, outage), consecutive_errors >= 3, polling interval missed.
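The healthcheck rule can be sketched as a pure function. The source only specifies when the endpoint returns 200, so the 503 used for every other case is an assumption, as is the function name.

```python
def healthcheck_status_code(
    exchange_status: str,
    consecutive_errors: int,
    quarantine_active: bool,
    polling_running: bool,
) -> int:
    """Return 200 only when every green condition holds."""
    green = (
        exchange_status == "healthy"
        and consecutive_errors == 0
        and not quarantine_active
        and polling_running
    )
    # Assumption: any non-green state maps to 503.
    return 200 if green else 503
```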
29. Developer Checklist
Ready-to-ship score: 27/27 sections complete · 100%
| Requirement | Status |
|---|
| Purpose defined | ✓ done |
| Required inputs listed | ✓ done |
| Parameters defined | ✓ done |
| Defaults defined | ✓ done |
| Warning thresholds defined | ✓ done |
| Hard thresholds defined | ✓ done |
| Safe fallback defined | ✓ done |
| Structured output defined | ✓ done |
| Developer log defined | ✓ done |
| Plain-English explanation | ✓ done |
| Unit tests defined | ✓ done |
| Integration tests defined | ✓ done |
| Property tests defined | ✓ done |
| Failure-mode block complete | ✓ done |
| Reference implementation pseudocode | ✓ done |
| Wire examples (input + output) | ✓ done |
| Reason codes listed | ✓ done |
| Metrics & logs defined | ✓ done |
| State & persistence defined | ✓ done |
| Concurrency & idempotency defined | ✓ done |
| Dependencies declared | ✓ done |
| Security surfaces declared | ✓ done |
| Polymarket V2 compatibility declared | ✓ done |
| Version & migration history declared | ✓ done |
| Operational runbook defined | ✓ done |
| Promotion gates defined | ✓ done |
| Failure-injection recipes defined | ✓ done |