1. Bot Identity
| Field | Value |
|---|---|
| Layer | Execution |
| Bot class | Execution Utility |
| Authority | Reshape |
| Status | PLANNED |
| Readiness | Spec started |
| Runs before | Any exec bot that uses latency data for routing decisions |
| Runs after | Order submission and fill events from ws_user |
| Applies to | All CLOB V2 order submission and WS feed routes, continuously |
| Default mode | shadow_only |
| User-visible | summary-only |
| Developer owner | Polytraders core — Execution pod |
Operational profile
| Field | Value |
|---|---|
| Modes supported | quarantine |
2. Purpose
LatencyProfiler continuously measures round-trip order submission latency by route and surfaces regressions. It probes each configured route at probe_interval_s and emits ObservationReports when p95 or p99 thresholds are breached.
3. Why This Bot Matters
Latency regression undetected
Strategy signals age past their TTL in transit, causing stale-signal discards and missed opportunities without a clear root cause.
Route not profiled per endpoint
A degraded CLOB endpoint continues to receive orders because the routing layer lacks per-route latency data.
WebSocket lag not tracked
ws_user fill events arrive late, causing order lifecycle state to be updated with significant delay.
No worked examples on this bot yet. Worked examples are optional but strongly recommended — they turn an abstract failure mode into something a developer can verify in a fixture.
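As a starting point, here is a minimal worked example a developer could turn into a fixture. It is a sketch, not the bot's real code: `nearest_rank` and `classify` are hypothetical helper names, and the percentile method (nearest-rank) is an assumption; only the thresholds (warn_p95_ms=150, fail_p99_ms=500) come from this spec.

```python
import math

def nearest_rank(sorted_samples, q):
    """Nearest-rank percentile: value at rank ceil(q * N), 1-indexed."""
    idx = max(0, math.ceil(q * len(sorted_samples)) - 1)
    return sorted_samples[idx]

def classify(samples, warn_p95_ms=150, fail_p99_ms=500):
    """Map a window of RTT samples to a verdict using the spec's thresholds."""
    s = sorted(samples)
    p95 = nearest_rank(s, 0.95)
    p99 = nearest_rank(s, 0.99)
    if p99 > fail_p99_ms:
        return "LATENCY_HARD_BREACH"
    if p95 > warn_p95_ms:
        return "LATENCY_WARN"
    return "LATENCY_OK"

# Fixture: 94 fast samples at 40ms plus 6 slow samples at 180ms.
# p95 lands on a 180ms sample (> 150) while p99 stays under 500,
# so the expected verdict is a warning, not a hard breach.
print(classify([40] * 94 + [180] * 6))  # LATENCY_WARN
```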
6. Parameter Guide
| Parameter | Default | Warning | Hard | What it controls |
|---|---|---|---|---|
| warn_p95_ms | 150 | 200 | 500 | p95 round-trip latency in milliseconds above which a WARN ObservationReport is emitted. |
| fail_p99_ms | 500 | 750 | 1000 | p99 round-trip latency in milliseconds above which HARD_REJECT is raised and the route is flagged as degraded. |
| probe_interval_s | 30 | 60 | 120 | How often to send a probe request to each configured route to measure latency. |
| routes_to_probe | ['clob_auth', 'ws_user'] | — | — | List of route identifiers to probe. Each entry corresponds to a configured CLOB V2 endpoint or WebSocket feed. |
7. Detailed Parameter Instructions
warn_p95_ms
What it means
p95 round-trip latency in milliseconds above which a WARN ObservationReport is emitted.
Default
{ "warn_p95_ms": 150 }
Why this default matters
150ms p95 is the target for acceptable order routing latency; above 200ms strategies begin experiencing signal-age issues.
Threshold logic
| Condition | Action |
|---|---|
| p95_ms <= 150 | No alert |
| 150 < p95_ms <= 200 | WARN — LATENCY_WARN emitted |
| p95_ms > 500 (hard) | HARD_REJECT — LATENCY_HARD_BREACH; alert fired |
Developer check
if p95 > params.warn_p95_ms: emit(LATENCY_WARN)
User-facing English
Exchange connection speed is being monitored.
fail_p99_ms
What it means
p99 round-trip latency in milliseconds above which HARD_REJECT is raised and the route is flagged as degraded.
Default
{ "fail_p99_ms": 500 }
Why this default matters
500ms p99 is the threshold at which GTD signal TTLs begin expiring in transit; above this, order submission must be suspended on the degraded route.
Threshold logic
| Condition | Action |
|---|---|
| p99_ms <= 500 | Healthy |
| 500 < p99_ms <= 750 | WARN — LATENCY_P99_ELEVATED |
| p99_ms > 1000 (hard) | HARD_REJECT — flag route degraded; notify exec bots |
Developer check
if p99 > params.fail_p99_ms: flagRoute(route, 'degraded')
User-facing English
— not yet authored —
probe_interval_s
What it means
How often to send a probe request to each configured route to measure latency.
Default
{ "probe_interval_s": 30 }
Why this default matters
30s provides frequent enough sampling to detect latency regressions within one minute while consuming minimal rate-limit budget.
Threshold logic
| Condition | Action |
|---|---|
| interval <= 30s | Normal probe cadence |
| interval > 60s | WARN — latency regressions may go undetected for > 1 minute |
| interval > 120s (hard) | Reject config |
Developer check
assert params.probe_interval_s <= 120  # locked hard max from Section 8
User-facing English
— not yet authored —
routes_to_probe
What it means
List of route identifiers to probe. Each entry corresponds to a configured CLOB V2 endpoint or WebSocket feed.
Default
{ "routes_to_probe": ["clob_auth", "ws_user"] }
Why this default matters
Probing both REST auth and WebSocket feeds captures the two most latency-sensitive paths for order execution.
Threshold logic
| Condition | Action |
|---|---|
| includes both clob_auth and ws_user | Full coverage |
| missing ws_user | WARN — WebSocket lag not monitored |
Developer check
if 'ws_user' not in params.routes_to_probe: emit(WARN)
User-facing English
— not yet authored —
8. Default Configuration
{
"bot_id": "exec.latencyprofiler",
"version": "0.1.0",
"mode": "shadow_only",
"defaults": {
"warn_p95_ms": 150,
"fail_p99_ms": 500,
"probe_interval_s": 30,
"routes_to_probe": [
"clob_auth",
"ws_user"
]
},
"locked": {
"warn_p95_ms": {
"max": 500
},
"fail_p99_ms": {
"max": 1000
},
"probe_interval_s": {
"max": 120
}
}
}
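The locked bounds above lend themselves to a merge-and-validate step before the bot accepts any override. A minimal sketch, assuming a dict-based config; `validate_config` is a hypothetical helper name, not part of the spec:

```python
# Locked maxima copied from the Section 8 "locked" block.
LOCKED = {
    "warn_p95_ms": {"max": 500},
    "fail_p99_ms": {"max": 1000},
    "probe_interval_s": {"max": 120},
}

def validate_config(defaults: dict, overrides: dict) -> dict:
    """Merge overrides onto defaults, rejecting values above locked maxima."""
    merged = {**defaults, **overrides}
    for key, bounds in LOCKED.items():
        if key in merged and merged[key] > bounds["max"]:
            raise ValueError(f"{key}={merged[key]} exceeds locked max {bounds['max']}")
    return merged

cfg = validate_config(
    {"warn_p95_ms": 150, "fail_p99_ms": 500, "probe_interval_s": 30},
    {"probe_interval_s": 60},
)
print(cfg["probe_interval_s"])  # 60
```

An override of probe_interval_s=150 would be rejected here, matching the "interval > 120s (hard) | Reject config" rule in Section 7.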
9. Implementation Flow
- Every probe_interval_s, for each route in routes_to_probe: send a probe request and record send_ms.
- For clob_auth: issue a lightweight GET /time or authenticated OPTIONS; record ack_ms.
- For ws_user: compare heartbeat ts_ms to local now_ms; record feed_lag_ms.
- Maintain a rolling window of the last 100 probe round-trip times per route.
- Compute p50, p95, p99 from the rolling window.
- If p95 > warn_p95_ms: emit ObservationReport(LATENCY_WARN) for the route.
- If p99 > fail_p99_ms: emit ObservationReport(LATENCY_HARD_BREACH); flag route as degraded in internal state store.
- Publish per-route latency histogram metrics every probe cycle.
10. Reference Implementation
Pseudocode is language-agnostic. FETCH = read input. EMIT = produce output. IF/THEN/ELSE = decision. Translate directly to TypeScript, Python, Go, or Rust.
FUNCTION probeRoute(route):
    sendMs = now_ms()
    IF route == 'clob_auth':
        result = clob_auth.GET('/time')   // lightweight probe
        ackMs = now_ms()
        rtt = ackMs - sendMs
        IF result IS NULL OR result.error:
            rtt = 1000                    // count as max latency
    ELIF route == 'ws_user':
        hb = ws_user.lastHeartbeat()
        rtt = now_ms() - hb.ts_ms

    // Update rolling window
    windows[route].append(rtt)
    IF len(windows[route]) > 100:
        windows[route].pop(0)

    // Compute percentiles
    sorted_w = sorted(windows[route])
    p50 = sorted_w[int(0.50 * len(sorted_w))]
    p95 = sorted_w[int(0.95 * len(sorted_w))]
    p99 = sorted_w[int(0.99 * len(sorted_w))]

    // Threshold checks
    IF p99 > params.fail_p99_ms:
        routeState[route] = 'degraded'
        EMIT ObservationReport(route, p50, p95, p99, LATENCY_HARD_BREACH)
    ELIF p95 > params.warn_p95_ms:
        EMIT ObservationReport(route, p50, p95, p99, LATENCY_WARN)

SCHEDULE probeRoute FOR EACH route IN params.routes_to_probe
    EVERY params.probe_interval_s
SDK calls used
- clob_auth.GET('/time')
- ws_user.lastHeartbeat()
Complexity: O(W log W) where W = rolling window size (100)
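Taking up the invitation to translate the pseudocode, here is a sketch in Python. The clob_auth and ws_user SDK clients are assumed interfaces (method names `get` and `last_heartbeat` are placeholders, not a real SDK surface); everything else follows the pseudocode above.

```python
import time
from collections import defaultdict, deque

WINDOW = 100
MAX_RTT_MS = 1000  # probe errors and timeouts counted as max latency

windows = defaultdict(lambda: deque(maxlen=WINDOW))  # deque evicts oldest
route_state = {}

def percentile(sorted_w, q):
    # Index-based percentile as in the pseudocode, clamped to the last element.
    return sorted_w[min(int(q * len(sorted_w)), len(sorted_w) - 1)]

def probe_route(route, params, clob_auth=None, ws_user=None, reports=None):
    now_ms = lambda: int(time.time() * 1000)
    if route == "clob_auth":
        send_ms = now_ms()
        result = clob_auth.get("/time")              # lightweight probe
        rtt = now_ms() - send_ms
        if result is None or result.get("error"):
            rtt = MAX_RTT_MS                         # count as max latency
    elif route == "ws_user":
        rtt = now_ms() - ws_user.last_heartbeat()["ts_ms"]  # feed lag
    else:
        return
    windows[route].append(rtt)
    sorted_w = sorted(windows[route])
    p50, p95, p99 = (percentile(sorted_w, q) for q in (0.50, 0.95, 0.99))
    if p99 > params["fail_p99_ms"]:
        route_state[route] = "degraded"
        reports.append((route, p50, p95, p99, "LATENCY_HARD_BREACH"))
    elif p95 > params["warn_p95_ms"]:
        reports.append((route, p50, p95, p99, "LATENCY_WARN"))
```

Using `deque(maxlen=100)` keeps the eviction step implicit: appending the 101st sample drops the oldest automatically.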
11. Wire Examples
Input — what arrives on the wire
Probe trigger (internal scheduler) — internal
{
"route": "clob_auth",
"trigger_ts_ms": 1746770300000
}
Output — what the bot emits
ObservationReport — LATENCY_WARN
{
"report_id": "rep_5e6f7a8b9c0d1e2f",
"bot_id": "exec.latencyprofiler",
"route": "clob_auth",
"p50_ms": 45,
"p95_ms": 160,
"p99_ms": 280,
"verdict": "LATENCY_WARN",
"measured_at_ms": 1746770300000
}
12. Decision Logic
APPROVE
p95 and p99 within thresholds; route healthy; no ObservationReport emitted.
RESHAPE_REQUIRED
Not applicable — LatencyProfiler is observation-only; it does not reshape orders.
REJECT
p99 exceeds fail_p99_ms; route flagged degraded; LATENCY_HARD_BREACH emitted.
WARNING_ONLY
p95 exceeds warn_p95_ms but p99 within threshold; LATENCY_WARN emitted.
13. Standard Decision Output
This bot returns an ObservationReport object. See the ObservationReport schema.
{
"report_id": "rep_5e6f7a8b9c0d1e2f",
"trace_id": "trc_4d5e6f7a8b9c0d1e",
"bot_id": "exec.latencyprofiler",
"route": "clob_auth",
"p50_ms": 45,
"p95_ms": 160,
"p99_ms": 280,
"verdict": "LATENCY_WARN",
"window_size": 100,
"measured_at_ms": 1746770300000
}
14. Reason Codes
| Code | Severity | Meaning | Action | User-facing message |
|---|---|---|---|---|
| LATENCY_OK | INFO | All probed routes within p95 and p99 thresholds. | No alert; emit metrics only. | — |
| LATENCY_WARN | WARN | p95 latency exceeded warn_p95_ms on a probed route. | Emit ObservationReport with WARN; do not block orders. | Exchange connection is slightly slower than normal. |
| LATENCY_HARD_BREACH | HARD_REJECT | p99 latency exceeded fail_p99_ms; route flagged as degraded. | Flag route degraded; notify exec bots; alert ops. | Exchange connection has degraded. Order submission may be affected. |
| PROBE_TIMEOUT | WARN | Probe request timed out; recorded as max latency (1000ms) in the rolling window. | Record max latency; escalate to HARD_REJECT only after 3 consecutive timeouts. | — |
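The PROBE_TIMEOUT escalation rule can be sketched as follows. This is a minimal illustration, not the bot's implementation; `on_probe_result` and the counter are hypothetical names:

```python
from collections import defaultdict

# One timeout is a WARN on its own; 3 consecutive timeouts escalate to
# LATENCY_HARD_BREACH, and any successful probe resets the streak.
consecutive_timeouts = defaultdict(int)

def on_probe_result(route: str, timed_out: bool) -> str:
    if timed_out:
        consecutive_timeouts[route] += 1
        if consecutive_timeouts[route] >= 3:
            return "LATENCY_HARD_BREACH"
        return "PROBE_TIMEOUT"
    consecutive_timeouts[route] = 0  # success resets the streak
    return "LATENCY_OK"
```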
15. Metrics & Logs
Metrics emitted
| Metric | Type | Unit | Labels | Meaning |
|---|---|---|---|---|
| polytraders_exec_latencyprofiler_rtt_ms | histogram | ms | route | Round-trip latency histogram per probed route. |
| polytraders_exec_latencyprofiler_degraded_routes | gauge | count | — | Number of routes currently flagged as degraded. |
| polytraders_exec_latencyprofiler_probe_errors_total | counter | count | route | Total probe timeouts or errors per route. |
Alerts
| Alert | Condition | Severity | Runbook |
|---|---|---|---|
| LatencyProfilerRoutesDegraded | polytraders_exec_latencyprofiler_degraded_routes > 0 | P1 | #runbook-latencyprofiler-degraded |
| LatencyProfilerHighP99 | histogram_quantile(0.99, rate(polytraders_exec_latencyprofiler_rtt_ms_bucket[5m])) > 500 | P2 | #runbook-latencyprofiler-p99 |
16. Developer Reporting
{
"route": "clob_auth",
"p50_ms": 45,
"p95_ms": 160,
"p99_ms": 280,
"warn_p95_ms": 150,
"fail_p99_ms": 500,
"samples": 100,
"route_degraded": false
}
17. Plain-English Reporting
| Situation | User-facing explanation |
|---|---|
| Latency warning on submission route | The connection to the exchange is slightly slower than normal. Orders may take a moment longer to be processed. |
| Route flagged degraded | The exchange connection speed has degraded significantly. Order submission may be suspended until conditions improve. |
18. Failure-Mode Block
| Field | Value |
|---|---|
| main_failure_mode | Probe requests consume rate-limit budget on a congested connection, making actual order submission slower. |
| false_positive_risk | A single slow probe response inflates p99, triggering LATENCY_HARD_BREACH when the route is actually healthy. |
| false_negative_risk | With a 100-sample window at 30s probe intervals, a sudden latency spike can take up to 50 minutes to fully propagate through the p99 estimate. |
| safe_fallback | If the probe itself times out, record it as max latency (1000ms) in the rolling window; emit LATENCY_HARD_BREACH after 3 consecutive timeouts. |
| required_dependencies | clob_auth endpoint, ws_user heartbeat, internal scheduler for probe triggers |
19. Failure-Injection Recipes
| Scenario | How to inject | Expected behaviour | Recovery |
|---|---|---|---|
| CLOB_AUTH_HIGH_LATENCY | Add 600ms artificial delay to clob_auth GET /time responses | p99 climbs above fail_p99_ms (500); LATENCY_HARD_BREACH emitted; route flagged degraded | Delay removed; next probe cycle shows improved p99; route unflagged after 3 healthy probes |
| WS_USER_HEARTBEAT_STALE | Stop ws_user heartbeat for 10s | feed_lag_ms recorded at the 5000ms cap, exceeding fail_p99_ms; ws_user flagged degraded | Heartbeat resumes; lag drops; route unflagged |
| PROBE_RATE_LIMIT_EXHAUSTION | Reduce probe_interval_s to 1s and increase routes_to_probe to 10 entries | Scheduler enforces the minimum interval; excess probe triggers dropped rather than queued | Config corrected; probes resume at safe interval |
20. State & Persistence
Cold-start recovery
Window cleared on restart; first probe cycle rebuilds estimates from scratch.
21. Concurrency & Idempotency
| Aspect | Specification |
|---|---|
| Execution model | scheduled coroutine per route |
| Max in-flight | 10 |
| Idempotency key | route + probe_trigger_ts_ms |
| Per-call timeout (ms) | 1000 |
| Backpressure strategy | Drop probe if previous probe for same route still in flight |
| Locking / mutual exclusion | per-route mutex for rolling window writes |
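The drop-if-in-flight backpressure rule can be sketched with asyncio. A minimal illustration under the assumption of one coroutine per route; the function names are hypothetical:

```python
import asyncio

# Routes with a probe currently in flight; a new trigger for the same
# route is dropped rather than queued behind the running probe.
in_flight: set = set()

async def maybe_probe(route, probe):
    """Run a probe unless one is already in flight for this route."""
    if route in in_flight:
        return "dropped"
    in_flight.add(route)
    try:
        await probe(route)
        return "probed"
    finally:
        in_flight.discard(route)

async def demo():
    async def slow_probe(route):
        await asyncio.sleep(0.05)  # simulate a slow round trip
    first = asyncio.create_task(maybe_probe("clob_auth", slow_probe))
    await asyncio.sleep(0.01)      # first probe is now in flight
    second = await maybe_probe("clob_auth", slow_probe)
    return await first, second

print(asyncio.run(demo()))  # ('probed', 'dropped')
```

Dropping rather than queueing keeps the idempotency key (route + probe_trigger_ts_ms) meaningful: each scheduled trigger either runs once or is discarded.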
22. Dependencies
Depends on (must run first)
| Bot | Why | Contract |
|---|---|---|
| internal.scheduler | Provides probe triggers every probe_interval_s. | Probe fires within ±5s of scheduled interval. |
Emits to (downstream consumers)
| Bot | Why | Contract |
|---|---|---|
| exec.orderlifecyclemanager | Degraded route flags inform lifecycle manager to escalate stuck-order thresholds. | ObservationReport with route_degraded=true consumed by exec bots. |
External services
| Service | Endpoint | SLA assumed | On failure |
|---|---|---|---|
| CLOB V2 auth API | https://clob.polymarket.com | 99.95% / 200ms p99 | Probe timeout counted as 1000ms in rolling window. |
| WS user feed | wss://ws-subscriptions-clob.polymarket.com/ws/user | best-effort | If heartbeat absent > 5s, feed_lag recorded as 5000ms. |
23. Security Surfaces
Abuse vectors considered
- Flooding probe scheduler to exhaust rate-limit budget with unnecessary latency checks
- Injecting fake degraded-route state to suppress order submission on healthy routes
Mitigations
- Probe rate capped at 1/probe_interval_s per route; scheduler enforces minimum interval
- Route degraded state writable only by LatencyProfiler process; read by other exec bots via internal read-only API
24. Polymarket V2 Compatibility
| Aspect | Value |
|---|---|
| CLOB version | v2 |
| Collateral asset | pUSD |
| EIP-712 Exchange domain version | 2 |
| Aware of builderCode field | no |
| Aware of negative-risk markets | no |
| Multi-chain ready | no |
| SDK used | py-clob-client-v2 |
| Settlement contract | CTFExchangeV2 |
| Notes | LatencyProfiler probes CLOB V2 auth endpoint latency only; it does not sign or submit real orders. All measurements are in milliseconds from the local system clock. |
API surfaces declared
- clob_auth
- ws_user
- internal
Networks supported
polygon
25. Versioning & Migration
| Field | Value |
|---|---|
| spec | 2.0.0 |
| implementation | 0.1.0 |
| schema | 2 |
| released | None |
| planned_release | Q4-2026 |
Migration history
| Date | From | To | Reason | Action taken |
|---|---|---|---|---|
| 2026-04-28 | n/a | v2-spec | Spec drafted post-CLOB-V2 cutover; bot not yet implemented | Designed against V2 schema (pUSD, builder codes, V2 EIP-712 domain) |
26. Acceptance Tests
Unit Tests
| Test | Setup | Expected result |
|---|---|---|
| p95 computation from rolling window | Inject 100 samples with 95th sample = 180ms | p95_ms=180 > warn_p95_ms=150; LATENCY_WARN emitted |
| Route flagged degraded when p99 > fail_p99_ms | p99=600ms, fail_p99_ms=500 | route_degraded=true; LATENCY_HARD_BREACH emitted |
| No alert when both p95 and p99 within thresholds | p95=100ms, p99=200ms | No ObservationReport emitted |
Integration Tests
| Test | Expected result |
|---|---|
| Probe cycle: send probe → receive ack → compute latency → update rolling window | Rolling window updated; metrics emitted; alert fired only if threshold breached |
| ws_user lag detection via heartbeat comparison | feed_lag_ms computed; LATENCY_WARN if lag > warn_p95_ms |
Property Tests
| Property | Required behaviour |
|---|---|
| Rolling window always contains <= 100 samples per route | Always true — oldest sample evicted on overflow |
| p99 >= p95 >= p50 always holds | Always true |
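Both properties can be checked with randomized inputs. A sketch without a specific property-testing framework (plain random sampling stands in for e.g. Hypothesis); the nearest-rank percentile here is an illustrative choice, not the bot's mandated method:

```python
import random
from collections import deque

def nearest_rank(sorted_w, q):
    # Index-based percentile clamped to the last element; the index is
    # nondecreasing in q, so monotonicity follows on a sorted window.
    return sorted_w[min(int(q * len(sorted_w)), len(sorted_w) - 1)]

random.seed(7)
for _ in range(200):
    window = deque(maxlen=100)           # oldest sample evicted on overflow
    for _ in range(random.randint(1, 300)):
        window.append(random.randint(1, 1000))
    assert len(window) <= 100            # window-size property
    s = sorted(window)
    p50, p95, p99 = (nearest_rank(s, q) for q in (0.50, 0.95, 0.99))
    assert p50 <= p95 <= p99             # percentile monotonicity property
print("properties hold")
```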
27. Operational Runbook
LatencyProfiler incidents are always route degradations. Check CLOB status page and ws_user heartbeat freshness first.
On-call actions
| Alert | First step | Diagnosis | Mitigation | Escalate to |
|---|---|---|---|---|
| LatencyProfilerRoutesDegraded | Check Polymarket status page. | Check CLOB auth endpoint health. | If degraded, pause order submission until the route recovers. | Infra on-call if CLOB unreachable > 2 min |
| LatencyProfilerHighP99 | Check p99 histogram by route; identify which route is degraded. | Cross-reference with ExchangeStatusMonitor. | — | Exec pod lead if p99 > 750ms sustained |
Manual overrides
polytraders bot unflag-route exec.latencyprofiler --route clob_auth — clears a route that was incorrectly flagged degraded due to a probe anomaly; confirm the route is actually healthy before running.
Healthcheck
GET /internal/health/latencyprofiler -> 200 when all probed routes are healthy: degraded_routes=0 and p99 < fail_p99_ms on every route. Red: degraded_routes > 0, probe_errors_total spiking, or the scheduler not firing.
29. Developer Checklist
Ready-to-ship score: 27/27 sections complete · 100%
| Requirement | Status |
|---|---|
| Purpose defined | ✓ done |
| Required inputs listed | ✓ done |
| Parameters defined | ✓ done |
| Defaults defined | ✓ done |
| Warning thresholds defined | ✓ done |
| Hard thresholds defined | ✓ done |
| Safe fallback defined | ✓ done |
| Structured output defined | ✓ done |
| Developer log defined | ✓ done |
| Plain-English explanation | ✓ done |
| Unit tests defined | ✓ done |
| Integration tests defined | ✓ done |
| Property tests defined | ✓ done |
| Failure-mode block complete | ✓ done |
| Reference implementation pseudocode | ✓ done |
| Wire examples (input + output) | ✓ done |
| Reason codes listed | ✓ done |
| Metrics & logs defined | ✓ done |
| State & persistence defined | ✓ done |
| Concurrency & idempotency defined | ✓ done |
| Dependencies declared | ✓ done |
| Security surfaces declared | ✓ done |
| Polymarket V2 compatibility declared | ✓ done |
| Version & migration history declared | ✓ done |
| Operational runbook defined | ✓ done |
| Promotion gates defined | ✓ done |
| Failure-injection recipes defined | ✓ done |