1. Bot Identity
| Field | Value |
|---|---|
| Layer | Execution |
| Bot class | Execution Utility |
| Authority | Reshape |
| Status | PLANNED |
| Readiness | Spec started |
| Runs before | Any exec bot that uses latency data for routing decisions |
| Runs after | Order submission and fill events from ws_user |
| Applies to | All CLOB V2 order submission and WS feed routes, continuously |
| Default mode | shadow_only |
| User-visible | summary-only |
| Developer owner | Polytraders core — Execution pod |
Operational profile
| Field | Value |
|---|---|
| Modes supported | quarantine |
2. Purpose
LatencyProfiler continuously measures round-trip order submission latency by route and surfaces regressions. It probes each configured route at probe_interval_s and emits ObservationReports when p95 or p99 thresholds are breached.
3. Why This Bot Matters
Latency regression undetected
Strategy signals age past their TTL in transit, causing stale-signal discards and missed opportunities without a clear root cause.
Route not profiled per endpoint
A degraded CLOB endpoint continues to receive orders because the routing layer lacks per-route latency data.
WebSocket lag not tracked
ws_user fill events arrive late, causing order lifecycle state to be updated with significant delay.
No worked examples on this bot yet. Worked examples are optional but strongly recommended — they turn an abstract failure mode into something a developer can verify in a fixture.
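As a starting point, here is a minimal worked example a developer could turn into a fixture. It is a sketch, not the bot's real code: `nearest_rank` and `classify` are hypothetical helper names, and the percentile method (nearest-rank) is an assumption; only the thresholds (warn_p95_ms=150, fail_p99_ms=500) come from this spec.

```python
import math

def nearest_rank(sorted_samples, q):
    """Nearest-rank percentile: value at rank ceil(q * N), 1-indexed."""
    idx = max(0, math.ceil(q * len(sorted_samples)) - 1)
    return sorted_samples[idx]

def classify(samples, warn_p95_ms=150, fail_p99_ms=500):
    """Map a window of RTT samples to a verdict using the spec's thresholds."""
    s = sorted(samples)
    p95 = nearest_rank(s, 0.95)
    p99 = nearest_rank(s, 0.99)
    if p99 > fail_p99_ms:
        return "LATENCY_HARD_BREACH"
    if p95 > warn_p95_ms:
        return "LATENCY_WARN"
    return "LATENCY_OK"

# Fixture: 94 fast samples at 40ms plus 6 slow samples at 180ms.
# p95 lands on a 180ms sample (> 150) while p99 stays under 500,
# so the expected verdict is a warning, not a hard breach.
print(classify([40] * 94 + [180] * 6))  # LATENCY_WARN
```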
6. Parameter Guide
| Parameter | Default | Warning | Hard | What it controls |
|---|---|---|---|---|
| warn_p95_ms | 150 | 200 | 500 | p95 round-trip latency in milliseconds above which a WARN ObservationReport is emitted. |
| fail_p99_ms | 500 | 750 | 1000 | p99 round-trip latency in milliseconds above which HARD_REJECT is raised and the route is flagged as degraded. |
| probe_interval_s | 30 | 60 | 120 | How often to send a probe request to each configured route to measure latency. |
| routes_to_probe | ['clob_auth', 'ws_user'] | — | — | List of route identifiers to probe. Each entry corresponds to a configured CLOB V2 endpoint or WebSocket feed. |
7. Detailed Parameter Instructions
warn_p95_ms
What it means
p95 round-trip latency in milliseconds above which a WARN ObservationReport is emitted.
Default
{ "warn_p95_ms": 150 }
Why this default matters
150ms p95 is the target for acceptable order routing latency; above 200ms strategies begin experiencing signal-age issues.
Threshold logic
| Condition | Action |
|---|---|
| p95_ms <= 150 | No alert |
| 150 < p95_ms <= 200 | WARN — LATENCY_WARN emitted |
| p95_ms > 500 (hard) | HARD_REJECT — LATENCY_HARD_BREACH; alert fired |
Developer check
if p95 > params.warn_p95_ms: emit(LATENCY_WARN)
User-facing English
Exchange connection speed is being monitored.
fail_p99_ms
What it means
p99 round-trip latency in milliseconds above which HARD_REJECT is raised and the route is flagged as degraded.
Default
{ "fail_p99_ms": 500 }
Why this default matters
500ms p99 is the threshold at which GTD signal TTLs begin expiring in transit; above this, order submission must be suspended on the degraded route.
Threshold logic
| Condition | Action |
|---|---|
| p99_ms <= 500 | Healthy |
| 500 < p99_ms <= 750 | WARN — LATENCY_P99_ELEVATED |
| p99_ms > 1000 (hard) | HARD_REJECT — flag route degraded; notify exec bots |
Developer check
if p99 > params.fail_p99_ms: flagRoute(route, 'degraded')
User-facing English
— not yet authored —
probe_interval_s
What it means
How often to send a probe request to each configured route to measure latency.
Default
{ "probe_interval_s": 30 }
Why this default matters
30s provides frequent enough sampling to detect latency regressions within one minute while consuming minimal rate-limit budget.
Threshold logic
| Condition | Action |
|---|---|
| interval <= 30s | Normal probe cadence |
| interval > 60s | WARN — latency regressions may go undetected for > 1 minute |
| interval > 120s (hard) | Reject config |
Developer check
assert params.probe_interval_s <= 120  # locked hard max from Section 8
User-facing English
— not yet authored —
routes_to_probe
What it means
List of route identifiers to probe. Each entry corresponds to a configured CLOB V2 endpoint or WebSocket feed.
Default
{ "routes_to_probe": ["clob_auth", "ws_user"] }
Why this default matters
Probing both REST auth and WebSocket feeds captures the two most latency-sensitive paths for order execution.
Threshold logic
| Condition | Action |
|---|---|
| includes both clob_auth and ws_user | Full coverage |
| missing ws_user | WARN — WebSocket lag not monitored |
Developer check
if 'ws_user' not in params.routes_to_probe: emit(WARN)
User-facing English
— not yet authored —
8. Default Configuration
{
"bot_id": "exec.latencyprofiler",
"version": "0.1.0",
"mode": "shadow_only",
"defaults": {
"warn_p95_ms": 150,
"fail_p99_ms": 500,
"probe_interval_s": 30,
"routes_to_probe": [
"clob_auth",
"ws_user"
]
},
"locked": {
"warn_p95_ms": {
"max": 500
},
"fail_p99_ms": {
"max": 1000
},
"probe_interval_s": {
"max": 120
}
}
}
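The locked bounds above lend themselves to a merge-and-validate step before the bot accepts any override. A minimal sketch, assuming a dict-based config; `validate_config` is a hypothetical helper name, not part of the spec:

```python
# Locked maxima copied from the Section 8 "locked" block.
LOCKED = {
    "warn_p95_ms": {"max": 500},
    "fail_p99_ms": {"max": 1000},
    "probe_interval_s": {"max": 120},
}

def validate_config(defaults: dict, overrides: dict) -> dict:
    """Merge overrides onto defaults, rejecting values above locked maxima."""
    merged = {**defaults, **overrides}
    for key, bounds in LOCKED.items():
        if key in merged and merged[key] > bounds["max"]:
            raise ValueError(f"{key}={merged[key]} exceeds locked max {bounds['max']}")
    return merged

cfg = validate_config(
    {"warn_p95_ms": 150, "fail_p99_ms": 500, "probe_interval_s": 30},
    {"probe_interval_s": 60},
)
print(cfg["probe_interval_s"])  # 60
```

An override of probe_interval_s=150 would be rejected here, matching the "interval > 120s (hard) | Reject config" rule in Section 7.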
9. Implementation Flow
- Every probe_interval_s, for each route in routes_to_probe: send a probe request and record send_ms.
- For clob_auth: issue a lightweight GET /time or authenticated OPTIONS; record ack_ms.
- For ws_user: compare heartbeat ts_ms to local now_ms; record feed_lag_ms.
- Maintain a rolling window of the last 100 probe round-trip times per route.
- Compute p50, p95, p99 from the rolling window.
- If p95 > warn_p95_ms: emit ObservationReport(LATENCY_WARN) for the route.
- If p99 > fail_p99_ms: emit ObservationReport(LATENCY_HARD_BREACH); flag route as degraded in internal state store.
- Publish per-route latency histogram metrics every probe cycle.
10. Reference Implementation
Pseudocode is language-agnostic. FETCH = read input. EMIT = produce output. IF/THEN/ELSE = decision. Translate directly to TypeScript, Python, Go, or Rust.
FUNCTION probeRoute(route):
    sendMs = now_ms()
    IF route == 'clob_auth':
        result = clob_auth.GET('/time')   // lightweight probe
        ackMs = now_ms()
        rtt = ackMs - sendMs
        IF result IS NULL OR result.error:
            rtt = 1000                    // count as max latency
    ELIF route == 'ws_user':
        hb = ws_user.lastHeartbeat()
        rtt = now_ms() - hb.ts_ms

    // Update rolling window
    windows[route].append(rtt)
    IF len(windows[route]) > 100:
        windows[route].pop(0)

    // Compute percentiles
    sorted_w = sorted(windows[route])
    p50 = sorted_w[int(0.50 * len(sorted_w))]
    p95 = sorted_w[int(0.95 * len(sorted_w))]
    p99 = sorted_w[int(0.99 * len(sorted_w))]

    // Threshold checks
    IF p99 > params.fail_p99_ms:
        routeState[route] = 'degraded'
        EMIT ObservationReport(route, p50, p95, p99, LATENCY_HARD_BREACH)
    ELIF p95 > params.warn_p95_ms:
        EMIT ObservationReport(route, p50, p95, p99, LATENCY_WARN)

SCHEDULE probeRoute FOR EACH route IN params.routes_to_probe
    EVERY params.probe_interval_s
SDK calls used
- clob_auth.GET('/time')
- ws_user.lastHeartbeat()
Complexity: O(W log W) where W = rolling window size (100)
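Taking up the invitation to translate the pseudocode, here is a sketch in Python. The clob_auth and ws_user SDK clients are assumed interfaces (method names `get` and `last_heartbeat` are placeholders, not a real SDK surface); everything else follows the pseudocode above.

```python
import time
from collections import defaultdict, deque

WINDOW = 100
MAX_RTT_MS = 1000  # probe errors and timeouts counted as max latency

windows = defaultdict(lambda: deque(maxlen=WINDOW))  # deque evicts oldest
route_state = {}

def percentile(sorted_w, q):
    # Index-based percentile as in the pseudocode, clamped to the last element.
    return sorted_w[min(int(q * len(sorted_w)), len(sorted_w) - 1)]

def probe_route(route, params, clob_auth=None, ws_user=None, reports=None):
    now_ms = lambda: int(time.time() * 1000)
    if route == "clob_auth":
        send_ms = now_ms()
        result = clob_auth.get("/time")              # lightweight probe
        rtt = now_ms() - send_ms
        if result is None or result.get("error"):
            rtt = MAX_RTT_MS                         # count as max latency
    elif route == "ws_user":
        rtt = now_ms() - ws_user.last_heartbeat()["ts_ms"]  # feed lag
    else:
        return
    windows[route].append(rtt)
    sorted_w = sorted(windows[route])
    p50, p95, p99 = (percentile(sorted_w, q) for q in (0.50, 0.95, 0.99))
    if p99 > params["fail_p99_ms"]:
        route_state[route] = "degraded"
        reports.append((route, p50, p95, p99, "LATENCY_HARD_BREACH"))
    elif p95 > params["warn_p95_ms"]:
        reports.append((route, p50, p95, p99, "LATENCY_WARN"))
```

Using `deque(maxlen=100)` keeps the eviction step implicit: appending the 101st sample drops the oldest automatically.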
11. Wire Examples
Input — what arrives on the wire
Probe trigger (internal scheduler) — internal
{
"route": "clob_auth",
"trigger_ts_ms": 1746770300000
}
Output — what the bot emits
ObservationReport — LATENCY_WARN
{
"report_id": "rep_5e6f7a8b9c0d1e2f",
"bot_id": "exec.latencyprofiler",
"route": "clob_auth",
"p50_ms": 45,
"p95_ms": 160,
"p99_ms": 280,
"verdict": "LATENCY_WARN",
"measured_at_ms": 1746770300000
}
12. Decision Logic
APPROVE
p95 and p99 within thresholds; route healthy; no ObservationReport emitted.
RESHAPE_REQUIRED
Not applicable — LatencyProfiler is observation-only; it does not reshape orders.
REJECT
p99 exceeds fail_p99_ms; route flagged degraded; LATENCY_HARD_BREACH emitted.
WARNING_ONLY
p95 exceeds warn_p95_ms but p99 within threshold; LATENCY_WARN emitted.
13. Standard Decision Output
This bot returns an ObservationReport object. See the ObservationReport schema.
{
"report_id": "rep_5e6f7a8b9c0d1e2f",
"trace_id": "trc_4d5e6f7a8b9c0d1e",
"bot_id": "exec.latencyprofiler",
"route": "clob_auth",
"p50_ms": 45,
"p95_ms": 160,
"p99_ms": 280,
"verdict": "LATENCY_WARN",
"window_size": 100,
"measured_at_ms": 1746770300000
}
14. Reason Codes
| Code | Severity | Meaning | Action | User-facing message |
|---|---|---|---|---|
| LATENCY_OK | INFO | All probed routes within p95 and p99 thresholds. | No alert; emit metrics only. | — |
| LATENCY_WARN | WARN | p95 latency exceeded warn_p95_ms on a probed route. | Emit ObservationReport with WARN; do not block orders. | Exchange connection is slightly slower than normal. |
| LATENCY_HARD_BREACH | HARD_REJECT | p99 latency exceeded fail_p99_ms; route flagged as degraded. | Flag route degraded; notify exec bots; alert ops. | Exchange connection has degraded. Order submission may be affected. |
| PROBE_TIMEOUT | WARN | Probe request timed out; recorded as max latency (1000ms) in the rolling window. | Record max latency; escalate to HARD_REJECT only after 3 consecutive timeouts. | — |
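The PROBE_TIMEOUT escalation rule can be sketched as follows. This is a minimal illustration, not the bot's implementation; `on_probe_result` and the counter are hypothetical names:

```python
from collections import defaultdict

# One timeout is a WARN on its own; 3 consecutive timeouts escalate to
# LATENCY_HARD_BREACH, and any successful probe resets the streak.
consecutive_timeouts = defaultdict(int)

def on_probe_result(route: str, timed_out: bool) -> str:
    if timed_out:
        consecutive_timeouts[route] += 1
        if consecutive_timeouts[route] >= 3:
            return "LATENCY_HARD_BREACH"
        return "PROBE_TIMEOUT"
    consecutive_timeouts[route] = 0  # success resets the streak
    return "LATENCY_OK"
```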
15. Metrics & Logs
Metrics emitted
| Metric | Type | Unit | Labels | Meaning |
|---|---|---|---|---|
| polytraders_exec_latencyprofiler_rtt_ms | histogram | ms | route | Round-trip latency histogram per probed route. |
| polytraders_exec_latencyprofiler_degraded_routes | gauge | count | — | Number of routes currently flagged as degraded. |
| polytraders_exec_latencyprofiler_probe_errors_total | counter | count | route | Total probe timeouts or errors per route. |
Alerts
| Alert | Condition | Severity | Runbook |
|---|---|---|---|
| LatencyProfilerRoutesDegraded | polytraders_exec_latencyprofiler_degraded_routes > 0 | P1 | #runbook-latencyprofiler-degraded |
| LatencyProfilerHighP99 | histogram_quantile(0.99, rate(polytraders_exec_latencyprofiler_rtt_ms_bucket[5m])) > 500 | P2 | #runbook-latencyprofiler-p99 |
16. Developer Reporting
{
"route": "clob_auth",
"p50_ms": 45,
"p95_ms": 160,
"p99_ms": 280,
"warn_p95_ms": 150,
"fail_p99_ms": 500,
"samples": 100,
"route_degraded": false
}
17. Plain-English Reporting
| Situation | User-facing explanation |
|---|---|
| Latency warning on submission route | The connection to the exchange is slightly slower than normal. Orders may take a moment longer to be processed. |
| Route flagged degraded | The exchange connection speed has degraded significantly. Order submission may be suspended until conditions improve. |
18. Failure-Mode Block
| Field | Value |
|---|---|
| main_failure_mode | Probe requests consume rate-limit budget on a congested connection, making actual order submission slower. |
| false_positive_risk | A single slow probe response inflates p99, triggering LATENCY_HARD_BREACH when the route is actually healthy. |
| false_negative_risk | With a 100-sample window at 30s probe intervals, a sudden latency spike can take up to 50 minutes to fully propagate through the p99 estimate. |
| safe_fallback | If the probe itself times out, record it as max latency (1000ms) in the rolling window; emit LATENCY_HARD_BREACH after 3 consecutive timeouts. |
| required_dependencies | clob_auth endpoint, ws_user heartbeat, internal scheduler for probe triggers |
19. Failure-Injection Recipes
| Scenario | How to inject | Expected behaviour | Recovery |
|---|---|---|---|
| CLOB_AUTH_HIGH_LATENCY | Add 600ms artificial delay to clob_auth GET /time responses | p99 climbs above fail_p99_ms (500); LATENCY_HARD_BREACH emitted; route flagged degraded | Delay removed; next probe cycle shows improved p99; route unflagged after 3 healthy probes |
| WS_USER_HEARTBEAT_STALE | Stop ws_user heartbeat for 10s | feed_lag_ms recorded at the 5000ms cap, exceeding fail_p99_ms; ws_user flagged degraded | Heartbeat resumes; lag drops; route unflagged |
| PROBE_RATE_LIMIT_EXHAUSTION | Reduce probe_interval_s to 1s and increase routes_to_probe to 10 entries | Scheduler enforces the minimum interval; excess probe triggers dropped rather than queued | Config corrected; probes resume at safe interval |
20. State & Persistence
Cold-start recovery
Window cleared on restart; first probe cycle rebuilds estimates from scratch.
21. Concurrency & Idempotency
| Aspect | Specification |
|---|---|
| Execution model | scheduled coroutine per route |
| Max in-flight | 10 |
| Idempotency key | route + probe_trigger_ts_ms |
| Per-call timeout (ms) | 1000 |
| Backpressure strategy | Drop probe if previous probe for same route still in flight |
| Locking / mutual exclusion | per-route mutex for rolling window writes |
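The drop-if-in-flight backpressure rule can be sketched with asyncio. A minimal illustration under the assumption of one coroutine per route; the function names are hypothetical:

```python
import asyncio

# Routes with a probe currently in flight; a new trigger for the same
# route is dropped rather than queued behind the running probe.
in_flight: set = set()

async def maybe_probe(route, probe):
    """Run a probe unless one is already in flight for this route."""
    if route in in_flight:
        return "dropped"
    in_flight.add(route)
    try:
        await probe(route)
        return "probed"
    finally:
        in_flight.discard(route)

async def demo():
    async def slow_probe(route):
        await asyncio.sleep(0.05)  # simulate a slow round trip
    first = asyncio.create_task(maybe_probe("clob_auth", slow_probe))
    await asyncio.sleep(0.01)      # first probe is now in flight
    second = await maybe_probe("clob_auth", slow_probe)
    return await first, second

print(asyncio.run(demo()))  # ('probed', 'dropped')
```

Dropping rather than queueing keeps the idempotency key (route + probe_trigger_ts_ms) meaningful: each scheduled trigger either runs once or is discarded.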
22. Dependencies
Depends on (must run first)
| Bot | Why | Contract |
|---|---|---|
| internal.scheduler | Provides probe triggers every probe_interval_s. | Probe fires within ±5s of scheduled interval. |
Emits to (downstream consumers)
| Bot | Why | Contract |
|---|---|---|
| exec.orderlifecyclemanager | Degraded route flags inform lifecycle manager to escalate stuck-order thresholds. | ObservationReport with route_degraded=true consumed by exec bots. |
External services
| Service | Endpoint | SLA assumed | On failure |
|---|---|---|---|
| CLOB V2 auth API | https://clob.polymarket.com | 99.95% / 200ms p99 | Probe timeout counted as 1000ms in rolling window. |
| WS user feed | wss://ws-subscriptions-clob.polymarket.com/ws/user | best-effort | If heartbeat absent > 5s, feed_lag recorded as 5000ms. |
23. Security Surfaces
Abuse vectors considered
- Flooding probe scheduler to exhaust rate-limit budget with unnecessary latency checks
- Injecting fake degraded-route state to suppress order submission on healthy routes
Mitigations
- Probe rate capped at 1/probe_interval_s per route; scheduler enforces minimum interval
- Route degraded state writable only by LatencyProfiler process; read by other exec bots via internal read-only API
24. Polymarket V2 Compatibility
| Aspect | Value |
|---|---|
| CLOB version | v2 |
| Collateral asset | pUSD |
| EIP-712 Exchange domain version | 2 |
| Aware of builderCode field | no |
| Aware of negative-risk markets | no |
| Multi-chain ready | no |
| SDK used | py-clob-client-v2 |
| Settlement contract | CTFExchangeV2 |
| Notes | LatencyProfiler probes CLOB V2 auth endpoint latency only; it does not sign or submit real orders. All measurements are in milliseconds from the local system clock. |
API surfaces declared
- clob_auth
- ws_user
- internal
Networks supported
polygon
25. Versioning & Migration
| Field | Value |
|---|---|
| spec | 2.0.0 |
| implementation | 0.1.0 |
| schema | 2 |
| released | None |
| planned_release | Q4-2026 |
Migration history
| Date | From | To | Reason | Action taken |
|---|---|---|---|---|
| 2026-04-28 | n/a | v2-spec | Spec drafted post-CLOB-V2 cutover; bot not yet implemented | Designed against V2 schema (pUSD, builder codes, V2 EIP-712 domain) |
26. Acceptance Tests
Unit Tests
| Test | Setup | Expected result |
|---|---|---|
| p95 computation from rolling window | Inject 100 samples with 95th sample = 180ms | p95_ms=180 > warn_p95_ms=150; LATENCY_WARN emitted |
| Route flagged degraded when p99 > fail_p99_ms | p99=600ms, fail_p99_ms=500 | route_degraded=true; LATENCY_HARD_BREACH emitted |
| No alert when both p95 and p99 within thresholds | p95=100ms, p99=200ms | No ObservationReport emitted |
Integration Tests
| Test | Expected result |
|---|---|
| Probe cycle: send probe → receive ack → compute latency → update rolling window | Rolling window updated; metrics emitted; alert fired only if threshold breached |
| ws_user lag detection via heartbeat comparison | feed_lag_ms computed; LATENCY_WARN if lag > warn_p95_ms |
Property Tests
| Property | Required behaviour |
|---|---|
| Rolling window always contains <= 100 samples per route | Always true — oldest sample evicted on overflow |
| p99 >= p95 >= p50 always holds | Always true |
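Both properties can be checked with randomized inputs. A sketch without a specific property-testing framework (plain random sampling stands in for e.g. Hypothesis); the nearest-rank percentile here is an illustrative choice, not the bot's mandated method:

```python
import random
from collections import deque

def nearest_rank(sorted_w, q):
    # Index-based percentile clamped to the last element; the index is
    # nondecreasing in q, so monotonicity follows on a sorted window.
    return sorted_w[min(int(q * len(sorted_w)), len(sorted_w) - 1)]

random.seed(7)
for _ in range(200):
    window = deque(maxlen=100)           # oldest sample evicted on overflow
    for _ in range(random.randint(1, 300)):
        window.append(random.randint(1, 1000))
    assert len(window) <= 100            # window-size property
    s = sorted(window)
    p50, p95, p99 = (nearest_rank(s, q) for q in (0.50, 0.95, 0.99))
    assert p50 <= p95 <= p99             # percentile monotonicity property
print("properties hold")
```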
27. Operational Runbook
LatencyProfiler incidents are always route degradations. Check CLOB status page and ws_user heartbeat freshness first.
On-call actions
| Alert | First step | Diagnosis | Mitigation | Escalate to |
|---|---|---|---|---|
| LatencyProfilerRoutesDegraded | Check Polymarket status page. | Check CLOB auth endpoint health. | If degraded, pause order submission until the route recovers. | Infra on-call if CLOB unreachable > 2 min |
| LatencyProfilerHighP99 | Check p99 histogram by route; identify which route is degraded. | Cross-reference with ExchangeStatusMonitor. | — | Exec pod lead if p99 > 750ms sustained |
Manual overrides
polytraders bot unflag-route exec.latencyprofiler --route clob_auth — clears a route that was incorrectly flagged degraded due to a probe anomaly; confirm the route is actually healthy before running.
Healthcheck
GET /internal/health/latencyprofiler -> 200 when all probed routes are healthy: degraded_routes=0 and p99 < fail_p99_ms on every route. Red: degraded_routes > 0, probe_errors_total spiking, or the scheduler not firing.
29. Developer Checklist
Ready-to-ship score: 27/27 sections complete · 100%
| Requirement | Status |
|---|---|
| Purpose defined | ✓ done |
| Required inputs listed | ✓ done |
| Parameters defined | ✓ done |
| Defaults defined | ✓ done |
| Warning thresholds defined | ✓ done |
| Hard thresholds defined | ✓ done |
| Safe fallback defined | ✓ done |
| Structured output defined | ✓ done |
| Developer log defined | ✓ done |
| Plain-English explanation | ✓ done |
| Unit tests defined | ✓ done |
| Integration tests defined | ✓ done |
| Property tests defined | ✓ done |
| Failure-mode block complete | ✓ done |
| Reference implementation pseudocode | ✓ done |
| Wire examples (input + output) | ✓ done |
| Reason codes listed | ✓ done |
| Metrics & logs defined | ✓ done |
| State & persistence defined | ✓ done |
| Concurrency & idempotency defined | ✓ done |
| Dependencies declared | ✓ done |
| Security surfaces declared | ✓ done |
| Polymarket V2 compatibility declared | ✓ done |
| Version & migration history declared | ✓ done |
| Operational runbook defined | ✓ done |
| Promotion gates defined | ✓ done |
| Failure-injection recipes defined | ✓ done |