1. Bot Identity
| Field | Value |
|---|---|
| Layer | Risk |
| Bot class | Guardrail |
| Authority | Veto |
| Status | PLANNED |
| Readiness | Planned |
| Runs before | ExecutionPlan emit |
| Runs after | Strategy OrderIntent |
| Applies to | Every OrderIntent from a model-driven strategy; detects when live strategy behaviour has drifted from the backtest distribution |
| Default mode | planned |
| User-visible | summary-only |
| Developer owner | Polytraders core, Risk pod |
Operational profile
| Field | Value |
|---|---|
| Modes supported | quarantine |
2. Purpose
ModelDriftMonitor flags strategies whose live behaviour has decoupled from their backtest distribution. It computes a drift score by comparing the rolling distribution of live signals or fill prices against the expected backtest baseline and hard-rejects new orders when drift exceeds the configured ceiling, preventing a degraded model from continuing to place orders.
3. Why This Bot Matters
Model degradation undetected
A strategy whose signal quality has degraded continues placing orders, accumulating losses without any automated circuit-breaker.
Market regime change
A model trained on one market regime may produce pathological signals in a new regime; model drift detection provides an early-warning gate before significant capital is deployed.
No worked examples on this bot yet. Worked examples are optional but strongly recommended — they turn an abstract failure mode into something a developer can verify in a fixture.
6. Parameter Guide
| Parameter | Default | Warning | Hard | What it controls |
|---|---|---|---|---|
| max_drift_score | 0.25 | 0.15 | 0.25 | Maximum allowed drift score (KS statistic or similar) before new orders are blocked. |
| drift_lookback_n | 50 | n/a | n/a | Number of recent observations used to compute the live distribution. |
| drift_metric | ks_statistic | n/a | n/a | Statistical metric used to measure distribution drift. Supported: ks_statistic (Kolmogorov–Smirnov), psi (Population Stability Index). |
7. Detailed Parameter Instructions
max_drift_score
What it means
Maximum allowed drift score (KS statistic or similar) before new orders are blocked.
Default
{ "max_drift_score": 0.25 }
Why this default matters
A KS statistic of 0.25 indicates the live distribution has shifted significantly from the backtest; above this threshold the model is likely operating outside its trained regime.
Threshold logic
| Condition | Action |
|---|---|
| drift_score <= 0.15 | APPROVE |
| 0.15 < drift_score <= 0.25 | WARN — MODEL_DRIFT_WARN |
| drift_score > 0.25 | HARD_REJECT — MODEL_DRIFT_EXCEEDED |
Developer check
if (driftScore > params.max_drift_score) return reject('MODEL_DRIFT_EXCEEDED');
User-facing English
This strategy's live behaviour has diverged from its design parameters. New orders are paused.
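The threshold table above can be expressed as a small classifier. The following is a minimal sketch, assuming the warning threshold is passed in explicitly; the function name `classify_drift` and the `(decision, reason_code)` return shape are illustrative and not part of the spec.

```python
from typing import Optional, Tuple


def classify_drift(
    drift_score: float,
    warn_threshold: float = 0.15,
    max_drift_score: float = 0.25,
) -> Tuple[str, Optional[str]]:
    """Map a drift score to (decision, reason_code) per the threshold table.

    Hypothetical helper: boundaries are inclusive on the lower band, so a
    score exactly at a threshold falls into the less severe outcome.
    """
    if drift_score > max_drift_score:
        return ("HARD_REJECT", "MODEL_DRIFT_EXCEEDED")
    if drift_score > warn_threshold:
        # Still approved, but carries the WARN annotation.
        return ("APPROVE", "MODEL_DRIFT_WARN")
    return ("APPROVE", None)
```

A score of exactly 0.25 is still approved with a warning, because the hard rejection applies only when the score is strictly greater than the ceiling.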
drift_lookback_n
What it means
Number of recent observations used to compute the live distribution.
Default
{ "drift_lookback_n": 50 }
Why this default matters
50 observations provides a stable sample while remaining responsive to recent regime shifts.
Threshold logic
| Condition | Action |
|---|---|
| observations < drift_lookback_n | Skip drift check — insufficient data |
| observations >= drift_lookback_n | Compute drift score |
Developer check
if (liveObs.length < params.drift_lookback_n) return approve('MODEL_DRIFT_SKIPPED');
User-facing English
— not yet authored —
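One way to hold the most recent drift_lookback_n observations is a fixed-size ring buffer. The sketch below assumes observations arrive one at a time; the class `LookbackWindow` and its methods are hypothetical names used only for illustration.

```python
from collections import deque
from typing import Deque, List


class LookbackWindow:
    """Keeps only the most recent n observations (hypothetical helper)."""

    def __init__(self, n: int = 50) -> None:
        self.n = n
        # deque with maxlen silently drops the oldest item when full.
        self._obs: Deque[float] = deque(maxlen=n)

    def add(self, value: float) -> None:
        self._obs.append(value)

    def ready(self) -> bool:
        """True once enough observations exist to compute a drift score."""
        return len(self._obs) >= self.n

    def values(self) -> List[float]:
        return list(self._obs)


# Small window for demonstration; the spec default is 50.
window = LookbackWindow(n=3)
for price in (0.41, 0.43, 0.40, 0.45):
    window.add(price)
```

Until `ready()` returns true the drift check is skipped and the order is approved, which matches the threshold logic for this parameter.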
drift_metric
What it means
Statistical metric used to measure distribution drift. Supported: ks_statistic (Kolmogorov–Smirnov), psi (Population Stability Index).
Default
{ "drift_metric": "ks_statistic" }
Why this default matters
KS statistic is distribution-free and computationally cheap, making it suitable for real-time evaluation.
Threshold logic
| Condition | Action |
|---|---|
| metric=ks_statistic | Compare CDF of live vs backtest |
| metric=psi | Compute PSI bins |
Developer check
const driftScore = computeDrift(liveObs, backtestBaseline, params.drift_metric);
User-facing English
— not yet authored —
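The two supported metrics can be sketched in plain Python. This is an illustration under stated assumptions, not the bot's implementation: the baseline is assumed to be available as a list of raw samples, and the PSI binning (10 equal-width bins over the baseline range, with a small floor to avoid dividing by zero) is a choice made here that the spec does not define. The function names are hypothetical.

```python
import bisect
import math
from typing import List, Sequence


def ks_statistic(live: Sequence[float], baseline: Sequence[float]) -> float:
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(live), sorted(baseline)
    gap = 0.0
    for x in set(a) | set(b):
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def psi(live: Sequence[float], baseline: Sequence[float],
        bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index over equal-width bins (assumed binning)."""
    lo, hi = min(baseline), max(baseline)
    width = (hi - lo) / bins or 1.0  # guard against a constant baseline

    def shares(sample: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for x in sample:
            idx = int((x - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1  # clamp out-of-range values
        return [max(c / len(sample), eps) for c in counts]

    p_live, p_base = shares(live), shares(baseline)
    return sum((pl - pb) * math.log(pl / pb) for pl, pb in zip(p_live, p_base))


def compute_drift(live: Sequence[float], baseline: Sequence[float],
                  metric: str = "ks_statistic") -> float:
    """Dispatch on drift_metric; unknown metrics raise instead of passing."""
    if metric == "ks_statistic":
        return ks_statistic(live, baseline)
    if metric == "psi":
        return psi(live, baseline)
    raise ValueError(f"unsupported drift_metric: {metric}")
```

The KS statistic is bounded in [0, 1] while PSI is unbounded above, so the same max_drift_score value does not necessarily mean the same thing under both metrics.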
8. Default Configuration
{
  "bot_id": "risk.model_drift_monitor",
  "version": "0.1.0",
  "mode": "hard_guard",
  "defaults": {
    "max_drift_score": 0.25,
    "drift_lookback_n": 50,
    "drift_metric": "ks_statistic"
  },
  "locked": {
    "max_drift_score": {
      "max": 0.5
    }
  }
}
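A sketch of how overrides might be merged with this configuration follows. It assumes that `locked.max_drift_score.max` is an upper bound that no override may exceed; the spec does not state the semantics of `locked`, so treat that reading, and the helper name `resolve_params`, as assumptions.

```python
from typing import Any, Dict

DEFAULT_CONFIG: Dict[str, Any] = {
    "bot_id": "risk.model_drift_monitor",
    "version": "0.1.0",
    "mode": "hard_guard",
    "defaults": {
        "max_drift_score": 0.25,
        "drift_lookback_n": 50,
        "drift_metric": "ks_statistic",
    },
    "locked": {"max_drift_score": {"max": 0.5}},
}


def resolve_params(config: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Apply overrides on top of defaults, enforcing locked upper bounds.

    Assumption: a "max" entry under "locked" caps the overridden value.
    """
    params = dict(config["defaults"])
    for key, value in overrides.items():
        if key not in params:
            raise KeyError(f"unknown parameter: {key}")
        bound = config.get("locked", {}).get(key, {})
        if "max" in bound and value > bound["max"]:
            raise ValueError(f"{key}={value} exceeds locked max {bound['max']}")
        params[key] = value
    return params
```

Under this reading, an operator could loosen the ceiling to 0.4 but not to 0.6.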
9. Implementation Flow
- Receive OrderIntent from a model-driven strategy with strategy_id.
- Check KillSwitch; if active, HARD_REJECT(KILL_SWITCH_ACTIVE).
- Load backtest baseline distribution for strategy_id from internal store.
- If baseline unavailable, HARD_REJECT(MODEL_DRIFT_DATA_UNAVAILABLE).
- Fetch recent fill prices and signal values for strategy_id.
- If fewer than drift_lookback_n observations, APPROVE (skip — insufficient data).
- Compute drift_score using configured drift_metric (KS or PSI).
- If drift_score > max_drift_score, HARD_REJECT(MODEL_DRIFT_EXCEEDED).
- If drift_score > warning threshold, attach WARN annotation; APPROVE.
- APPROVE with drift_score attached to the RiskVote.
10. Reference Implementation
Pseudocode is language-agnostic. FETCH = read input. EMIT = produce output. IF/THEN/ELSE = decision. Translate directly to TypeScript, Python, Go, or Rust.
FUNCTION evaluateModelDrift(intent):
  ks = FETCH internal.killswitch.status
  IF ks.active: EMIT RiskVote(HARD_REJECT, KILL_SWITCH_ACTIVE); RETURN
  baseline = FETCH internal.backtest_baseline(intent.strategy_id)
  IF baseline IS NULL:
    EMIT RiskVote(HARD_REJECT, MODEL_DRIFT_DATA_UNAVAILABLE); RETURN
  liveObs = FETCH clob_auth.fill_prices(intent.strategy_id, n=params.drift_lookback_n)
  IF len(liveObs) < params.drift_lookback_n:
    EMIT RiskVote(APPROVE); RETURN  // insufficient data: skip the drift check
  // use the configured drift_metric (ks_statistic or psi)
  driftScore = compute_drift(liveObs, baseline.distribution, params.drift_metric)
  IF driftScore > params.max_drift_score:
    EMIT RiskVote(HARD_REJECT, MODEL_DRIFT_EXCEEDED,
                  drift_score=driftScore); RETURN
  annotations = []
  IF driftScore > params.max_drift_score * 0.6:  // 0.15 with the default ceiling of 0.25
    annotations.append(WARN(MODEL_DRIFT_WARN, drift_score=driftScore))
  EMIT RiskVote(APPROVE, drift_score=driftScore, annotations=annotations)
SDK calls used
- clob_auth.fill_prices(strategy_id, n)
- internal.backtest_baseline(strategy_id)
- internal.killswitch.status()
Complexity: O(N log N) for the KS statistic, where N = drift_lookback_n (default 50)
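The pseudocode in this section might translate to Python roughly as follows. This is a sketch under stated assumptions: the `RiskVote` dataclass fields are illustrative, and the kill switch, baseline store, fill history and drift function are injected as plain callables instead of real SDK calls, so the example stays runnable without any external service.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Sequence


@dataclass
class RiskVote:
    """Illustrative vote shape; the real RiskVote schema may differ."""
    decision: str
    reason_code: Optional[str] = None
    drift_score: Optional[float] = None
    annotations: List[str] = field(default_factory=list)


def evaluate_model_drift(
    strategy_id: str,
    params: Dict[str, object],
    kill_switch_active: Callable[[], bool],
    load_baseline: Callable[[str], Optional[Sequence[float]]],
    fetch_fills: Callable[[str, int], Sequence[float]],
    compute_drift: Callable[[Sequence[float], Sequence[float], str], float],
) -> RiskVote:
    if kill_switch_active():
        return RiskVote("HARD_REJECT", "KILL_SWITCH_ACTIVE")

    baseline = load_baseline(strategy_id)
    if baseline is None:
        # Fail closed: never approve without a baseline.
        return RiskVote("HARD_REJECT", "MODEL_DRIFT_DATA_UNAVAILABLE")

    lookback = int(params["drift_lookback_n"])
    live = fetch_fills(strategy_id, lookback)
    if len(live) < lookback:
        return RiskVote("APPROVE")  # insufficient data, check skipped

    ceiling = float(params["max_drift_score"])
    score = compute_drift(live, baseline, str(params["drift_metric"]))
    if score > ceiling:
        return RiskVote("HARD_REJECT", "MODEL_DRIFT_EXCEEDED", score)

    vote = RiskVote("APPROVE", drift_score=score)
    if score > ceiling * 0.6:  # warning band, as in the pseudocode
        vote.annotations.append("MODEL_DRIFT_WARN")
    return vote


PARAMS = {"max_drift_score": 0.25, "drift_lookback_n": 3, "drift_metric": "ks_statistic"}
vote = evaluate_model_drift(
    "strat_002", PARAMS,
    kill_switch_active=lambda: False,
    load_baseline=lambda sid: [0.4, 0.5, 0.6],
    fetch_fills=lambda sid, n: [0.7, 0.8, 0.9],
    compute_drift=lambda live, base, metric: 0.32,  # stubbed score
)
```

Injecting the dependencies keeps each branch easy to exercise in a fixture, for example by passing a `load_baseline` stub that returns `None`.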
11. Wire Examples
Input — what arrives on the wire
OrderIntent — drift exceeded — internal
{
  "intent_id": "int_d4e5f6a7b8c90004",
  "strategy_id": "strat_002",
  "size_usd": 200,
  "generated_at_ms": 1746800000000
}
Output — what the bot emits
RiskVote — HARD_REJECT
{
  "guard_id": "risk.model_drift_monitor",
  "decision": "HARD_REJECT",
  "severity": "HARD",
  "reason_code": "MODEL_DRIFT_EXCEEDED",
  "message": "Drift score 0.32 exceeds ceiling 0.25 for strategy strat_002.",
  "constraints": {},
  "checked_at": "2026-05-10T11:00:00Z"
}
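For fixtures, the HARD_REJECT payload above could be produced by a small builder like the one below. This is only a sketch: the function name `build_reject_vote` is hypothetical, and the two-decimal formatting of the score is inferred from the example message, not stated by the spec.

```python
import json
from datetime import datetime, timezone
from typing import Any, Dict


def build_reject_vote(strategy_id: str, drift_score: float,
                      ceiling: float, checked_at: datetime) -> Dict[str, Any]:
    """Build a RiskVote dict shaped like the wire example (hypothetical helper)."""
    return {
        "guard_id": "risk.model_drift_monitor",
        "decision": "HARD_REJECT",
        "severity": "HARD",
        "reason_code": "MODEL_DRIFT_EXCEEDED",
        "message": (f"Drift score {drift_score:.2f} exceeds ceiling "
                    f"{ceiling:.2f} for strategy {strategy_id}."),
        "constraints": {},
        # Normalise to UTC and use the trailing "Z" form seen in the example.
        "checked_at": checked_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    }


wire_vote = build_reject_vote(
    "strat_002", 0.32, 0.25,
    datetime(2026, 5, 10, 11, 0, tzinfo=timezone.utc),
)
wire_json = json.dumps(wire_vote, indent=2)
```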
12. Decision Logic
APPROVE
Drift score is at or below the hard ceiling (with a MODEL_DRIFT_WARN annotation attached when it exceeds the warning threshold), or fewer than drift_lookback_n observations are available.
RESHAPE_REQUIRED
Not used; model drift is a strategy-level condition that cannot be addressed by resizing a single order.
REJECT
Drift score exceeds the hard ceiling, the backtest baseline is unavailable (fail-closed), or the kill switch is active.
WARNING_ONLY
— not yet authored —
13. Standard Decision Output
This bot returns a RiskVote object. See RiskVote schema.
{
  "guard_id": "risk.model_drift_monitor",
  "decision": "HARD_REJECT",
  "severity": "HARD",
  "reason_code": "MODEL_DRIFT_EXCEEDED",
  "message": "Strategy strat_002 drift score 0.32 exceeds ceiling 0.25. New orders blocked.",
  "constraints": {},
  "inputs_used": [
    "internal.backtest_baseline",
    "clob_auth.fills"
  ],
  "checked_at": "2026-05-10T11:00:00Z"
}
14. Reason Codes
| Code | Severity | Meaning | Action | User-facing message |
|---|---|---|---|---|
| KILL_SWITCH_ACTIVE | HARD_REJECT | Global kill switch active. | Immediate HARD_REJECT. | Trading is paused. Please try again later. |
| MODEL_DRIFT_EXCEEDED | HARD_REJECT | Strategy drift score exceeds the hard ceiling. | HARD_REJECT; log drift_score, strategy_id, and metric. | This strategy's behaviour has diverged from its design parameters. |
| MODEL_DRIFT_WARN | WARN | Drift score is between warning and hard thresholds. | Attach WARN annotation; APPROVE. | |
| MODEL_DRIFT_DATA_UNAVAILABLE | HARD_REJECT | Backtest baseline unavailable for this strategy. | HARD_REJECT (fail-closed). | Strategy baseline data is unavailable. Please try again. |
15. Metrics & Logs
Metrics emitted
| Metric | Type | Unit | Labels | Meaning |
|---|---|---|---|---|
| polytraders_risk_modeldriftmonitor_decisions_total | counter | count | decision, reason_code, strategy_id | Total decisions by type, reason, and strategy. |
| polytraders_risk_modeldriftmonitor_drift_score | gauge | ratio | strategy_id, metric | Current drift score per strategy at last evaluation. |
| polytraders_risk_modeldriftmonitor_eval_latency_ms | histogram | milliseconds | | Latency from intent to RiskVote emit. |
Alerts
| Alert | Condition | Severity | Runbook |
|---|---|---|---|
| ModelDriftMonitorDriftDetected | rate(polytraders_risk_modeldriftmonitor_decisions_total{reason_code='MODEL_DRIFT_EXCEEDED'}[5m]) > 0 | P2 | #runbook-modeldrift-detected |
| ModelDriftMonitorDataUnavailable | rate(polytraders_risk_modeldriftmonitor_decisions_total{reason_code='MODEL_DRIFT_DATA_UNAVAILABLE'}[5m]) > 0 | P1 | #runbook-modeldrift-data |
16. Developer Reporting
{
  "bot_id": "risk.model_drift_monitor",
  "decision": "HARD_REJECT",
  "reason_code": "MODEL_DRIFT_EXCEEDED",
  "inputs_used": [
    "internal.backtest_baseline",
    "clob_auth.fills"
  ],
  "metrics": {
    "strategy_id": "strat_002",
    "drift_score": 0.32,
    "drift_metric": "ks_statistic",
    "lookback_n": 50,
    "ceiling": 0.25
  },
  "checked_at": "2026-05-10T11:00:00Z"
}
17. Plain-English Reporting
| Situation | User-facing explanation |
|---|---|
| Order blocked — model drift | This strategy's behaviour has diverged from its design baseline. New orders are paused while the strategy is reviewed. |
| Warning — drift approaching limit | This strategy is showing signs of behavioural drift. Consider reviewing recent fills before increasing position size. |
18. Failure-Mode Block
| Field | Value |
|---|---|
| main_failure_mode | Failing to detect drift because the backtest baseline is outdated and the live distribution matches a new, equally invalid pattern. |
| false_positive_risk | Blocking a legitimately adapting strategy because the baseline has not been updated to reflect an intentional strategy improvement. |
| false_negative_risk | Approving orders during early-stage drift if the lookback window is too long to detect a fast regime change. |
| safe_fallback | If backtest baseline data is unavailable, HARD_REJECT with MODEL_DRIFT_DATA_UNAVAILABLE. Never approve when the baseline cannot be loaded. |
| required_dependencies | Strategy backtest baseline store, CLOB fill history, KillSwitch active flag |
19. Failure-Injection Recipes
| Scenario | How to inject | Expected behaviour | Recovery |
|---|---|---|---|
| BASELINE_UNAVAILABLE | Delete backtest baseline from Redis for strategy_id | HARD_REJECT(MODEL_DRIFT_DATA_UNAVAILABLE) for every OrderIntent from that strategy_id. | Returns to normal within one baseline cache refresh after baseline is restored. |
| DRIFT_SPIKE | Inject fill prices with distribution KS distance 0.40 from baseline | HARD_REJECT(MODEL_DRIFT_EXCEEDED), since 0.40 is above the default ceiling of 0.25. | Returns to APPROVE once live fills converge back toward the baseline distribution. |
| INSUFFICIENT_OBSERVATIONS | Clear fill history for strategy_id, set lookback_n=50 | APPROVE with the drift check skipped. | Check activates after drift_lookback_n fills have been accumulated. |
20. State & Persistence
Cold-start recovery
Baseline reloaded from Redis on cold start. If unavailable, HARD_REJECT until restored.
21. Concurrency & Idempotency
| Aspect | Specification |
|---|---|
| Execution model | single-threaded event loop |
| Max in-flight | 100 |
| Idempotency key | intent_id |
| Per-call timeout (ms) | 150 |
| Backpressure strategy | drop newest |
| Locking / mutual exclusion | per-strategy_id mutex during drift score computation |
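The idempotency key, per-strategy mutex and per-call timeout in the table above could be combined as in the following sketch. It assumes an asyncio event loop as the single-threaded execution model and an unbounded in-memory vote cache; both are simplifications, and the class name `IdempotentEvaluator` is hypothetical.

```python
import asyncio
from collections import defaultdict
from typing import Any, Awaitable, Callable, DefaultDict, Dict

Vote = Dict[str, Any]


class IdempotentEvaluator:
    """Returns the cached vote for a repeated intent_id (illustrative only)."""

    def __init__(self, evaluate: Callable[[Dict[str, Any]], Awaitable[Vote]],
                 timeout_s: float = 0.150) -> None:
        self._evaluate = evaluate
        self._timeout_s = timeout_s
        self._votes: Dict[str, Vote] = {}
        # One lock per strategy_id, created lazily on first use.
        self._locks: DefaultDict[str, asyncio.Lock] = defaultdict(asyncio.Lock)

    async def handle(self, intent: Dict[str, Any]) -> Vote:
        key = intent["intent_id"]
        if key in self._votes:
            return self._votes[key]
        async with self._locks[intent["strategy_id"]]:
            # Re-check: another task may have finished while we waited.
            if key in self._votes:
                return self._votes[key]
            vote = await asyncio.wait_for(self._evaluate(intent), self._timeout_s)
            self._votes[key] = vote
            return vote


calls = []


async def fake_evaluate(intent: Dict[str, Any]) -> Vote:
    calls.append(intent["intent_id"])
    return {"decision": "APPROVE"}


async def demo() -> list:
    evaluator = IdempotentEvaluator(fake_evaluate)
    intent = {"intent_id": "int_d4e5f6a7b8c90004", "strategy_id": "strat_002"}
    # Two concurrent deliveries of the same intent.
    return await asyncio.gather(evaluator.handle(intent), evaluator.handle(intent))


votes = asyncio.run(demo())
```

A timeout raises `asyncio.TimeoutError` out of `handle`; how the caller maps that to a RiskVote is left open here because the spec does not define it.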
22. Dependencies
Depends on (must run first)
| Bot | Why | Contract |
|---|---|---|
| risk.kill_switch | Global brake checked first. | HARD_REJECT(KILL_SWITCH_ACTIVE) short-circuits all evaluation. |
Emits to (downstream consumers)
External services
| Service | Endpoint | SLA assumed | On failure |
|---|---|---|---|
| Data API (fill history) | https://data-api.polymarket.com | 99.9% / 500ms p99 | HARD_REJECT(MODEL_DRIFT_DATA_UNAVAILABLE) if fill history unavailable. |
23. Security Surfaces
Abuse vectors considered
- Injecting synthetic fill data to deflate the drift score
- Bypassing drift check by submitting from a strategy_id with no backtest baseline
Mitigations
- Fill data sourced exclusively from CLOB authenticated endpoint with provenance timestamp
- Missing baseline triggers HARD_REJECT (fail-closed); unknown strategy_id is treated as missing baseline
24. Polymarket V2 Compatibility
| Aspect | Value |
|---|---|
| CLOB version | v2 |
| Collateral asset | pUSD |
| EIP-712 Exchange domain version | 2 |
| Aware of builderCode field | no |
| Aware of negative-risk markets | no |
| Multi-chain ready | no |
| SDK used | py-clob-client-v2 |
| Settlement contract | CTFExchangeV2 |
| Notes | Fill prices and signal values are denominated in pUSD. Uses CLOB V2 authenticated fill history endpoint. |
25. Versioning & Migration
| Field | Value |
|---|---|
| spec | 2.0.0 |
| implementation | 0.1.0 |
| schema | 2 |
| released | n/a |
| planned_release | Q4-2026 |
Migration history
| Date | From | To | Reason | Action taken |
|---|---|---|---|---|
| 2026-04-28 | n/a | v2-spec | Spec drafted post-CLOB-V2 cutover; bot not yet implemented | Designed against V2 schema (pUSD, builder codes, V2 EIP-712 domain) |
26. Acceptance Tests
Unit Tests
| Test | Setup | Expected result |
|---|---|---|
| Approve when drift score within warning threshold | drift_score=0.10, ceiling=0.25 | APPROVE |
| Warn when drift between warning and hard | drift_score=0.20, warning=0.15, hard=0.25 | APPROVE with WARN annotation |
| Reject when drift exceeds ceiling | drift_score=0.32, ceiling=0.25 | HARD_REJECT(MODEL_DRIFT_EXCEEDED) |
| Skip when insufficient observations | observations=30, lookback_n=50 | APPROVE (check skipped) |
Integration Tests
| Test | Expected result |
|---|---|
| Drift detected after regime change in live fills | HARD_REJECT(MODEL_DRIFT_EXCEEDED) within one evaluation cycle of drift_score exceeding ceiling |
| KillSwitch bypasses drift check | HARD_REJECT(KILL_SWITCH_ACTIVE) without computing drift score |
Property Tests
| Property | Required behaviour |
|---|---|
| Drift score above hard ceiling never results in APPROVE | Always true |
| Missing baseline always results in HARD_REJECT | Always true — fail-closed on missing baseline |
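The two properties above can be checked with randomised inputs. The sketch below uses only the standard library and a deliberately reduced `decide` function that stands in for the bot's decision logic; both names are hypothetical, and a real suite would call the actual implementation instead.

```python
import random
from typing import Optional, Sequence


def decide(baseline: Optional[Sequence[float]], drift_score: float,
           max_drift_score: float = 0.25) -> str:
    """Reduced stand-in for the bot's decision logic (illustrative only)."""
    if baseline is None:
        return "HARD_REJECT"  # fail closed on missing baseline
    return "HARD_REJECT" if drift_score > max_drift_score else "APPROVE"


def check_properties(trials: int = 1000, seed: int = 7) -> int:
    """Run both properties over random ceilings and scores."""
    rng = random.Random(seed)  # fixed seed keeps failures reproducible
    for _ in range(trials):
        ceiling = rng.uniform(0.01, 0.5)
        above = ceiling + rng.uniform(1e-9, 0.5)
        # Property 1: a score above the ceiling never results in APPROVE.
        assert decide([0.5], above, ceiling) != "APPROVE"
        # Property 2: a missing baseline always results in HARD_REJECT.
        assert decide(None, rng.random(), ceiling) == "HARD_REJECT"
    return trials
```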
27. Operational Runbook
ModelDriftMonitor incidents typically indicate a genuine strategy degradation or a regime change. Verify whether the drift is expected before unlocking the strategy.
On-call actions
| Alert | First step | Diagnosis | Mitigation | Escalate to |
|---|---|---|---|---|
| ModelDriftMonitorDriftDetected | Inspect drift_score gauge per strategy_id; compare live fills to the backtest baseline to determine if drift is genuine or a data artefact. | | | Risk pod lead; strategy may need baseline recalibration before re-enabling. |
| ModelDriftMonitorDataUnavailable | Check Redis baseline store and CLOB fill history endpoint. | | | Risk pod lead if sustained > 2 minutes. |
Manual overrides
polytraders risk update-baseline --strategy-id <id> --source recent_fills
Use after a confirmed intentional strategy update that changes the expected fill distribution.
Healthcheck
GET /internal/health/modeldriftmonitor
- green: baselines loaded for all active strategies, fill history accessible, p99 eval latency < 150ms
- red: any strategy baseline missing, fill history unavailable, or HARD_REJECT rate > 0.1
29. Developer Checklist
Ready-to-ship score: 27/27 sections complete · 100%
| Requirement | Status |
|---|---|
| Purpose defined | ✓ done |
| Required inputs listed | ✓ done |
| Parameters defined | ✓ done |
| Defaults defined | ✓ done |
| Warning thresholds defined | ✓ done |
| Hard thresholds defined | ✓ done |
| Safe fallback defined | ✓ done |
| Structured output defined | ✓ done |
| Developer log defined | ✓ done |
| Plain-English explanation | ✓ done |
| Unit tests defined | ✓ done |
| Integration tests defined | ✓ done |
| Property tests defined | ✓ done |
| Failure-mode block complete | ✓ done |
| Reference implementation pseudocode | ✓ done |
| Wire examples (input + output) | ✓ done |
| Reason codes listed | ✓ done |
| Metrics & logs defined | ✓ done |
| State & persistence defined | ✓ done |
| Concurrency & idempotency defined | ✓ done |
| Dependencies declared | ✓ done |
| Security surfaces declared | ✓ done |
| Polymarket V2 compatibility declared | ✓ done |
| Version & migration history declared | ✓ done |
| Operational runbook defined | ✓ done |
| Promotion gates defined | ✓ done |
| Failure-injection recipes defined | ✓ done |