
1.12 ModelDriftMonitor

Risk Guardrail Veto PLANNED Planned capital · Direct P4 · Core risk ○ pending flagship stub

ModelDriftMonitor flags strategies whose live behaviour has decoupled from their backtest distribution. It computes a drift score by comparing the rolling distribution of live signals or fill prices against the expected backtest baseline and hard-rejects new orders when drift exceeds the configured ceiling, preventing a degraded model from continuing to place orders.

v3 readiness

Score	Progress	Status
Docs	27/27	done
Impl	0/15	pending
Backtest	0/4	pending
Runtime	0/8	pending

A bot is done when all four scores are.


1. Bot Identity

Layer	Risk
Bot class	Guardrail
Authority	Veto
Status	PLANNED
Readiness	Planned
Runs before	ExecutionPlan emit
Runs after	Strategy OrderIntent
Applies to	Every OrderIntent from a model-driven strategy — detects when live strategy behaviour has drifted from the backtest distribution
Default mode	`planned`
User-visible	summary-only
Developer owner	Polytraders core — Risk pod

Operational profile

Modes supported	quarantine

2. Purpose

3. Why This Bot Matters

Model degradation undetected
A strategy whose signal quality has degraded continues placing orders, accumulating losses without any automated circuit-breaker.
Market regime change
A model trained on one market regime may produce pathological signals in a new regime; model drift detection provides an early-warning gate before significant capital is deployed.

No worked examples on this bot yet. Worked examples are optional but strongly recommended — they turn an abstract failure mode into something a developer can verify in a fixture.

4. Required Polymarket Inputs

Input	Source	Required?	Use
Recent fill prices and strategy signal values	`clob_auth`	Yes	Compute rolling live distribution for comparison against backtest baseline.
Market metadata (category, volume, resolution type)	`gamma`	No	Provide context for regime classification to distinguish genuine drift from expected regime variation.

5. Required Internal Inputs

Input	Source	Required?	Use
Strategy backtest distribution summary (mean, std, percentiles)	`internal`	Yes	Define the expected baseline distribution against which live behaviour is compared.
KillSwitch active flag	`KillSwitch`	Yes	If active, reject immediately.

6. Parameter Guide

Parameter	Default	Warning	Hard	What it controls
max_drift_score	`0.25`	`0.15`	`0.25`	Maximum allowed drift score (KS statistic or similar) before new orders are blocked.
drift_lookback_n	`50`	`None`	`None`	Number of recent observations used to compute the live distribution.
drift_metric	`ks_statistic`	`None`	`None`	Statistical metric used to measure distribution drift. Supported: ks_statistic (Kolmogorov–Smirnov), psi (Population Stability Index).

7. Detailed Parameter Instructions

max_drift_score

What it means

Maximum allowed drift score (KS statistic or similar) before new orders are blocked.

Default

{ "max_drift_score": 0.25 }

Why this default matters

A KS statistic of 0.25 means the live and backtest cumulative distributions differ by 25 percentage points at their widest gap. Above this threshold the model is likely operating outside its trained regime.
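To put the 0.25 ceiling in context, the sketch below computes the standard large-sample critical values of the one-sample Kolmogorov–Smirnov statistic at the default lookback of 50. It assumes the backtest baseline is treated as a fixed reference distribution and that observations are independent. Neither assumption is stated in this spec, and the function name `ks_critical_value` is illustrative only.

```python
import math


def ks_critical_value(n: int, alpha: float) -> float:
    """Asymptotic one-sample KS critical value: sqrt(-ln(alpha / 2) / 2) / sqrt(n)."""
    if n <= 0 or not 0.0 < alpha < 1.0:
        raise ValueError("n must be positive and alpha must lie in (0, 1)")
    return math.sqrt(-0.5 * math.log(alpha / 2.0)) / math.sqrt(n)


# Critical values at the default drift_lookback_n of 50.
crit_05 = ks_critical_value(50, 0.05)  # roughly 0.19
crit_01 = ks_critical_value(50, 0.01)  # roughly 0.23
```

Under these assumptions the 0.25 ceiling sits above the 1% critical value, while the 0.15 warning level sits below the 5% value, so a warning may occasionally fire on sampling noise alone.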

Threshold logic

Condition	Action
drift_score <= 0.15	APPROVE
0.15 < drift_score <= 0.25	WARN — MODEL_DRIFT_WARN
drift_score > 0.25	HARD_REJECT — MODEL_DRIFT_EXCEEDED

Developer check

if (driftScore > params.max_drift_score) return reject('MODEL_DRIFT_EXCEEDED');

User-facing English

This strategy's live behaviour has diverged from its design parameters. New orders are paused.
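The three bands in the threshold table can be sketched as a single pure function. The function name, the tuple return shape, and passing the warning level as an argument are assumptions made for illustration; only the thresholds and reason codes come from this spec.

```python
from typing import Optional, Tuple


def classify_drift(
    drift_score: float, warn: float = 0.15, hard: float = 0.25
) -> Tuple[str, Optional[str]]:
    """Map a drift score to (decision, reason_code) using the documented bands."""
    if drift_score > hard:
        return ("HARD_REJECT", "MODEL_DRIFT_EXCEEDED")
    if drift_score > warn:
        # Warning band: the order is still approved, with an annotation.
        return ("APPROVE", "MODEL_DRIFT_WARN")
    return ("APPROVE", None)
```

Both boundaries are inclusive on the lower band: a score of exactly 0.15 approves without a warning, and a score of exactly 0.25 warns rather than rejects.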

drift_lookback_n

What it means

Number of recent observations used to compute the live distribution.

Default

{ "drift_lookback_n": 50 }

Why this default matters

50 observations provide a stable sample while remaining responsive to recent regime shifts.

Threshold logic

Condition	Action
observations < drift_lookback_n	Skip drift check — insufficient data
observations >= drift_lookback_n	Compute drift score

Developer check

if (liveObs.length < params.drift_lookback_n) return approve('MODEL_DRIFT_SKIPPED');

User-facing English

— not yet authored —
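One way to hold the live observations is a fixed-size rolling window that reports when it has enough data. This is a minimal sketch; the `LiveWindow` class and its method names are hypothetical and not part of any Polytraders SDK.

```python
from collections import deque
from typing import Deque, List


class LiveWindow:
    """Keeps the most recent lookback_n observations for one strategy."""

    def __init__(self, lookback_n: int = 50) -> None:
        if lookback_n <= 0:
            raise ValueError("lookback_n must be positive")
        self.lookback_n = lookback_n
        # deque(maxlen=...) silently evicts the oldest value when full.
        self._obs: Deque[float] = deque(maxlen=lookback_n)

    def add(self, value: float) -> None:
        self._obs.append(value)

    def ready(self) -> bool:
        """True once the drift check has enough data to run."""
        return len(self._obs) >= self.lookback_n

    def values(self) -> List[float]:
        return list(self._obs)
```

Until `ready()` returns true, the documented behaviour is to skip the check and approve, so a new strategy trades unchecked for its first `drift_lookback_n` observations.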

drift_metric

What it means

Statistical metric used to measure distribution drift. Supported: ks_statistic (Kolmogorov–Smirnov), psi (Population Stability Index).

Default

{ "drift_metric": "ks_statistic" }

Why this default matters

The KS statistic is distribution-free and cheap to compute, making it suitable for real-time evaluation.

Threshold logic

Condition	Action
metric=ks_statistic	Compare CDF of live vs backtest
metric=psi	Compute PSI bins

Developer check

const driftScore = computeDrift(liveObs, backtestBaseline, params.drift_metric);

User-facing English

— not yet authored —
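The two supported metrics can be sketched in pure Python as follows. This assumes the baseline is available as a raw sample of values, whereas Section 5 describes a summary of mean, std, and percentiles. The equal-width binning, the clamping of out-of-range values into the edge bins, and the `eps` floor in `psi` are illustrative choices, not documented behaviour.

```python
import bisect
import math
from typing import List, Sequence


def ks_statistic(live: Sequence[float], baseline: Sequence[float]) -> float:
    """Two-sample KS: largest absolute gap between the two empirical CDFs."""
    if not live or not baseline:
        raise ValueError("both samples must be non-empty")
    a, b = sorted(live), sorted(baseline)
    gap = 0.0
    for x in set(a) | set(b):
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def psi(live: Sequence[float], baseline: Sequence[float],
        bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index over equal-width bins spanning the baseline."""
    if not live or not baseline:
        raise ValueError("both samples must be non-empty")
    lo, hi = min(baseline), max(baseline)
    if hi == lo:
        raise ValueError("baseline has zero range")
    width = (hi - lo) / bins

    def shares(sample: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for x in sample:
            idx = int((x - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1  # clamp into edge bins
        # Floor each share at eps so the logarithm below stays finite.
        return [max(c / len(sample), eps) for c in counts]

    p_live, p_base = shares(live), shares(baseline)
    return sum((pl - pb) * math.log(pl / pb) for pl, pb in zip(p_live, p_base))


def compute_drift(live: Sequence[float], baseline: Sequence[float],
                  metric: str) -> float:
    """Dispatch on the configured drift_metric."""
    if metric == "ks_statistic":
        return ks_statistic(live, baseline)
    if metric == "psi":
        return psi(live, baseline)
    raise ValueError(f"unsupported drift_metric: {metric}")
```

The KS statistic lies in [0, 1] while PSI is unbounded, so a single `max_drift_score` may need separate tuning for each metric.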

8. Default Configuration

{
  "bot_id": "risk.model_drift_monitor",
  "version": "0.1.0",
  "mode": "hard_guard",
  "defaults": {
    "max_drift_score": 0.25,
    "drift_lookback_n": 50,
    "drift_metric": "ks_statistic"
  },
  "locked": {
    "max_drift_score": {
      "max": 0.5
    }
  }
}
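A loader for this configuration might merge per-strategy overrides onto the defaults and enforce the locked bound. The sketch below reads `locked.max_drift_score.max` as an upper limit that overrides may not exceed, and rejects rather than clamps a violating value. Both readings are assumptions, as is the `resolve_params` name.

```python
from typing import Any, Dict

DEFAULTS: Dict[str, Any] = {
    "max_drift_score": 0.25,
    "drift_lookback_n": 50,
    "drift_metric": "ks_statistic",
}
LOCKED: Dict[str, Dict[str, float]] = {"max_drift_score": {"max": 0.5}}
SUPPORTED_METRICS = {"ks_statistic", "psi"}


def resolve_params(overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Merge overrides onto the defaults, then validate the result."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    params = {**DEFAULTS, **overrides}
    for name, bounds in LOCKED.items():
        # Assumed semantics: a locked "max" is a hard upper bound.
        if "max" in bounds and params[name] > bounds["max"]:
            raise ValueError(f"{name} exceeds locked max {bounds['max']}")
    if params["drift_metric"] not in SUPPORTED_METRICS:
        raise ValueError(f"unsupported drift_metric: {params['drift_metric']}")
    if params["drift_lookback_n"] <= 0:
        raise ValueError("drift_lookback_n must be positive")
    return params
```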

9. Implementation Flow

Receive OrderIntent from a model-driven strategy with strategy_id.
Check KillSwitch; if active, HARD_REJECT(KILL_SWITCH_ACTIVE).
Load backtest baseline distribution for strategy_id from internal store.
If baseline unavailable, HARD_REJECT(MODEL_DRIFT_DATA_UNAVAILABLE).
Fetch recent fill prices and signal values for strategy_id.
If fewer than drift_lookback_n observations, APPROVE (skip — insufficient data).
Compute drift_score using configured drift_metric (KS or PSI).
If drift_score > max_drift_score, HARD_REJECT(MODEL_DRIFT_EXCEEDED).
If drift_score > warning threshold, attach WARN annotation; APPROVE.
APPROVE with drift_score attached to the RiskVote.

10. Reference Implementation

Pseudocode is language-agnostic. FETCH = read input. EMIT = produce output. IF/THEN/ELSE = decision. Translate directly to TypeScript, Python, Go, or Rust.

FUNCTION evaluateModelDrift(intent):
  // Global brake is checked before anything else
  ks = FETCH internal.killswitch.status
  IF ks.active: EMIT RiskVote(HARD_REJECT, KILL_SWITCH_ACTIVE); RETURN

  // Fail closed when the baseline cannot be loaded
  baseline = FETCH internal.backtest_baseline(intent.strategy_id)
  IF baseline IS NULL:
    EMIT RiskVote(HARD_REJECT, MODEL_DRIFT_DATA_UNAVAILABLE); RETURN

  liveObs = FETCH clob_auth.fill_prices(intent.strategy_id, n=params.drift_lookback_n)
  IF len(liveObs) < params.drift_lookback_n:
    EMIT RiskVote(APPROVE); RETURN  // insufficient data: skip the check

  // Score with the configured metric (ks_statistic or psi)
  driftScore = compute_drift(liveObs, baseline.distribution, params.drift_metric)

  IF driftScore > params.max_drift_score:
    EMIT RiskVote(HARD_REJECT, MODEL_DRIFT_EXCEEDED,
                  drift_score=driftScore); RETURN

  // Warning band: 0.6 x ceiling equals 0.15 at the default ceiling of 0.25
  annotations = []
  IF driftScore > params.max_drift_score * 0.6:
    annotations.append(WARN(MODEL_DRIFT_WARN, drift_score=driftScore))

  EMIT RiskVote(APPROVE, drift_score=driftScore, annotations=annotations)

SDK calls used

clob_auth.fill_prices(strategy_id, n)
internal.backtest_baseline(strategy_id)
internal.killswitch.status()

Complexity: O(N log N) for the KS statistic, where N = drift_lookback_n (default 50)
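As one possible translation of the pseudocode above, the Python sketch below injects every data source as a callable so the decision logic can be tested without network access. The `RiskVote` dataclass and the keyword-argument names are hypothetical. The warning level reuses the 0.6 × ceiling factor from the pseudocode.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional, Sequence


@dataclass
class RiskVote:
    decision: str
    reason_code: Optional[str] = None
    drift_score: Optional[float] = None
    annotations: List[str] = field(default_factory=list)


def evaluate_model_drift(
    strategy_id: str,
    params: Dict[str, Any],
    *,
    kill_switch_active: Callable[[], bool],
    load_baseline: Callable[[str], Optional[Sequence[float]]],
    load_live_obs: Callable[[str, int], Sequence[float]],
    compute_drift: Callable[[Sequence[float], Sequence[float], str], float],
) -> RiskVote:
    # Global brake first.
    if kill_switch_active():
        return RiskVote("HARD_REJECT", "KILL_SWITCH_ACTIVE")

    # Fail closed when the baseline cannot be loaded.
    baseline = load_baseline(strategy_id)
    if baseline is None:
        return RiskVote("HARD_REJECT", "MODEL_DRIFT_DATA_UNAVAILABLE")

    # Skip the check until the lookback window is full.
    live = load_live_obs(strategy_id, params["drift_lookback_n"])
    if len(live) < params["drift_lookback_n"]:
        return RiskVote("APPROVE")

    score = compute_drift(live, baseline, params["drift_metric"])
    if score > params["max_drift_score"]:
        return RiskVote("HARD_REJECT", "MODEL_DRIFT_EXCEEDED", score)

    vote = RiskVote("APPROVE", drift_score=score)
    if score > params["max_drift_score"] * 0.6:
        vote.annotations.append("MODEL_DRIFT_WARN")
    return vote
```

Passing the dependencies as callables keeps the function pure with respect to its inputs, which suits the unit and property tests listed in Section 26.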

11. Wire Examples

Input — what arrives on the wire

OrderIntent — drift exceeded — internal

{
  "intent_id": "int_d4e5f6a7b8c90004",
  "strategy_id": "strat_002",
  "size_usd": 200,
  "generated_at_ms": 1746800000000
}

Output — what the bot emits

RiskVote — HARD_REJECT

{
  "guard_id": "risk.model_drift_monitor",
  "decision": "HARD_REJECT",
  "severity": "HARD",
  "reason_code": "MODEL_DRIFT_EXCEEDED",
  "message": "Drift score 0.32 exceeds ceiling 0.25 for strategy strat_002.",
  "constraints": {},
  "checked_at": "2026-05-10T11:00:00Z"
}

12. Decision Logic

APPROVE

Drift score is at or below the hard ceiling (a WARN annotation is attached when it exceeds the warning threshold), or fewer than drift_lookback_n observations are available.

RESHAPE_REQUIRED

Not used; model drift is a strategy-level condition that cannot be addressed by resizing a single order.

REJECT

Drift score exceeds the hard ceiling, or backtest baseline is unavailable (fail-closed).

WARNING_ONLY

— not yet authored —

13. Standard Decision Output

This bot returns a RiskVote object. See RiskVote schema.

{
  "guard_id": "risk.model_drift_monitor",
  "decision": "HARD_REJECT",
  "severity": "HARD",
  "reason_code": "MODEL_DRIFT_EXCEEDED",
  "message": "Strategy strat_002 drift score 0.32 exceeds ceiling 0.25. New orders blocked.",
  "constraints": {},
  "inputs_used": [
    "internal.backtest_baseline",
    "clob_auth.fills"
  ],
  "checked_at": "2026-05-10T11:00:00Z"
}
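A small builder can produce this payload consistently. The sketch below mirrors the field names and message wording of the example above. The function name and the injectable `now` argument are assumptions added to make the output testable, and the RiskVote schema itself is not reproduced here.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def build_drift_reject_vote(
    strategy_id: str,
    drift_score: float,
    ceiling: float,
    now: Optional[datetime] = None,
) -> Dict[str, Any]:
    """Build the HARD_REJECT RiskVote payload for an exceeded drift ceiling."""
    # Normalise to UTC so the trailing "Z" in checked_at is accurate.
    stamp = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    return {
        "guard_id": "risk.model_drift_monitor",
        "decision": "HARD_REJECT",
        "severity": "HARD",
        "reason_code": "MODEL_DRIFT_EXCEEDED",
        "message": (
            f"Strategy {strategy_id} drift score {drift_score:.2f} "
            f"exceeds ceiling {ceiling:.2f}. New orders blocked."
        ),
        "constraints": {},
        "inputs_used": ["internal.backtest_baseline", "clob_auth.fills"],
        "checked_at": stamp.strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
```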

14. Reason Codes

Code	Severity	Meaning	Action	User-facing message
`KILL_SWITCH_ACTIVE`	HARD_REJECT	Global kill switch active.	Immediate HARD_REJECT.	Trading is paused. Please try again later.
`MODEL_DRIFT_EXCEEDED`	HARD_REJECT	Strategy drift score exceeds the hard ceiling.	HARD_REJECT; log drift_score, strategy_id, and metric.	This strategy's behaviour has diverged from its design parameters.
`MODEL_DRIFT_WARN`	WARN	Drift score is between warning and hard thresholds.	Attach WARN annotation; APPROVE.	This strategy is showing signs of behavioural drift.
`MODEL_DRIFT_DATA_UNAVAILABLE`	HARD_REJECT	Backtest baseline unavailable for this strategy.	HARD_REJECT (fail-closed).	Strategy baseline data is unavailable. Please try again.

15. Metrics & Logs

Metrics emitted

Metric	Type	Unit	Labels	Meaning
`polytraders_risk_modeldriftmonitor_decisions_total`	counter	count	decision, reason_code, strategy_id	Total decisions by type, reason, and strategy.
`polytraders_risk_modeldriftmonitor_drift_score`	gauge	ratio	strategy_id, metric	Current drift score per strategy at last evaluation.
`polytraders_risk_modeldriftmonitor_eval_latency_ms`	histogram	milliseconds		Latency from intent to RiskVote emit.

Alerts

Alert	Condition	Severity	Runbook
`ModelDriftMonitorDriftDetected`	`rate(polytraders_risk_modeldriftmonitor_decisions_total{reason_code='MODEL_DRIFT_EXCEEDED'}[5m]) > 0`	P2	#runbook-modeldrift-detected
`ModelDriftMonitorDataUnavailable`	`rate(polytraders_risk_modeldriftmonitor_decisions_total{reason_code='MODEL_DRIFT_DATA_UNAVAILABLE'}[5m]) > 0`	P1	#runbook-modeldrift-data

16. Developer Reporting

{
  "bot_id": "risk.model_drift_monitor",
  "decision": "HARD_REJECT",
  "reason_code": "MODEL_DRIFT_EXCEEDED",
  "inputs_used": [
    "internal.backtest_baseline",
    "clob_auth.fills"
  ],
  "metrics": {
    "strategy_id": "strat_002",
    "drift_score": 0.32,
    "drift_metric": "ks_statistic",
    "lookback_n": 50,
    "ceiling": 0.25
  },
  "checked_at": "2026-05-10T11:00:00Z"
}

17. Plain-English Reporting

Situation	User-facing explanation
Order blocked — model drift	This strategy's behaviour has diverged from its design baseline. New orders are paused while the strategy is reviewed.
Warning — drift approaching limit	This strategy is showing signs of behavioural drift. Consider reviewing recent fills before increasing position size.

18. Failure-Mode Block

main_failure_mode	Failing to detect drift because the backtest baseline is outdated and the live distribution matches a new, equally invalid pattern.
false_positive_risk	Blocking a legitimately adapting strategy because the baseline has not been updated to reflect an intentional strategy improvement.
false_negative_risk	Approving orders during early-stage drift if the lookback window is too long to detect a fast regime change.
safe_fallback	If backtest baseline data is unavailable, HARD_REJECT with MODEL_DRIFT_DATA_UNAVAILABLE. Never approve when the baseline cannot be loaded.
required_dependencies	Strategy backtest baseline store, CLOB fill history, KillSwitch active flag

19. Failure-Injection Recipes

Scenario	How to inject	Recovery
`BASELINE_UNAVAILABLE`	Delete backtest baseline from Redis for strategy_id	Returns to normal within one baseline cache refresh after baseline is restored.
`DRIFT_SPIKE`	Inject fill prices with distribution KS distance 0.40 from baseline	Returns to APPROVE once live fills converge back toward the baseline distribution.
`INSUFFICIENT_OBSERVATIONS`	Clear fill history for strategy_id, set lookback_n=50	Check activates after drift_lookback_n fills have been accumulated.

20. State & Persistence

Cold-start recovery

Baseline reloaded from Redis on cold start. If unavailable, HARD_REJECT until restored.

21. Concurrency & Idempotency

Aspect	Specification
Execution model	`single-threaded event loop`
Max in-flight	`100`
Idempotency key	`intent_id`
Per-call timeout (ms)	`150`
Backpressure strategy	`drop newest`
Locking / mutual exclusion	`per-strategy_id mutex during drift score computation`
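The idempotency key, per-strategy mutex, and per-call timeout from the table above could be combined as in the `asyncio` sketch below. The `DriftEvaluator` class is hypothetical, the vote cache is an unbounded in-memory dict for brevity, and the max in-flight limit and drop-newest backpressure are not implemented. What should happen on a timeout is not specified in this section, so the sketch simply lets the timeout error propagate.

```python
import asyncio
from collections import defaultdict
from typing import Any, Awaitable, Callable, DefaultDict, Dict


class DriftEvaluator:
    def __init__(
        self,
        evaluate: Callable[[Dict[str, Any]], Awaitable[Any]],
        timeout_s: float = 0.150,
    ) -> None:
        self._evaluate = evaluate
        self._timeout_s = timeout_s
        # One lock per strategy_id, created on first use inside the event loop.
        self._locks: DefaultDict[str, asyncio.Lock] = defaultdict(asyncio.Lock)
        # Votes already produced, keyed by intent_id.
        self._votes: Dict[str, Any] = {}

    async def handle(self, intent: Dict[str, Any]) -> Any:
        key = intent["intent_id"]
        if key in self._votes:
            return self._votes[key]
        async with self._locks[intent["strategy_id"]]:
            # Re-check: a duplicate may have completed while we waited.
            if key in self._votes:
                return self._votes[key]
            vote = await asyncio.wait_for(self._evaluate(intent), self._timeout_s)
            self._votes[key] = vote
            return vote
```

With this shape, two deliveries of the same `intent_id` trigger a single drift computation and receive the same vote.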

22. Dependencies

Depends on (must run first)

Bot	Why	Contract
risk.kill_switch	Global brake checked first.	HARD_REJECT(KILL_SWITCH_ACTIVE) short-circuits all evaluation.

Emits to (downstream consumers)

Bot	Why	Contract
exec.smart_router	Approved RiskVote passes to SmartRouter.	APPROVE passes; HARD_REJECT discards intent.

External services

Service	Endpoint	SLA assumed	On failure
Data API (fill history)	https://data-api.polymarket.com	99.9% / 500ms p99	HARD_REJECT(MODEL_DRIFT_DATA_UNAVAILABLE) if fill history unavailable.

23. Security Surfaces

Abuse vectors considered

Injecting synthetic fill data to deflate the drift score
Bypassing drift check by submitting from a strategy_id with no backtest baseline

Mitigations

Fill data sourced exclusively from CLOB authenticated endpoint with provenance timestamp
Missing baseline triggers HARD_REJECT (fail-closed); unknown strategy_id is treated as missing baseline

24. Polymarket V2 Compatibility

Aspect	Value
CLOB version	`v2`
Collateral asset	`pUSD`
EIP-712 Exchange domain version	`2`
Aware of builderCode field	no
Aware of negative-risk markets	no
Multi-chain ready	no
SDK used	`py-clob-client-v2`
Settlement contract	`CTFExchangeV2`
Notes	`Fill prices and signal values are denominated in pUSD. Uses CLOB V2 authenticated fill history endpoint.`

25. Versioning & Migration

Field	Value
spec	`2.0.0`
implementation	`0.1.0`
schema	`2`
released	`None`
planned_release	`Q4-2026`

Migration history

Date	From	To	Reason	Action taken
2026-04-28	n/a	v2-spec	Spec drafted post-CLOB-V2 cutover; bot not yet implemented	Designed against V2 schema (pUSD, builder codes, V2 EIP-712 domain)

26. Acceptance Tests

Unit Tests

Test	Setup	Expected result
Approve when drift score within warning threshold	drift_score=0.10, ceiling=0.25	APPROVE
Warn when drift between warning and hard	drift_score=0.20, warning=0.15, hard=0.25	APPROVE with WARN annotation
Reject when drift exceeds ceiling	drift_score=0.32, ceiling=0.25	HARD_REJECT(MODEL_DRIFT_EXCEEDED)
Skip when insufficient observations	observations=30, lookback_n=50	APPROVE (check skipped)

Integration Tests

Test	Expected result
Drift detected after regime change in live fills	HARD_REJECT(MODEL_DRIFT_EXCEEDED) within one evaluation cycle of drift_score exceeding ceiling
KillSwitch bypasses drift check	HARD_REJECT(KILL_SWITCH_ACTIVE) without computing drift score

Property Tests

Property	Required behaviour
Drift score above hard ceiling never results in APPROVE	Always true
Missing baseline always results in HARD_REJECT	Always true — fail-closed on missing baseline

27. Operational Runbook

ModelDriftMonitor incidents typically indicate a genuine strategy degradation or a regime change. Verify whether the drift is expected before unlocking the strategy.

On-call actions

Alert	First step	Diagnosis	Mitigation	Escalate to
`ModelDriftMonitorDriftDetected`	Inspect drift_score gauge per strategy_id; compare live fills to the backtest baseline to determine if drift is genuine or a data artefact.			Risk pod lead; strategy may need baseline recalibration before re-enabling.
`ModelDriftMonitorDataUnavailable`	Check Redis baseline store and CLOB fill history endpoint.			Risk pod lead if sustained > 2 minutes.

Manual overrides

polytraders risk update-baseline --strategy-id <id> --source recent_fills — After a confirmed intentional strategy update that changes the expected fill distribution.

Healthcheck

GET /internal/health/modeldriftmonitor → green: Baselines loaded for all active strategies, fill history accessible, p99 eval latency < 150ms; red: Any strategy baseline missing, fill history unavailable, or HARD_REJECT rate > 0.1

28. Promotion Gates

A bot does not advance to the next readiness state until every gate below is green. Gates are observable from production data — no subjective sign-off.

Promote to Shadow

Gate	How measured	Threshold
Unit tests pass for drift spike and skip scenarios	CI test run	100% pass

Promote to Limited live

Gate	How measured	Threshold
Shadow drift scores align with expected baseline for active strategies over 48h	Grafana drift_score gauge comparison	No spurious HARD_REJECTs in shadow run

Promote to General live

Gate	How measured	Threshold
Baseline update workflow tested and documented	Manual E2E test of the `polytraders risk update-baseline` command	Pass

29. Developer Checklist

Ready-to-ship score: 27/27 sections complete · 100%

Requirement	Status
Purpose defined	✓ done
Required inputs listed	✓ done
Parameters defined	✓ done
Defaults defined	✓ done
Warning thresholds defined	✓ done
Hard thresholds defined	✓ done
Safe fallback defined	✓ done
Structured output defined	✓ done
Developer log defined	✓ done
Plain-English explanation	✓ done
Unit tests defined	✓ done
Integration tests defined	✓ done
Property tests defined	✓ done
Failure-mode block complete	✓ done
Reference implementation pseudocode	✓ done
Wire examples (input + output)	✓ done
Reason codes listed	✓ done
Metrics & logs defined	✓ done
State & persistence defined	✓ done
Concurrency & idempotency defined	✓ done
Dependencies declared	✓ done
Security surfaces declared	✓ done
Polymarket V2 compatibility declared	✓ done
Version & migration history declared	✓ done
Operational runbook defined	✓ done
Promotion gates defined	✓ done
Failure-injection recipes defined	✓ done
