Polytraders Dev Guide
internal

1.12 ModelDriftMonitor

Risk Guardrail Veto PLANNED Planned capital · Direct P4 · Core risk pending flagship stub

ModelDriftMonitor flags strategies whose live behaviour has decoupled from their backtest distribution. It computes a drift score by comparing the rolling distribution of live signals or fill prices against the expected backtest baseline and hard-rejects new orders when drift exceeds the configured ceiling, preventing a degraded model from continuing to place orders.

v3 readiness

Docs: 27/27 (done)
Impl: 0/15 (pending)
Backtest: 0/4 (pending)
Runtime: 0/8 (pending)

A bot is done when all four scores are.

1. Bot Identity

Layer: Risk
Bot class: Guardrail
Authority: Veto
Status: PLANNED
Readiness: Planned
Runs before: ExecutionPlan emit
Runs after: Strategy OrderIntent
Applies to: Every OrderIntent from a model-driven strategy; detects when live strategy behaviour has drifted from the backtest distribution
Default mode: planned
User-visible: summary-only
Developer owner: Polytraders core, Risk pod

Operational profile

Modes supported: quarantine

2. Purpose

ModelDriftMonitor flags strategies whose live behaviour has decoupled from their backtest distribution. It computes a drift score by comparing the rolling distribution of live signals or fill prices against the expected backtest baseline and hard-rejects new orders when drift exceeds the configured ceiling, preventing a degraded model from continuing to place orders.

3. Why This Bot Matters

  • Model degradation undetected

    A strategy whose signal quality has degraded continues placing orders, accumulating losses without any automated circuit-breaker.

  • Market regime change

    A model trained on one market regime may produce pathological signals in a new regime; model drift detection provides an early-warning gate before significant capital is deployed.

No worked examples on this bot yet. Worked examples are optional but strongly recommended — they turn an abstract failure mode into something a developer can verify in a fixture.

4. Required Polymarket Inputs

Input | Source | Required? | Use
Recent fill prices and strategy signal values | clob_auth | Yes | Compute rolling live distribution for comparison against backtest baseline.
Market metadata (category, volume, resolution type) | gamma | No | Provide context for regime classification to distinguish genuine drift from expected regime variation.

5. Required Internal Inputs

Input | Source | Required? | Use
Strategy backtest distribution summary (mean, std, percentiles) | internal | Yes | Define the expected baseline distribution against which live behaviour is compared.
KillSwitch active flag | KillSwitch | Yes | If active, reject immediately.

6. Parameter Guide

Parameter | Default | Warning | Hard | What it controls
max_drift_score | 0.25 | 0.15 | 0.25 | Maximum allowed drift score (KS statistic or similar) before new orders are blocked.
drift_lookback_n | 50 | None | None | Number of recent observations used to compute the live distribution.
drift_metric | ks_statistic | None | None | Statistical metric used to measure distribution drift. Supported: ks_statistic (Kolmogorov–Smirnov), psi (Population Stability Index).

7. Detailed Parameter Instructions

max_drift_score

What it means

Maximum allowed drift score (KS statistic or similar) before new orders are blocked.

Default

{ "max_drift_score": 0.25 }

Why this default matters

A KS statistic of 0.25 indicates the live distribution has shifted significantly from the backtest; above this threshold the model is likely operating outside its trained regime.
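
One way to sanity-check these numbers is to compare them with the asymptotic critical values of the one-sample KS test at the default window of 50 observations. The sketch below uses the standard approximation c(alpha) / sqrt(n) with c(alpha) = sqrt(-ln(alpha / 2) / 2). It assumes the baseline is treated as a known continuous distribution and that observations are independent; fills are often autocorrelated, so treat the result as a rough guide rather than a calibration.

```python
import math

def ks_critical_value(n: int, alpha: float) -> float:
    """Asymptotic one-sample KS critical value: c(alpha) / sqrt(n)."""
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2.0))
    return c_alpha / math.sqrt(n)

n = 50  # default drift_lookback_n
crit_05 = ks_critical_value(n, 0.05)  # roughly 0.19
crit_01 = ks_critical_value(n, 0.01)  # roughly 0.23

# The 0.15 warning level sits below the 5% critical value, while the
# 0.25 ceiling sits above the 1% critical value.
print(round(crit_05, 3), round(crit_01, 3))
```

Under these assumptions the warning band can trigger on sampling noise alone, whereas a score above 0.25 is unlikely to be noise at n = 50.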

Threshold logic

Condition | Action
drift_score <= 0.15 | APPROVE
0.15 < drift_score <= 0.25 | WARN (MODEL_DRIFT_WARN)
drift_score > 0.25 | HARD_REJECT (MODEL_DRIFT_EXCEEDED)

Developer check

if (driftScore > params.max_drift_score) return reject('MODEL_DRIFT_EXCEEDED');
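
The three bands in the threshold table can be sketched as a small pure function. The names `DriftThresholds` and `classify_drift` are hypothetical, and the warning level is taken from the Parameter Guide (0.15) rather than derived from the ceiling.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class DriftThresholds:
    warning: float = 0.15  # Warning column of max_drift_score
    hard: float = 0.25     # Hard column of max_drift_score

def classify_drift(
    score: float, t: DriftThresholds = DriftThresholds()
) -> Tuple[str, Optional[str]]:
    """Return (decision, reason_code) for a drift score."""
    if score > t.hard:
        return ("HARD_REJECT", "MODEL_DRIFT_EXCEEDED")
    if score > t.warning:
        # Warnings do not block the order; they annotate an APPROVE.
        return ("APPROVE", "MODEL_DRIFT_WARN")
    return ("APPROVE", None)
```

Both boundaries are inclusive on the lower band: a score of exactly 0.15 approves cleanly and a score of exactly 0.25 approves with a warning.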

User-facing English

This strategy's live behaviour has diverged from its design parameters. New orders are paused.

drift_lookback_n

What it means

Number of recent observations used to compute the live distribution.

Default

{ "drift_lookback_n": 50 }

Why this default matters

50 observations provide a stable sample while remaining responsive to recent regime shifts.

Threshold logic

Condition | Action
observations < drift_lookback_n | Skip drift check (insufficient data)
observations >= drift_lookback_n | Compute drift score

Developer check

if (liveObs.length < params.drift_lookback_n) return approve('MODEL_DRIFT_SKIPPED');
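
A bounded window is one simple way to hold the most recent observations and to know when the check may run. The `LiveWindow` class below is a hypothetical helper, not part of the SDK; it relies only on `collections.deque` with `maxlen`, which discards the oldest item when full.

```python
from collections import deque
from typing import List

class LiveWindow:
    """Keeps the latest `lookback_n` observations for one strategy."""

    def __init__(self, lookback_n: int = 50):
        self.lookback_n = lookback_n
        self._obs = deque(maxlen=lookback_n)  # oldest values fall off the left

    def add(self, value: float) -> None:
        self._obs.append(value)

    def ready(self) -> bool:
        """True once enough observations exist to compute a drift score."""
        return len(self._obs) >= self.lookback_n

    def values(self) -> List[float]:
        return list(self._obs)
```

Until `ready()` returns true the drift check is skipped, which matches the first row of the threshold table.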

User-facing English

— not yet authored —

drift_metric

What it means

Statistical metric used to measure distribution drift. Supported: ks_statistic (Kolmogorov–Smirnov), psi (Population Stability Index).

Default

{ "drift_metric": "ks_statistic" }

Why this default matters

KS statistic is distribution-free and computationally cheap, making it suitable for real-time evaluation.

Threshold logic

Condition | Action
metric=ks_statistic | Compare CDF of live vs backtest
metric=psi | Compute PSI bins

Developer check

const driftScore = computeDrift(liveObs, backtestBaseline, params.drift_metric);
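
A minimal pure-Python sketch of the two supported metrics follows. It assumes the baseline is available as a sample of values, whereas the spec describes a distribution summary (mean, std, percentiles), so a real implementation may need a CDF built from percentiles instead. The function names and the `edges` argument for PSI bins are hypothetical.

```python
import bisect
import math
from typing import Optional, Sequence

def ks_statistic(live: Sequence[float], baseline: Sequence[float]) -> float:
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(live), sorted(baseline)
    d = 0.0
    for x in a + b:  # the gap only changes at observed values
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d

def psi(live: Sequence[float], baseline: Sequence[float],
        edges: Sequence[float], eps: float = 1e-6) -> float:
    """Population Stability Index over bins split at the sorted cut points `edges`."""
    def shares(sample: Sequence[float]) -> list:
        counts = [0] * (len(edges) + 1)
        for x in sample:
            counts[bisect.bisect_right(edges, x)] += 1
        # Floor empty bins at eps so the logarithm stays defined.
        return [max(c / len(sample), eps) for c in counts]

    p_live, p_base = shares(live), shares(baseline)
    return sum((l - b) * math.log(l / b) for l, b in zip(p_live, p_base))

def compute_drift(live: Sequence[float], baseline: Sequence[float], metric: str,
                  edges: Optional[Sequence[float]] = None) -> float:
    """Dispatch on drift_metric; unknown metrics raise instead of defaulting."""
    if metric == "ks_statistic":
        return ks_statistic(live, baseline)
    if metric == "psi":
        if edges is None:
            raise ValueError("psi requires bin edges")
        return psi(live, baseline, edges)
    raise ValueError(f"unsupported drift_metric: {metric}")
```

The two metrics live on different scales: the KS statistic is bounded by 1, while PSI is unbounded. A single max_drift_score therefore may need tuning per metric.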

User-facing English

— not yet authored —

8. Default Configuration

{
  "bot_id": "risk.model_drift_monitor",
  "version": "0.1.0",
  "mode": "hard_guard",
  "defaults": {
    "max_drift_score": 0.25,
    "drift_lookback_n": 50,
    "drift_metric": "ks_statistic"
  },
  "locked": {
    "max_drift_score": {
      "max": 0.5
    }
  }
}
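
The sketch below shows one way overrides could be merged with these defaults. It assumes that `locked.<param>.max` is an upper bound that no override may exceed; the spec does not define the semantics of `locked`, and `resolve_params` is a hypothetical helper.

```python
import json

CONFIG = json.loads("""
{
  "bot_id": "risk.model_drift_monitor",
  "version": "0.1.0",
  "mode": "hard_guard",
  "defaults": {
    "max_drift_score": 0.25,
    "drift_lookback_n": 50,
    "drift_metric": "ks_statistic"
  },
  "locked": {"max_drift_score": {"max": 0.5}}
}
""")

def resolve_params(config: dict, overrides: dict) -> dict:
    """Apply overrides on top of defaults, enforcing any locked maximum."""
    params = dict(config["defaults"])
    for key, value in overrides.items():
        if key not in params:
            raise KeyError(f"unknown parameter: {key}")
        limit = config.get("locked", {}).get(key, {}).get("max")
        if limit is not None and value > limit:
            raise ValueError(f"{key}={value} exceeds locked max {limit}")
        params[key] = value
    return params
```

With this reading, an override of 0.4 is accepted and an override of 0.6 is refused before the bot starts evaluating intents.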

9. Implementation Flow

  1. Receive OrderIntent from a model-driven strategy with strategy_id.
  2. Check KillSwitch; if active, HARD_REJECT(KILL_SWITCH_ACTIVE).
  3. Load backtest baseline distribution for strategy_id from internal store.
  4. If baseline unavailable, HARD_REJECT(MODEL_DRIFT_DATA_UNAVAILABLE).
  5. Fetch recent fill prices and signal values for strategy_id.
  6. If fewer than drift_lookback_n observations, APPROVE (skip — insufficient data).
  7. Compute drift_score using configured drift_metric (KS or PSI).
  8. If drift_score > max_drift_score, HARD_REJECT(MODEL_DRIFT_EXCEEDED).
  9. If drift_score > warning threshold, attach a WARN(MODEL_DRIFT_WARN) annotation.
  10. APPROVE with drift_score and any WARN annotation attached to the RiskVote.

10. Reference Implementation

Pseudocode is language-agnostic. FETCH = read input. EMIT = produce output. IF/THEN/ELSE = decision. Translate directly to TypeScript, Python, Go, or Rust.

FUNCTION evaluateModelDrift(intent):
  ks = FETCH internal.killswitch.status
  IF ks.active: EMIT RiskVote(HARD_REJECT, KILL_SWITCH_ACTIVE); RETURN

  baseline = FETCH internal.backtest_baseline(intent.strategy_id)
  IF baseline IS NULL:
    EMIT RiskVote(HARD_REJECT, MODEL_DRIFT_DATA_UNAVAILABLE); RETURN

  liveObs = FETCH clob_auth.fill_prices(intent.strategy_id, n=params.drift_lookback_n)
  IF liveObs IS NULL:  // fill history fetch failed: fail closed
    EMIT RiskVote(HARD_REJECT, MODEL_DRIFT_DATA_UNAVAILABLE); RETURN
  IF len(liveObs) < params.drift_lookback_n:
    EMIT RiskVote(APPROVE); RETURN  // insufficient data: skip the check

  // Dispatch on the configured metric (ks_statistic or psi)
  driftScore = compute_drift(liveObs, baseline.distribution, params.drift_metric)

  IF driftScore > params.max_drift_score:
    EMIT RiskVote(HARD_REJECT, MODEL_DRIFT_EXCEEDED,
                  drift_score=driftScore); RETURN

  annotations = []
  IF driftScore > params.max_drift_score * 0.6:  // warning threshold, 0.15 at the default ceiling
    annotations.append(WARN(MODEL_DRIFT_WARN, drift_score=driftScore))

  EMIT RiskVote(APPROVE, drift_score=driftScore, annotations=annotations)

SDK calls used

  • clob_auth.fill_prices(strategy_id, n)
  • internal.backtest_baseline(strategy_id)
  • internal.killswitch.status()

Complexity: O(N log N) for KS statistic where N = drift_lookback_n (default 50)
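
The flow translates fairly directly to Python. The sketch below is an illustration, not the shipped implementation: inputs are passed in as plain arguments instead of being fetched through the SDK, the `RiskVote` dataclass is a simplified stand-in for the real schema, and the warning level is a separate argument defaulting to 0.15. Treating a missing fill history as fail-closed follows the External services table in section 22.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Sequence

@dataclass
class RiskVote:
    decision: str
    reason_code: Optional[str] = None
    drift_score: Optional[float] = None
    annotations: List[str] = field(default_factory=list)

def evaluate_model_drift(
    kill_switch_active: bool,
    baseline: Optional[Sequence[float]],
    live_obs: Optional[Sequence[float]],
    compute_drift: Callable[[Sequence[float], Sequence[float]], float],
    max_drift_score: float = 0.25,
    warning_score: float = 0.15,
    drift_lookback_n: int = 50,
) -> RiskVote:
    if kill_switch_active:
        return RiskVote("HARD_REJECT", "KILL_SWITCH_ACTIVE")
    if baseline is None or live_obs is None:
        # Fail closed when either data source cannot be loaded.
        return RiskVote("HARD_REJECT", "MODEL_DRIFT_DATA_UNAVAILABLE")
    if len(live_obs) < drift_lookback_n:
        return RiskVote("APPROVE")  # insufficient data: check skipped

    score = compute_drift(live_obs[-drift_lookback_n:], baseline)
    if score > max_drift_score:
        return RiskVote("HARD_REJECT", "MODEL_DRIFT_EXCEEDED", score)

    vote = RiskVote("APPROVE", drift_score=score)
    if score > warning_score:
        vote.annotations.append("MODEL_DRIFT_WARN")
    return vote
```

Passing `compute_drift` as a callable keeps the decision logic testable with a stub that returns a fixed score.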

11. Wire Examples

Input — what arrives on the wire

OrderIntent (internal): drift exceeded

{
  "intent_id": "int_d4e5f6a7b8c90004",
  "strategy_id": "strat_002",
  "size_usd": 200,
  "generated_at_ms": 1746800000000
}

Output — what the bot emits

RiskVote — HARD_REJECT

{
  "guard_id": "risk.model_drift_monitor",
  "decision": "HARD_REJECT",
  "severity": "HARD",
  "reason_code": "MODEL_DRIFT_EXCEEDED",
  "message": "Drift score 0.32 exceeds ceiling 0.25 for strategy strat_002.",
  "constraints": {},
  "checked_at": "2026-05-10T11:00:00Z"
}

12. Decision Logic

APPROVE

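Drift score is at or below the hard ceiling (max_drift_score), or fewer than drift_lookback_n observations are available. Scores above the warning threshold are still approved, with a MODEL_DRIFT_WARN annotation attached.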

RESHAPE_REQUIRED

Not used; model drift is a strategy-level condition that cannot be addressed by resizing a single order.

REJECT

Drift score exceeds the hard ceiling, the backtest baseline or fill history is unavailable (fail-closed), or the KillSwitch is active.

WARNING_ONLY

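Drift score is above the warning threshold but at or below the hard ceiling. The order is approved and the RiskVote carries a MODEL_DRIFT_WARN annotation.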

13. Standard Decision Output

This bot returns a RiskVote object. See RiskVote schema.

{
  "guard_id": "risk.model_drift_monitor",
  "decision": "HARD_REJECT",
  "severity": "HARD",
  "reason_code": "MODEL_DRIFT_EXCEEDED",
  "message": "Strategy strat_002 drift score 0.32 exceeds ceiling 0.25. New orders blocked.",
  "constraints": {},
  "inputs_used": [
    "internal.backtest_baseline",
    "clob_auth.fills"
  ],
  "checked_at": "2026-05-10T11:00:00Z"
}

14. Reason Codes

Code | Severity | Meaning | Action | User-facing message
KILL_SWITCH_ACTIVE | HARD_REJECT | Global kill switch active. | Immediate HARD_REJECT. | Trading is paused. Please try again later.
MODEL_DRIFT_EXCEEDED | HARD_REJECT | Strategy drift score exceeds the hard ceiling. | HARD_REJECT; log drift_score, strategy_id, and metric. | This strategy's behaviour has diverged from its design parameters.
MODEL_DRIFT_WARN | WARN | Drift score is between warning and hard thresholds. | Attach WARN annotation; APPROVE. |
MODEL_DRIFT_DATA_UNAVAILABLE | HARD_REJECT | Backtest baseline unavailable for this strategy. | HARD_REJECT (fail-closed). | Strategy baseline data is unavailable. Please try again.
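
The reason codes can be kept in a single lookup so that severity and user-facing text never drift apart. This is a sketch; `ReasonCode`, `REASON_CODES`, and `blocks_order` are hypothetical names, and MODEL_DRIFT_WARN carries no user-facing message because the table lists none.

```python
from typing import Dict, NamedTuple, Optional

class ReasonCode(NamedTuple):
    severity: str
    user_message: Optional[str]

REASON_CODES: Dict[str, ReasonCode] = {
    "KILL_SWITCH_ACTIVE": ReasonCode(
        "HARD_REJECT", "Trading is paused. Please try again later."),
    "MODEL_DRIFT_EXCEEDED": ReasonCode(
        "HARD_REJECT",
        "This strategy's behaviour has diverged from its design parameters."),
    "MODEL_DRIFT_WARN": ReasonCode("WARN", None),  # no message in the table
    "MODEL_DRIFT_DATA_UNAVAILABLE": ReasonCode(
        "HARD_REJECT", "Strategy baseline data is unavailable. Please try again."),
}

def blocks_order(code: str) -> bool:
    """True when the reason code vetoes the OrderIntent."""
    return REASON_CODES[code].severity == "HARD_REJECT"
```

An unknown code raises `KeyError`, which surfaces typos at the call site instead of silently approving.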

15. Metrics & Logs

Metrics emitted

Metric | Type | Unit | Labels | Meaning
polytraders_risk_modeldriftmonitor_decisions_total | counter | count | decision, reason_code, strategy_id | Total decisions by type, reason, and strategy.
polytraders_risk_modeldriftmonitor_drift_score | gauge | ratio | strategy_id, metric | Current drift score per strategy at last evaluation.
polytraders_risk_modeldriftmonitor_eval_latency_ms | histogram | milliseconds |  | Latency from intent to RiskVote emit.

Alerts

Alert | Condition | Severity | Runbook
ModelDriftMonitorDriftDetected | rate(polytraders_risk_modeldriftmonitor_decisions_total{reason_code='MODEL_DRIFT_EXCEEDED'}[5m]) > 0 | P2 | #runbook-modeldrift-detected
ModelDriftMonitorDataUnavailable | rate(polytraders_risk_modeldriftmonitor_decisions_total{reason_code='MODEL_DRIFT_DATA_UNAVAILABLE'}[5m]) > 0 | P1 | #runbook-modeldrift-data

16. Developer Reporting

{
  "bot_id": "risk.model_drift_monitor",
  "decision": "HARD_REJECT",
  "reason_code": "MODEL_DRIFT_EXCEEDED",
  "inputs_used": [
    "internal.backtest_baseline",
    "clob_auth.fills"
  ],
  "metrics": {
    "strategy_id": "strat_002",
    "drift_score": 0.32,
    "drift_metric": "ks_statistic",
    "lookback_n": 50,
    "ceiling": 0.25
  },
  "checked_at": "2026-05-10T11:00:00Z"
}

17. Plain-English Reporting

Situation | User-facing explanation
Order blocked: model drift | This strategy's behaviour has diverged from its design baseline. New orders are paused while the strategy is reviewed.
Warning: drift approaching limit | This strategy is showing signs of behavioural drift. Consider reviewing recent fills before increasing position size.

18. Failure-Mode Block

main_failure_mode: Failing to detect drift because the backtest baseline is outdated and the live distribution matches a new, equally invalid pattern.
false_positive_risk: Blocking a legitimately adapting strategy because the baseline has not been updated to reflect an intentional strategy improvement.
false_negative_risk: Approving orders during early-stage drift if the lookback window is too long to detect a fast regime change.
safe_fallback: If backtest baseline data is unavailable, HARD_REJECT with MODEL_DRIFT_DATA_UNAVAILABLE. Never approve when the baseline cannot be loaded.
required_dependencies: Strategy backtest baseline store, CLOB fill history, KillSwitch active flag

19. Failure-Injection Recipes

Scenario | How to inject | Expected behaviour | Recovery
BASELINE_UNAVAILABLE | Delete backtest baseline from Redis for strategy_id |  | Returns to normal within one baseline cache refresh after baseline is restored.
DRIFT_SPIKE | Inject fill prices with distribution KS distance 0.40 from baseline |  | Returns to APPROVE once live fills converge back toward the baseline distribution.
INSUFFICIENT_OBSERVATIONS | Clear fill history for strategy_id, set lookback_n=50 |  | Check activates after drift_lookback_n fills have been accumulated.
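
For the DRIFT_SPIKE recipe, a deterministic fixture avoids flaky tests. The sketch below builds synthetic values in arbitrary units (not real fill prices) whose two-sample KS distance from the baseline is 0.40; `drift_spike_fixture` is a hypothetical helper and the baseline is assumed to be available as a sample.

```python
import bisect

def ks_statistic(live, baseline):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(live), sorted(baseline)
    return max(
        abs(bisect.bisect_right(a, x) / len(a) - bisect.bisect_right(b, x) / len(b))
        for x in a + b
    )

def drift_spike_fixture(shift=40):
    """100 evenly spaced baseline values and 50 live values moved up by `shift`."""
    baseline = list(range(100))
    live = [x + shift for x in range(0, 100, 2)]
    return live, baseline

live, baseline = drift_spike_fixture()
score = ks_statistic(live, baseline)

# The injected sample clears the default hard ceiling of 0.25.
assert score > 0.25
```

Because 40 of the 100 baseline values fall below every live value, the empirical CDFs differ by 0.40 at their widest point.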

20. State & Persistence

Cold-start recovery

Baseline reloaded from Redis on cold start. If unavailable, HARD_REJECT until restored.

21. Concurrency & Idempotency

Aspect | Specification
Execution model | single-threaded event loop
Max in-flight | 100
Idempotency key | intent_id
Per-call timeout (ms) | 150
Backpressure strategy | drop newest
Locking / mutual exclusion | per-strategy_id mutex during drift score computation
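
On a single-threaded event loop, the idempotency key and the per-strategy mutex can be sketched with `asyncio`. `DriftGate` is a hypothetical name, the vote cache is unbounded here, and rejecting on timeout is an assumption: the spec gives the 150 ms budget but not the behaviour when it is exceeded.

```python
import asyncio
from collections import defaultdict

class DriftGate:
    def __init__(self, evaluate, timeout_s=0.150):
        self._evaluate = evaluate   # async callable: strategy_id -> decision
        self._timeout_s = timeout_s
        self._votes = {}            # intent_id -> decision (idempotency cache)
        self._locks = defaultdict(asyncio.Lock)  # one mutex per strategy_id

    async def vote(self, intent_id, strategy_id):
        if intent_id in self._votes:
            return self._votes[intent_id]
        async with self._locks[strategy_id]:
            # Re-check: a duplicate may have finished while we waited.
            if intent_id in self._votes:
                return self._votes[intent_id]
            try:
                decision = await asyncio.wait_for(
                    self._evaluate(strategy_id), self._timeout_s)
            except asyncio.TimeoutError:
                decision = "HARD_REJECT"  # assumption: fail closed on timeout
            self._votes[intent_id] = decision
            return decision
```

Two concurrent calls with the same intent_id evaluate once; the second caller waits on the strategy lock and then reads the cached vote.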

22. Dependencies

Depends on (must run first)

Bot | Why | Contract
risk.kill_switch | Global brake checked first. | HARD_REJECT(KILL_SWITCH_ACTIVE) short-circuits all evaluation.

Emits to (downstream consumers)

Bot | Why | Contract
exec.smart_router | Approved RiskVote passes to SmartRouter. | APPROVE passes; HARD_REJECT discards intent.

External services

Service | Endpoint | SLA assumed | On failure
Data API (fill history) | https://data-api.polymarket.com | 99.9% / 500ms p99 | HARD_REJECT(MODEL_DRIFT_DATA_UNAVAILABLE) if fill history unavailable.

23. Security Surfaces

Abuse vectors considered

  • Injecting synthetic fill data to deflate the drift score
  • Bypassing drift check by submitting from a strategy_id with no backtest baseline

Mitigations

  • Fill data sourced exclusively from CLOB authenticated endpoint with provenance timestamp
  • Missing baseline triggers HARD_REJECT (fail-closed); unknown strategy_id is treated as missing baseline

24. Polymarket V2 Compatibility

Aspect | Value
CLOB version | v2
Collateral asset | pUSD
EIP-712 Exchange domain version | 2
Aware of builderCode field | no
Aware of negative-risk markets | no
Multi-chain ready | no
SDK used | py-clob-client-v2
Settlement contract | CTFExchangeV2
Notes | Fill prices and signal values are denominated in pUSD. Uses CLOB V2 authenticated fill history endpoint.

25. Versioning & Migration

Field | Value
spec | 2.0.0
implementation | 0.1.0
schema | 2
released | None
planned_release | Q4-2026

Migration history

Date | From | To | Reason | Action taken
2026-04-28 | n/a | v2-spec | Spec drafted post-CLOB-V2 cutover; bot not yet implemented | Designed against V2 schema (pUSD, builder codes, V2 EIP-712 domain)

26. Acceptance Tests

Unit Tests

Test | Setup | Expected result
Approve when drift score within warning threshold | drift_score=0.10, ceiling=0.25 | APPROVE
Warn when drift between warning and hard | drift_score=0.20, warning=0.15, hard=0.25 | APPROVE with WARN annotation
Reject when drift exceeds ceiling | drift_score=0.32, ceiling=0.25 | HARD_REJECT(MODEL_DRIFT_EXCEEDED)
Skip when insufficient observations | observations=30, lookback_n=50 | APPROVE (check skipped)

Integration Tests

Test | Expected result
Drift detected after regime change in live fills | HARD_REJECT(MODEL_DRIFT_EXCEEDED) within one evaluation cycle of drift_score exceeding ceiling
KillSwitch short-circuits drift check | HARD_REJECT(KILL_SWITCH_ACTIVE) without computing drift score

Property Tests

Property | Required behaviour
Drift score above hard ceiling never results in APPROVE | Always true
Missing baseline always results in HARD_REJECT | Always true (fail-closed on missing baseline)
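
Both properties can be exercised with a seeded random sweep using only the standard library. The `decide` function below is a simplified stand-in for the real bot, written only so the properties have something to run against; a property-testing library would be a natural replacement for the loop.

```python
import random
from typing import Optional

def decide(drift_score: Optional[float], baseline_loaded: bool,
           ceiling: float = 0.25) -> str:
    """Simplified stand-in for the bot's decision logic."""
    if not baseline_loaded:
        return "HARD_REJECT"  # fail closed on missing baseline
    if drift_score is not None and drift_score > ceiling:
        return "HARD_REJECT"
    return "APPROVE"

def check_properties(trials: int = 1000, seed: int = 7) -> bool:
    rng = random.Random(seed)  # seeded so failures are reproducible
    for _ in range(trials):
        ceiling = rng.uniform(0.05, 0.5)
        above = ceiling + rng.uniform(1e-9, 1.0)  # strictly above the ceiling
        # Property 1: a score above the ceiling never results in APPROVE.
        if decide(above, True, ceiling) == "APPROVE":
            return False
        # Property 2: a missing baseline always results in HARD_REJECT.
        if decide(rng.random(), False, ceiling) != "HARD_REJECT":
            return False
    return True
```

Randomising the ceiling as well as the score checks that the property holds for any configured value, not only the default.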

27. Operational Runbook

ModelDriftMonitor incidents typically indicate a genuine strategy degradation or a regime change. Verify whether the drift is expected before unlocking the strategy.

On-call actions

Alert | First step | Diagnosis | Mitigation | Escalate to
ModelDriftMonitorDriftDetected | Inspect drift_score gauge per strategy_id; compare live fills to the backtest baseline to determine if drift is genuine or a data artefact. |  |  | Risk pod lead; strategy may need baseline recalibration before re-enabling.
ModelDriftMonitorDataUnavailable | Check Redis baseline store and CLOB fill history endpoint. |  |  | Risk pod lead if sustained > 2 minutes.

Manual overrides

  • polytraders risk update-baseline --strategy-id <id> --source recent_fills: use after a confirmed intentional strategy update that changes the expected fill distribution.

Healthcheck

GET /internal/health/modeldriftmonitor
green: Baselines loaded for all active strategies, fill history accessible, p99 eval latency < 150ms
red: Any strategy baseline missing, fill history unavailable, or HARD_REJECT rate > 0.1

28. Promotion Gates

A bot does not advance to the next readiness state until every gate below is green. Gates are observable from production data — no subjective sign-off.

Promote to Shadow

Gate | How measured | Threshold
Unit tests pass for drift spike and skip scenarios | CI test run | 100% pass

Promote to Limited live

Gate | How measured | Threshold
Shadow drift scores align with expected baseline for active strategies over 48h | Grafana drift_score gauge comparison | No spurious HARD_REJECTs in shadow run

Promote to General live

Gate | How measured | Threshold
Baseline update workflow tested and documented | Manual E2E test of the polytraders risk update-baseline command | Pass

29. Developer Checklist

Ready-to-ship score: 27/27 sections complete · 100%

Requirement | Status
Purpose defined | ✓ done
Required inputs listed | ✓ done
Parameters defined | ✓ done
Defaults defined | ✓ done
Warning thresholds defined | ✓ done
Hard thresholds defined | ✓ done
Safe fallback defined | ✓ done
Structured output defined | ✓ done
Developer log defined | ✓ done
Plain-English explanation | ✓ done
Unit tests defined | ✓ done
Integration tests defined | ✓ done
Property tests defined | ✓ done
Failure-mode block complete | ✓ done
Reference implementation pseudocode | ✓ done
Wire examples (input + output) | ✓ done
Reason codes listed | ✓ done
Metrics & logs defined | ✓ done
State & persistence defined | ✓ done
Concurrency & idempotency defined | ✓ done
Dependencies declared | ✓ done
Security surfaces declared | ✓ done
Polymarket V2 compatibility declared | ✓ done
Version & migration history declared | ✓ done
Operational runbook defined | ✓ done
Promotion gates defined | ✓ done
Failure-injection recipes defined | ✓ done