
6.19 ConfigDriftDetector

Governance · Observe · PLANNED · Spec ready · capital · Direct P7 · Governance & replay · pending stub

Compares the running BotConfig of every live bot against the latest committed config in the config repo. Any drift (running != committed) is surfaced as a ConfigDriftReport naming the bot, each drifted field, and that field's running and committed values. Operators must then either commit the change or revert it.

v3 readiness

Score	Progress	Status
Docs	27/27	done
Impl	0/15	pending
Backtest	0/4	pending
Runtime	0/8	pending

A bot is done when all four scores are.


1. Bot Identity

Layer	Governance
Bot class	Governance
Authority	Observe
Status	PLANNED
Readiness	Spec ready
Runs before	—
Runs after	—
Applies to	Continuous
Default mode	`shadow`
User-visible	Yes
Developer owner	Governance pod

Operational profile

Ownership	Governance pod · on-call gov-oncall · #polytraders-gov · escalates to Head of Governance · P2
Latency budget	5000ms
Modes supported	off · shadow · advisory · enforced
Data freshness	max_market_data_age_ms=900000 · max_orderbook_age_ms=900000 · max_external_feed_age_ms=900000 · on stale → Emit status=UNKNOWN.
Human override	yes · by Governance on-call · logs GOV_CONFIG_DRIFT_ACK · time-bound: Until next check · scope: Single bot_slug · second approval required

2. Purpose

3. Why This Bot Matters

Untracked tuning
An on-call who tweaks a threshold via the Admin UI without committing the change loses an audit trail; the next incident review cannot reconstruct the system state.
Drift between staging and prod
Without an explicit comparison, prod can silently run an ancient config while staging is updated.
Compliance evidence
Auditors require evidence that the running configuration matches a reviewed and signed-off version.

No worked examples on this bot yet. Worked examples are optional but strongly recommended — they turn an abstract failure mode into something a developer can verify in a fixture.

4. Required Polymarket Inputs

— not yet authored —

5. Required Internal Inputs

Input	Source	Required?	Use
Running BotConfig per bot	`Bot runtime`	Yes	Effective config in process memory, including any live-edited fields.
Committed BotConfig per bot	`Config repo (Git)`	Yes	Source of truth signed-off configuration.

6. Parameter Guide

Parameter	Default	Warning	Hard	What it controls
check_interval_minutes	`15`	`30`	`60`	How often the drift comparison runs.
tolerance_for_numeric_drift	`0`	`0`	`0.001`	Tolerance for numeric fields before a drift is flagged.

7. Detailed Parameter Instructions

check_interval_minutes

What it means

How often the drift comparison runs.

Default

{ "check_interval_minutes": 15 }

Why this default matters

Quarter-hourly is frequent enough to catch live edits before they outlive the on-call shift.

Threshold logic

Condition	Action
15	Default
30	Warning threshold
60	Hard threshold

Developer check

schedule.every(p.check_interval_minutes).minutes.do(check);

User-facing English

(Internal.)

tolerance_for_numeric_drift

What it means

Tolerance for numeric fields before a drift is flagged.

Default

{ "tolerance_for_numeric_drift": 0 }

Why this default matters

Zero — there is no acceptable silent drift in production. Use `human_override` to record an intentional change.

Threshold logic

Condition	Action
0	Default (strict)
0.001	Hard threshold

Developer check

if (abs(running - committed) > p.tolerance_for_numeric_drift) flag(field);

User-facing English

(Internal.)

8. Default Configuration

{
  "check_interval_minutes": 15,
  "tolerance_for_numeric_drift": 0
}
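A minimal sketch of how these defaults could be loaded and checked against the warning and hard values from the Parameter Guide. The names `DriftParams`, `LIMITS`, and `validate` are hypothetical, and the policy shown (a value above the warning threshold produces a warning, a value above the hard threshold is rejected) is an assumption; the spec does not state what happens at each threshold.

```python
from dataclasses import dataclass

# (warning, hard) pairs copied from the Parameter Guide table.
LIMITS = {
    "check_interval_minutes": (30, 60),
    "tolerance_for_numeric_drift": (0, 0.001),
}


@dataclass(frozen=True)
class DriftParams:
    """Hypothetical container for the two documented parameters."""
    check_interval_minutes: int = 15
    tolerance_for_numeric_drift: float = 0


def validate(params):
    """Return warning messages; raise ValueError past a hard threshold.

    Assumed policy, not taken from the spec.
    """
    warnings = []
    for name, (warn, hard) in LIMITS.items():
        value = getattr(params, name)
        if value < 0:
            raise ValueError(f"{name}={value} must not be negative")
        if value > hard:
            raise ValueError(f"{name}={value} exceeds hard limit {hard}")
        if value > warn:
            warnings.append(f"{name}={value} exceeds warning threshold {warn}")
    return warnings
```

With the defaults, `validate(DriftParams())` returns an empty list. Because the warning threshold for `tolerance_for_numeric_drift` is 0, any non-zero tolerance produces a warning under this assumed policy.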

9. Implementation Flow

— not yet authored —

10. Reference Implementation

Pseudocode is language-agnostic. FETCH = read input. EMIT = produce output. IF/THEN/ELSE = decision. Translate directly to TypeScript, Python, Go, or Rust.

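for bot in registry.live_bots():
  running = bot.runtime_config()
  committed = repo.config(bot.slug, branch='main')
  # Safe fallback (section 18): never report 'no drift' when the committed config is unavailable.
  if committed is None:
    emit('ConfigDriftReport', bot.slug, status='UNKNOWN')
    continue
  diffs = canonical_diff(running, committed, p.tolerance_for_numeric_drift)
  if diffs: emit('ConfigDriftReport', bot.slug, diffs)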
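The pseudocode leaves `canonical_diff` undefined. One possible shape for it is sketched below in Python, assuming configs are nested dicts of JSON-compatible values. The helper names `_flatten` and `_is_number`, the dotted field paths, and the use of `None` for a missing side are illustrative choices, not part of the spec. The sketch does not apply JSON Schema defaults, which section 18 names as the real canonicalisation step.

```python
from numbers import Real

_MISSING = object()  # sentinel: field absent on one side


def _flatten(config, prefix=""):
    """Flatten nested dicts into {"a.b": value} so key order never matters."""
    flat = {}
    for key, value in config.items():
        path = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            flat.update(_flatten(value, path))
        else:
            flat[path] = value
    return flat


def _is_number(value):
    # bool is a subclass of int in Python; treat it as non-numeric here.
    return isinstance(value, Real) and not isinstance(value, bool)


def canonical_diff(running, committed, tolerance=0.0):
    """Return one drift entry per field whose values differ."""
    run, com = _flatten(running), _flatten(committed)
    drifts = []
    for field in sorted(run.keys() | com.keys()):
        r, c = run.get(field, _MISSING), com.get(field, _MISSING)
        if r is _MISSING or c is _MISSING:
            differs = True  # missing-vs-present counts as drift
        elif _is_number(r) and _is_number(c):
            differs = abs(r - c) > tolerance
        else:
            # Strict equality for strings, enums, booleans, lists.
            differs = type(r) is not type(c) or r != c
        if differs:
            drifts.append({
                "field": field,
                "running": None if r is _MISSING else r,
                "committed": None if c is _MISSING else c,
            })
    return drifts
```

Called with the configs from the wire example in section 11, this returns a single entry for `intraday_drawdown_pct` with running 10 and committed 12.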

11. Wire Examples

Input — what arrives on the wire

{
  "bot_slug": "risk.killswitch",
  "running": {
    "intraday_drawdown_pct": 10
  },
  "committed": {
    "intraday_drawdown_pct": 12
  }
}

Output — what the bot emits

{
  "kind": "ConfigDriftReport",
  "bot_slug": "risk.killswitch",
  "drifts": [
    {
      "field": "intraday_drawdown_pct",
      "running": 10,
      "committed": 12
    }
  ]
}

12. Decision Logic

APPROVE

Strict equality on enums and strings. Numeric tolerance applied via `tolerance_for_numeric_drift`. Drift latched until either a commit or revert resolves it.

RESHAPE_REQUIRED

This bot does not reshape orders.

REJECT

No reject path defined for this bot — it is observe-only.

WARNING_ONLY

No warn-only path defined.
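The latch described under APPROVE can be sketched as a small state machine. This is an illustration under assumptions: the class name `DriftLatch`, the per-field granularity, and the event tuple layout are hypothetical. The reason codes come from section 14 and `since_ts_ms` from section 13.

```python
class DriftLatch:
    """Remember when each drift was first seen until it disappears."""

    def __init__(self):
        self._since = {}  # (bot_slug, field) -> since_ts_ms

    def update(self, bot_slug, drifted_fields, now_ms):
        """Return (reason_code, field, since_ts_ms) events for one check."""
        events = []
        current = set(drifted_fields)

        # Newly seen drifts are latched with the current timestamp.
        for field in sorted(current):
            key = (bot_slug, field)
            if key not in self._since:
                self._since[key] = now_ms
                events.append(("GOV_CONFIG_DRIFT_DETECTED", field, now_ms))

        # Latched drifts that no longer appear were committed or reverted.
        # sorted() copies the keys, so popping inside the loop is safe.
        for slug, field in sorted(self._since):
            if slug == bot_slug and field not in current:
                since = self._since.pop((slug, field))
                events.append(("GOV_CONFIG_DRIFT_RESOLVED", field, since))

        return events
```

A drift that persists across checks emits nothing after the first detection, so `since_ts_ms` keeps pointing at the moment the drift first appeared.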

13. Standard Decision Output

This bot emits an OperationsReport with kind=ConfigDriftReport (see section 22). It is observe-only and does not vote on orders.

{
  "kind": "ConfigDriftReport",
  "bot_slug": "risk.killswitch",
  "drifts": [
    {
      "field": "intraday_drawdown_pct",
      "running": 10,
      "committed": 12,
      "since_ts_ms": 1715260000000
    }
  ]
}

14. Reason Codes

Code	Severity	Meaning	Action	User-facing message
`GOV_CONFIG_DRIFT_DETECTED`	P3	The running config differs from the committed config in at least one field.	See decision output and developer log for context.	The running configuration of one of the system's safeties differs from the version on file. Operators must reconcile.
`GOV_CONFIG_DRIFT_RESOLVED`	P3	A previously reported drift was resolved by a commit or a revert.	See decision output and developer log for context.	The running configuration of one of the system's safeties matches the version on file again. No action is needed.
`GOV_CONFIG_DRIFT_UNKNOWN`	P3	The comparison could not be completed, for example because the committed config could not be fetched or the inputs were stale.	See decision output and developer log for context.	The system could not verify whether the running configuration of one of its safeties matches the version on file. Operators must investigate.

15. Metrics & Logs

Metrics emitted

Metric	Type	Unit	Labels	Meaning
`drift_reports_total`	counter	event	bot_id	Total drift reports emitted.
`bots_in_drift`	gauge	bot	bot_id	Number of bots currently in drift.
`drift_resolution_minutes_histogram`	histogram	minute	bot_id	Distribution of the time from drift detection to resolution.

Dashboards

6.19 overview dashboard

16. Developer Reporting

"Per check: bot_slug, drifts_count, fields_drifted."

17. Plain-English Reporting

Situation	User-facing explanation
When this bot acts	The running configuration of one of the system's safeties differs from the version on file. Operators must reconcile.

18. Failure-Mode Block

main_failure_mode	Comparing against the wrong committed revision (e.g. wrong branch).
false_positive_risk	Differences in field ordering or default-equivalence falsely flagged; mitigation: canonicalise both sides through the JSON Schema before diffing.
false_negative_risk	Bot's running config object is missing a field the committed version added; mitigation: schema-validate both sides and treat missing-vs-present as drift.
safe_fallback	If the committed config cannot be fetched, emit ConfigDriftReport with status=UNKNOWN — never silently report 'no drift'.
required_dependencies	—
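The safe fallback can be sketched as a wrapper that turns any fetch failure into an explicit UNKNOWN report. The function name `build_report`, the callable parameters, the `error` field, and the `DRIFT` and `OK` status values are assumptions made for illustration. Only `status=UNKNOWN` appears in the spec.

```python
def build_report(bot_slug, fetch_running, fetch_committed, diff):
    """Build a ConfigDriftReport; never report 'no drift' on a failed fetch."""
    report = {"kind": "ConfigDriftReport", "bot_slug": bot_slug, "drifts": []}
    try:
        running = fetch_running()
        committed = fetch_committed()
    except Exception as exc:  # e.g. config repo blocked or RPC timeout
        report["status"] = "UNKNOWN"
        report["error"] = str(exc)
        return report

    report["drifts"] = diff(running, committed)
    # "DRIFT" and "OK" are hypothetical status values.
    report["status"] = "DRIFT" if report["drifts"] else "OK"
    return report
```

Keeping the fetches inside the `try` and the diff outside it means that a bug in the diff itself surfaces as an exception rather than being reported as UNKNOWN.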

19. Failure-Injection Recipes

Scenario	How to inject	Expected behaviour	Recovery
`Block the config repo and assert UNKNOWN is emitted`	Block the checker's access to the config repo.	Bot emits a ConfigDriftReport with status=UNKNOWN (`GOV_CONFIG_DRIFT_UNKNOWN`) within its latency budget and never reports 'no drift'.	Remove the injected fault; bot returns to healthy state within one debounce window.
`Drift one field and assert the report contains exactly that field`	Change one field of a bot's running config without committing the change.	Bot emits `GOV_CONFIG_DRIFT_DETECTED` within its latency budget, and the report contains exactly that field.	Remove the injected fault; bot returns to healthy state within one debounce window.

20. State & Persistence

Last drift report per bot. Persisted to KV.

State stores

Name	Kind	Key	Value shape	TTL	Durability
`config_drift_detector_state`	in-memory + fast KV mirror	bot_id	Last drift report per bot. Persisted to KV.	24h	crash-safe via KV mirror

Cold-start recovery

Cold-start hydrates from fast KV; missing keys default to safe fallback.
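A rough sketch of that hydration step, using a plain dict as a stand-in for the fast KV store. The function name `hydrate` and the report shape are assumptions. The fallback for a missing key is taken to be the UNKNOWN report from section 18.

```python
def hydrate(live_bot_ids, kv):
    """Rebuild in-memory state from the KV mirror after a cold start."""
    state = {}
    for bot_id in live_bot_ids:
        # Missing keys fall back to UNKNOWN rather than to 'no drift'.
        fallback = {
            "kind": "ConfigDriftReport",
            "bot_slug": bot_id,
            "status": "UNKNOWN",
            "drifts": [],
        }
        state[bot_id] = kv.get(bot_id, fallback)
    return state
```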

On restart

All in-flight decisions are re-evaluated; no bot decision is trusted across restart without re-emit.

21. Concurrency & Idempotency

Aspect	Specification
Execution model	`Single scheduled checker; no per-bot fan-out.`
Max in-flight	`32`
Idempotency key	`order_intent_id`
Replay-safe	`True`
Deduplication	`By idempotency_key within a 60s window.`
Ordering guarantees	`Per-market_id FIFO; cross-market unordered.`
Per-call timeout (ms)	`250`
Backpressure strategy	`Bounded queue; oldest-dropped with metric increment when full.`
Locking / mutual exclusion	`Per-market_id mutex; no global locks.`

22. Dependencies

Consumes	`BotConfigRunning` `BotConfigCommitted`
Emits	`OperationsReport(kind=ConfigDriftReport)`
Blocks orders	no

23. Security Surfaces

Read-only access to config repo. Read-only RPC into bot runtime.

Signing surface

None — bot does not sign or submit.

Mitigations

Rate-limit per source
Audit-log every override
Require role-based authz on admin paths

24. Polymarket V2 Compatibility

Aspect	Value
CLOB version	`V2`
Collateral asset	`pUSD`
EIP-712 Exchange domain version	`2`
Aware of builderCode field	yes
Aware of negative-risk markets	yes
Multi-chain ready	yes
SDK used	`Polymarket CLOB V2 SDK`
Settlement contract	`CTFExchangeV2`
Notes	`Operates on V2 BotConfig schema only.`

25. Versioning & Migration

Field	Value
current	`0.1.0`
contract_version	`1.0.0`
last_breaking_change	`none`
deprecation_window_days	`30`

26. Acceptance Tests

Unit Tests

Test	Setup	Expected result
Identical configs report no drift.	Synthetic fixture per template.	Behaviour matches the rule described in the test name.
One numeric field changed by 1 reports the drift exactly.	Synthetic fixture per template.	Behaviour matches the rule described in the test name.

Integration Tests

Test	Expected result
Bump a running config via Admin UI without committing; the next check emits a drift report within one interval.	End-to-end behaviour matches the spec without manual intervention.

Property Tests

Property	Required behaviour
For any (running, committed), the drift list contains exactly the fields whose canonicalised values differ.	Always true across all generated inputs.
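The property above could be exercised with a randomised check along these lines, using only the standard library. `drifted_fields` is a hypothetical stand-in for the detector's diff on flat configs; a real test would call the actual implementation instead.

```python
import random


def drifted_fields(running, committed):
    """Stand-in diff for flat configs: names of fields whose values differ."""
    return sorted(
        key for key in running.keys() | committed.keys()
        if key not in running
        or key not in committed
        or running[key] != committed[key]
    )


def check_property(trials=500, seed=0):
    """Mutate a known set of fields and assert the diff finds exactly those."""
    rng = random.Random(seed)
    for _ in range(trials):
        committed = {key: rng.choice([0, 1, "x", "y"]) for key in "abcd"}
        changed = set(rng.sample(sorted(committed), k=rng.randint(0, 4)))
        running = dict(committed)
        for field in changed:
            # A tuple never equals the original int or str value.
            running[field] = ("mutated", committed[field])
        assert drifted_fields(running, committed) == sorted(changed)
        # Key order must not influence the result.
        reordered = dict(sorted(running.items(), reverse=True))
        assert drifted_fields(reordered, committed) == sorted(changed)
    return trials
```

Fixing the seed keeps the run reproducible, which makes a failing case easy to replay as a fixture.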

27. Operational Runbook

If multiple bots drift simultaneously: confirm the config repo branch the checker is reading from is the production branch.

On-call actions

Alert	First step	Diagnosis	Mitigation	Escalate to
`6.19_anomaly`	Open the bot's reporting page and confirm the alert is real (not a metric hiccup).	Inspect developer log entries for the affected bot_slug over the last 30 minutes.	Force-clear via Admin UI if the rule is clearly stale; otherwise leave engaged and notify owner.	Governance pod

Manual overrides

polytraders bot pause 6.19 — Disables the bot's enforcement layer; downstream consumers fall back to safe defaults.

Healthcheck

GET /healthz/config_drift_detector → 200 if last successful evaluation < 60s ago.

28. Promotion Gates

A bot does not advance to the next readiness state until every gate below is green. Gates are observable from production data — no subjective sign-off.

Promote to Shadow

Gate	How measured	Threshold
Stub	Run against synthetic drifts.	Documented threshold met for the full window.

Promote to Limited live

Gate	How measured	Threshold
Shadow	14 days; reports compared by Governance on-call.	Documented threshold met for the full window.
Advisory	7 days.	Documented threshold met for the full window.

Promote to General live

Gate	How measured	Threshold
Enforced	drift reports break the daily ops digest.	Documented threshold met for the full window.

29. Developer Checklist

Ready-to-ship score: 27/27 sections complete · 100%

Requirement	Status
Purpose defined	✓ done
Required inputs listed	✓ done
Parameters defined	✓ done
Defaults defined	✓ done
Warning thresholds defined	✓ done
Hard thresholds defined	✓ done
Safe fallback defined	✓ done
Structured output defined	✓ done
Developer log defined	✓ done
Plain-English explanation	✓ done
Unit tests defined	✓ done
Integration tests defined	✓ done
Property tests defined	✓ done
Failure-mode block complete	✓ done
Reference implementation pseudocode	✓ done
Wire examples (input + output)	✓ done
Reason codes listed	✓ done
Metrics & logs defined	✓ done
State & persistence defined	✓ done
Concurrency & idempotency defined	✓ done
Dependencies declared	✓ done
Security surfaces declared	✓ done
Polymarket V2 compatibility declared	✓ done
Version & migration history declared	✓ done
Operational runbook defined	✓ done
Promotion gates defined	✓ done
Failure-injection recipes defined	✓ done
