6.19 ConfigDriftDetector
Compares the running BotConfig of every live bot against the latest committed config in the config repo. Any drift (running != committed) is surfaced as a ConfigDriftReport naming the bot, the field, and the drift amount. Operators are forced to either commit the change or revert it.
1. Bot Identity
| Layer | Governance |
|---|---|
| Bot class | Governance |
| Authority | Observe |
| Status | PLANNED |
| Readiness | Spec ready |
| Runs before | — |
| Runs after | — |
| Applies to | Continuous |
| Default mode | shadow |
| User-visible | Yes |
| Developer owner | Governance pod |
Operational profile
| Ownership | Governance pod · on-call gov-oncall · #polytraders-gov · escalates to Head of Governance · P2 |
|---|---|
| Latency budget | 5000ms |
| Modes supported | off · shadow · advisory · enforced |
| Data freshness | max_market_data_age_ms=900000 · max_orderbook_age_ms=900000 · max_external_feed_age_ms=900000 · on stale → Emit status=UNKNOWN. |
| Human override | yes · by Governance on-call · logs GOV_CONFIG_DRIFT_ACK · time-bound: Until next check · scope: Single bot_slug · second approval required |
2. Purpose
Compares the running BotConfig of every live bot against the latest committed config in the config repo. Any drift (running != committed) is surfaced as a ConfigDriftReport naming the bot, the field, and the drift amount. Operators are forced to either commit the change or revert it.
3. Why This Bot Matters
Untracked tuning
An on-call engineer who tweaks a threshold via the Admin UI without committing the change leaves no audit trail; the next incident review cannot reconstruct the system state.
Drift between staging and prod
Without an explicit comparison, prod can silently run an ancient config while staging is updated.
Compliance evidence
Auditors require evidence that the running configuration matches a reviewed and signed-off version.
No worked examples on this bot yet. Worked examples are optional but strongly recommended — they turn an abstract failure mode into something a developer can verify in a fixture.
4. Required Polymarket Inputs
— not yet authored —
5. Required Internal Inputs
| Input | Source | Required? | Use |
|---|---|---|---|
| Running BotConfig per bot | Bot runtime | Yes | Effective config in process memory, including any live-edited fields. |
| Committed BotConfig per bot | Config repo (Git) | Yes | Source of truth signed-off configuration. |
6. Parameter Guide
| Parameter | Default | Warning | Hard | What it controls |
|---|---|---|---|---|
| check_interval_minutes | 15 | 30 | 60 | How often the drift comparison runs. |
| tolerance_for_numeric_drift | 0 | 0 | 0.001 | Tolerance for numeric fields before a drift is flagged. |
7. Detailed Parameter Instructions
check_interval_minutes
What it means
How often the drift comparison runs.
Default
{ "check_interval_minutes": 15 }
Why this default matters
Quarter-hourly is frequent enough to catch live edits before they outlive the on-call shift.
Threshold logic
| Condition | Action |
|---|---|
| 15 | Default |
Developer check
schedule.every(p.check_interval_minutes).minutes.do(check);
User-facing English
(Internal.)
tolerance_for_numeric_drift
What it means
Tolerance for numeric fields before a drift is flagged.
Default
{ "tolerance_for_numeric_drift": 0 }
Why this default matters
Zero — there is no acceptable silent drift in production. Use `human_override` to record an intentional change.
Threshold logic
| Condition | Action |
|---|---|
| 0 | Default — strict |
Developer check
if (abs(running - committed) > p.tolerance_for_numeric_drift) flag(field);
User-facing English
(Internal.)
8. Default Configuration
{
  "check_interval_minutes": 15,
  "tolerance_for_numeric_drift": 0
}
9. Implementation Flow
— not yet authored —
10. Reference Implementation
Pseudocode is language-agnostic. FETCH = read input. EMIT = produce output. IF/THEN/ELSE = decision. Translate directly to TypeScript, Python, Go, or Rust.
for bot in registry.live_bots():
    running = bot.runtime_config()
    committed = repo.config(bot.slug, branch='main')
    diffs = canonical_diff(running, committed, p.tolerance_for_numeric_drift)
    if diffs:
        emit('ConfigDriftReport', bot.slug, diffs)
11. Wire Examples
Input — what arrives on the wire
{
  "bot_slug": "risk.killswitch",
  "running": {
    "intraday_drawdown_pct": 10
  },
  "committed": {
    "intraday_drawdown_pct": 12
  }
}
Output — what the bot emits
{
  "kind": "ConfigDriftReport",
  "bot_slug": "risk.killswitch",
  "drifts": [
    {
      "field": "intraday_drawdown_pct",
      "running": 10,
      "committed": 12
    }
  ]
}
12. Decision Logic
APPROVE
Strict equality on enums and strings. Numeric tolerance applied via `tolerance_for_numeric_drift`. Drift latched until either a commit or revert resolves it.
RESHAPE_REQUIRED
This bot does not reshape orders.
REJECT
No reject path defined for this bot — it is observe-only.
WARNING_ONLY
No warn-only path defined.
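The comparison rules under APPROVE can be sketched in Python as follows. This is a minimal sketch rather than the bot's actual implementation: `diff_configs`, `is_number`, and the flat-dict config shape are assumptions for illustration, and the spec's `canonical_diff` may canonicalise both sides before comparing.

```python
from numbers import Real


def is_number(value):
    # bool is a subclass of int in Python; treat it as a flag, not a number.
    return isinstance(value, Real) and not isinstance(value, bool)


def diff_configs(running, committed, tolerance=0.0):
    """Return one drift entry per field whose values differ.

    Hypothetical helper: assumes both configs are flat dicts.
    """
    drifts = []
    for field in sorted(set(running) | set(committed)):
        if field not in running or field not in committed:
            # A field present on only one side counts as drift.
            drifts.append({"field": field,
                           "running": running.get(field),
                           "committed": committed.get(field)})
            continue
        r, c = running[field], committed[field]
        if is_number(r) and is_number(c):
            # Numeric fields: flag only when the gap exceeds the tolerance.
            drifted = abs(r - c) > tolerance
        else:
            # Enums, strings, flags: strict equality, including type.
            drifted = type(r) is not type(c) or r != c
        if drifted:
            drifts.append({"field": field, "running": r, "committed": c})
    return drifts


report = diff_configs({"intraday_drawdown_pct": 10, "mode": "shadow"},
                      {"intraday_drawdown_pct": 12, "mode": "shadow"})
print(report)
```

With the default tolerance of 0, any numeric difference is reported; raising the tolerance only relaxes numeric fields and never string or enum comparisons.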
13. Standard Decision Output
This bot emits an OperationsReport with kind=ConfigDriftReport (see section 22, Dependencies).
{
  "kind": "ConfigDriftReport",
  "bot_slug": "risk.killswitch",
  "drifts": [
    {
      "field": "intraday_drawdown_pct",
      "running": 10,
      "committed": 12,
      "since_ts_ms": 1715260000000
    }
  ]
}
14. Reason Codes
| Code | Severity | Meaning | Action | User-facing message |
|---|---|---|---|---|
| GOV_CONFIG_DRIFT_DETECTED | P3 | Gov Config Drift Detected | See decision output and developer log for context. | The running configuration of one of the system's safeties differs from the version on file. Operators must reconcile. |
| GOV_CONFIG_DRIFT_RESOLVED | P3 | Gov Config Drift Resolved | See decision output and developer log for context. | The running configuration of one of the system's safeties once again matches the version on file. |
| GOV_CONFIG_DRIFT_UNKNOWN | P3 | Gov Config Drift Unknown | See decision output and developer log for context. | The system could not verify whether the running configuration of one of the system's safeties matches the version on file. |
15. Metrics & Logs
Metrics emitted
| Metric | Type | Unit | Labels | Meaning |
|---|---|---|---|---|
| drift_reports_total | counter | event | bot_id | Total drift reports emitted. |
| bots_in_drift | gauge | bot | bot_id | Number of bots currently in drift. |
| drift_resolution_minutes_histogram | histogram | minutes | bot_id | Minutes from drift detection to resolution. |
Dashboards
- 6.19 overview dashboard
16. Developer Reporting
"Per check: bot_slug, drifts_count, fields_drifted."
17. Plain-English Reporting
| Situation | User-facing explanation |
|---|---|
| When this bot acts | The running configuration of one of the system's safeties differs from the version on file. Operators must reconcile. |
18. Failure-Mode Block
| main_failure_mode | Comparing against the wrong committed revision (e.g. wrong branch). |
|---|---|
| false_positive_risk | Differences in field ordering or default-equivalence falsely flagged; mitigation: canonicalise both sides through the JSON Schema before diffing. |
| false_negative_risk | Bot's running config object is missing a field the committed version added; mitigation: schema-validate both sides and treat missing-vs-present as drift. |
| safe_fallback | If the committed config cannot be fetched, emit ConfigDriftReport with status=UNKNOWN — never silently report 'no drift'. |
| required_dependencies | — |
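The safe fallback in the table above could look like the following sketch. It rests on assumptions: `check_bot`, the injected `fetch_committed` and `diff` callables, the `error` field, and the `OK` and `DRIFT` status values are hypothetical; only `status=UNKNOWN` is named by this spec.

```python
def check_bot(bot_slug, running, fetch_committed, diff):
    """Build a ConfigDriftReport for one bot.

    fetch_committed(bot_slug) returns the committed config or raises;
    diff(running, committed) returns a list of drift entries.
    Both are injected callables with hypothetical names.
    """
    try:
        committed = fetch_committed(bot_slug)
    except Exception as exc:
        # Never report 'no drift' when the source of truth is unreachable.
        return {"kind": "ConfigDriftReport", "bot_slug": bot_slug,
                "status": "UNKNOWN", "drifts": [], "error": str(exc)}
    drifts = diff(running, committed)
    return {"kind": "ConfigDriftReport", "bot_slug": bot_slug,
            "status": "DRIFT" if drifts else "OK", "drifts": drifts}


def repo_down(bot_slug):
    # Simulates a blocked config repo, as in the failure-injection recipe.
    raise ConnectionError("config repo unreachable")


def simple_diff(running, committed):
    # Tiny stand-in diff: fields whose values are not equal.
    return [{"field": k, "running": running.get(k), "committed": committed.get(k)}
            for k in sorted(set(running) | set(committed))
            if running.get(k) != committed.get(k)]


unknown = check_bot("risk.killswitch", {"intraday_drawdown_pct": 10},
                    repo_down, simple_diff)
print(unknown["status"])
```

Keeping the fetch failure on a separate return path makes it hard for an empty drift list to be mistaken for a clean comparison.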
19. Failure-Injection Recipes
| Scenario | How to inject | Expected behaviour | Recovery |
|---|---|---|---|
| Block the config repo and assert UNKNOWN is emitted | Block the config repo and assert UNKNOWN is emitted. | Bot detects within its latency budget and emits the corresponding reason code. | Remove the injected fault; bot returns to healthy state within one debounce window. |
| Drift one field and assert the report contains exactly that field | Drift one field and assert the report contains exactly that field. | Bot detects within its latency budget and emits the corresponding reason code. | Remove the injected fault; bot returns to healthy state within one debounce window. |
20. State & Persistence
Last drift report per bot. Persisted to KV.
State stores
| Name | Kind | Key | Value shape | TTL | Durability |
|---|---|---|---|---|---|
| config_drift_detector_state | in-memory + fast KV mirror | bot_id | Last drift report per bot. Persisted to KV. | 24h | crash-safe via KV mirror |
Cold-start recovery
Cold-start hydrates from fast KV; missing keys default to safe fallback.
On restart
All in-flight decisions are re-evaluated; no bot decision is trusted across restart without re-emit.
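One way to keep the last drift report per bot, with drift latched until it is resolved, is sketched below. The assumptions are significant: `update_latch` is a hypothetical helper, a plain dict stands in for the KV store, and returning a reason code on every check is a simplification. The reason codes and the `since_ts_ms` field come from this spec.

```python
def update_latch(store, bot_slug, drifts, now_ms):
    """Latch drifts per bot and return the reason code to emit, if any.

    store is any dict-like KV (a plain dict here, as an assumption).
    """
    previous = store.get(bot_slug, {})
    if not drifts:
        if previous:
            # Drift was latched and is now gone: a commit or revert resolved it.
            del store[bot_slug]
            return "GOV_CONFIG_DRIFT_RESOLVED"
        return None
    latched = {}
    for d in drifts:
        # Keep the first-seen timestamp for fields that were already drifting.
        since = previous.get(d["field"], {}).get("since_ts_ms", now_ms)
        latched[d["field"]] = {**d, "since_ts_ms": since}
    store[bot_slug] = latched
    # Simplification: emitted on every check while drift persists.
    return "GOV_CONFIG_DRIFT_DETECTED"


kv = {}
drift = [{"field": "intraday_drawdown_pct", "running": 10, "committed": 12}]
first = update_latch(kv, "risk.killswitch", drift, 1715260000000)
second = update_latch(kv, "risk.killswitch", drift, 1715260900000)
since = kv["risk.killswitch"]["intraday_drawdown_pct"]["since_ts_ms"]
cleared = update_latch(kv, "risk.killswitch", [], 1715261800000)
print(first, second, since, cleared)
```

Because `since_ts_ms` survives repeated checks, the gap between it and the resolving check gives the resolution time that a metric such as `drift_resolution_minutes_histogram` would record.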
21. Concurrency & Idempotency
| Aspect | Specification |
|---|---|
| Execution model | Single scheduled checker; no per-bot fan-out. |
| Max in-flight | 32 |
| Idempotency key | order_intent_id |
| Replay-safe | True |
| Deduplication | By idempotency_key within a 60s window. |
| Ordering guarantees | Per-market_id FIFO; cross-market unordered. |
| Per-call timeout (ms) | 250 |
| Backpressure strategy | Bounded queue; oldest-dropped with metric increment when full. |
| Locking / mutual exclusion | Per-market_id mutex; no global locks. |
22. Dependencies
| Consumes | BotConfigRunning BotConfigCommitted |
|---|---|
| Emits | OperationsReport(kind=ConfigDriftReport) |
| Blocks orders | no |
23. Security Surfaces
Read-only access to config repo. Read-only RPC into bot runtime.
Signing surface
None — bot does not sign or submit.
Mitigations
- Rate-limit per source
- Audit-log every override
- Require role-based authz on admin paths
24. Polymarket V2 Compatibility
| Aspect | Value |
|---|---|
| CLOB version | V2 |
| Collateral asset | pUSD |
| EIP-712 Exchange domain version | 2 |
| Aware of builderCode field | yes |
| Aware of negative-risk markets | yes |
| Multi-chain ready | yes |
| SDK used | Polymarket CLOB V2 SDK |
| Settlement contract | CTFExchangeV2 |
| Notes | Operates on V2 BotConfig schema only. |
25. Versioning & Migration
| Field | Value |
|---|---|
| current | 0.1.0 |
| contract_version | 1.0.0 |
| last_breaking_change | none |
| deprecation_window_days | 30 |
26. Acceptance Tests
Unit Tests
| Test | Setup | Expected result |
|---|---|---|
| Identical configs report no drift. | Synthetic fixture per template. | Behaviour matches the rule described in the test name. |
| One numeric field changed by 1 reports the drift exactly. | Synthetic fixture per template. | Behaviour matches the rule described in the test name. |
Integration Tests
| Test | Expected result |
|---|---|
| Bump a running config via Admin UI without committing; the next check emits a drift report within one interval. | End-to-end behaviour matches the spec without manual intervention. |
Property Tests
| Property | Required behaviour |
|---|---|
| For any (running, committed), the drift list contains exactly the fields whose canonicalised values differ. | Always true across all generated inputs. |
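The property above can be exercised without a property-testing library, as in this sketch that uses only the standard `random` module. `diff_fields` is a hypothetical stand-in for the detector under test, and the field names and value pool are invented for the fixture.

```python
import random

_MISSING = object()


def diff_fields(running, committed):
    # Stand-in for the detector under test: names of the differing fields.
    keys = set(running) | set(committed)
    return sorted(k for k in keys
                  if running.get(k, _MISSING) != committed.get(k, _MISSING))


def random_config(rng, fields):
    # Each field is present with probability 0.8 and takes a value from a small pool.
    return {f: rng.choice([0, 1, 10, "shadow", "enforced"])
            for f in fields if rng.random() < 0.8}


def check_property(trials=500, seed=0):
    rng = random.Random(seed)
    fields = ["a", "b", "c", "d"]
    for _ in range(trials):
        committed = random_config(rng, fields)
        running = dict(committed)
        # Mutate a random subset of fields and remember which ones changed.
        changed = set()
        for f in rng.sample(fields, rng.randint(0, len(fields))):
            running[f] = "mutated"  # never a committed value, so always a real change
            changed.add(f)
        # The drift list must contain exactly the changed fields.
        assert diff_fields(running, committed) == sorted(changed)
    return True


print(check_property())
```

Building the running config from the committed one by known mutations gives the test an independent oracle, so it does not re-implement the diff to check the diff.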
27. Operational Runbook
If multiple bots drift simultaneously: confirm the config repo branch the checker is reading from is the production branch.
On-call actions
| Alert | First step | Diagnosis | Mitigation | Escalate to |
|---|---|---|---|---|
| 6.19_anomaly | Open the bot's reporting page and confirm the alert is real (not a metric hiccup). | Inspect developer log entries for the affected market_id over the last 30 minutes. | Force-clear via Admin UI if the rule is clearly stale; otherwise leave engaged and notify owner. | Governance pod |
Manual overrides
`polytraders bot pause 6.19`: disables the bot's enforcement layer; downstream consumers fall back to safe defaults.
Healthcheck
GET /healthz/config_drift_detector → 200 if last successful evaluation < 60s ago.
28. Promotion Gates
A bot does not advance to the next readiness state until every gate below is green. Gates are observable from production data — no subjective sign-off.
Promote to Shadow
| Gate | How measured | Threshold |
|---|---|---|
| Stub | against synthetic drifts. | Documented threshold met for the full window. |
Promote to Limited live
| Gate | How measured | Threshold |
|---|---|---|
| Shadow | 14 days; reports compared by Governance on-call. | Documented threshold met for the full window. |
| Advisory | 7 days. | Documented threshold met for the full window. |
Promote to General live
| Gate | How measured | Threshold |
|---|---|---|
| Enforced | drift reports break the daily ops digest. | Documented threshold met for the full window. |
29. Developer Checklist
Ready-to-ship score: 27/27 sections complete · 100%
| Requirement | Status |
|---|---|
| Purpose defined | ✓ done |
| Required inputs listed | ✓ done |
| Parameters defined | ✓ done |
| Defaults defined | ✓ done |
| Warning thresholds defined | ✓ done |
| Hard thresholds defined | ✓ done |
| Safe fallback defined | ✓ done |
| Structured output defined | ✓ done |
| Developer log defined | ✓ done |
| Plain-English explanation | ✓ done |
| Unit tests defined | ✓ done |
| Integration tests defined | ✓ done |
| Property tests defined | ✓ done |
| Failure-mode block complete | ✓ done |
| Reference implementation pseudocode | ✓ done |
| Wire examples (input + output) | ✓ done |
| Reason codes listed | ✓ done |
| Metrics & logs defined | ✓ done |
| State & persistence defined | ✓ done |
| Concurrency & idempotency defined | ✓ done |
| Dependencies declared | ✓ done |
| Security surfaces declared | ✓ done |
| Polymarket V2 compatibility declared | ✓ done |
| Version & migration history declared | ✓ done |
| Operational runbook defined | ✓ done |
| Promotion gates defined | ✓ done |
| Failure-injection recipes defined | ✓ done |