6.18 ReplaySimulator

Governance · Simulate · PLANNED · Spec ready · capital: Indirect · P7 · Governance & replay · pending stub

Re-runs any historical pipeline trace against the current bot revisions to verify that the outcome would be the same (or to surface changes). Used for regression testing, incident post-mortems, and 'what would have happened' reviews. Runs only on recorded ReportEnvelope streams — never against live state.

v3 readiness

Score	Progress	Status
Docs	27/27	done
Impl	0/15	pending
Backtest	0/4	pending
Runtime	0/8	pending

A bot is done when all four scores are.

1. Bot Identity

Layer	Governance
Bot class	Governance
Authority	Simulate
Status	PLANNED
Readiness	Spec ready
Runs before	—
Runs after	—
Applies to	Continuous
Default mode	`shadow`
User-visible	Yes
Developer owner	Governance pod

Operational profile

Ownership	Governance pod · on-call gov-oncall · #polytraders-gov · escalates to Head of Governance · P3
Latency budget	600000ms
Modes supported	off · shadow · advisory · enforced
Data freshness	max_market_data_age_ms=0 · max_orderbook_age_ms=0 · on stale → Replay reads only recorded data; live freshness does not apply.
Human override	yes · by Governance on-call · logs GOV_REPLAY_OVERRIDE · time-bound: Single job · scope: Single replay window · single approver

2. Purpose

3. Why This Bot Matters

Regression detection
When a single bot is bumped, the simplest correctness check is to replay yesterday's traffic and diff the outcomes.
Postmortem reproducibility
An incident review that cannot reproduce the exact decision is just speculation.
Promotion gate
Templates require a passing replay against the canonical fixture set before promoting from shadow to advisory.

No worked examples on this bot yet. Worked examples are optional but strongly recommended — they turn an abstract failure mode into something a developer can verify in a fixture.

4. Required Polymarket Inputs

— not yet authored —

5. Required Internal Inputs

Input	Source	Required?	Use
Recorded ReportEnvelope stream	`ReportEnvelope archive`	Yes	Source of inputs to replay.
Current bot revisions	`Bot registry`	Yes	Target system to replay against.

6. Parameter Guide

Parameter	Default	Warning	Hard	What it controls
max_replay_minutes	`60`	`60`	`180`	Maximum replay duration in a single run.
tolerance_bps	`5`	`—`	`—`	Tolerance in basis points for numeric outputs (slippage, cost) before a diff is flagged as REGRESSION.

7. Detailed Parameter Instructions

max_replay_minutes

What it means

Maximum replay duration in a single run.

Default

{ "max_replay_minutes": 60 }

Why this default matters

60 minutes is enough for a single incident window without overwhelming the simulator.

Threshold logic

Condition	Action
60	Default

Developer check

if (replay_window > p.max_replay_minutes) chunk();
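
The chunking step in the developer check is not specified further. A minimal sketch, assuming the job window is given in epoch milliseconds (as in the wire example) and that chunks are simple consecutive half-open sub-windows; `chunk_window` and `MS_PER_MINUTE` are hypothetical names, not part of the spec:

```python
from typing import Iterator, Tuple

MS_PER_MINUTE = 60_000


def chunk_window(start_ms: int, end_ms: int,
                 max_replay_minutes: int = 60) -> Iterator[Tuple[int, int]]:
    """Yield consecutive [start, end) sub-windows of at most max_replay_minutes."""
    if max_replay_minutes <= 0:
        raise ValueError("max_replay_minutes must be positive")
    if end_ms < start_ms:
        raise ValueError("end_ms must not precede start_ms")
    step_ms = max_replay_minutes * MS_PER_MINUTE
    cursor = start_ms
    while cursor < end_ms:
        chunk_end = min(cursor + step_ms, end_ms)
        yield (cursor, chunk_end)
        cursor = chunk_end


# A 150-minute window splits into chunks of 60, 60 and 30 minutes.
chunks = list(chunk_window(0, 150 * MS_PER_MINUTE))
```

The window in the wire example (1715260000000 to 1715263600000) is exactly 60 minutes, so under these assumptions it would run as a single chunk.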

User-facing English

(Internal.)

tolerance_bps

What it means

Tolerance in basis points for numeric outputs (slippage, cost) before a diff is flagged as REGRESSION.

Default

{ "tolerance_bps": 5 }

Why this default matters

5 bps is below normal noise floor on Polymarket but tight enough to catch real changes.

Threshold logic

Condition	Action
≤ 5 bps	MATCH
> 5 bps	REGRESSION

Developer check

if (abs(now - then) > p.tolerance_bps) flag('REGRESSION');

User-facing English

(Internal.)

8. Default Configuration

{
  "max_replay_minutes": 60,
  "tolerance_bps": 5
}
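
One way to load and validate this configuration is sketched below. It assumes the "Warning" (60) and "Hard" (180) columns of the Parameter Guide are upper bounds on max_replay_minutes, and that a value above the hard limit should be refused; `ReplayConfig`, `load_config` and `exceeds_warning` are hypothetical names:

```python
import json
from dataclasses import dataclass

WARN_MAX_REPLAY_MINUTES = 60    # "Warning" column in the Parameter Guide
HARD_MAX_REPLAY_MINUTES = 180   # "Hard" column in the Parameter Guide


@dataclass(frozen=True)
class ReplayConfig:
    max_replay_minutes: int = 60
    tolerance_bps: float = 5

    def __post_init__(self) -> None:
        # Assumption: values above the hard limit are rejected outright.
        if not 0 < self.max_replay_minutes <= HARD_MAX_REPLAY_MINUTES:
            raise ValueError("max_replay_minutes must be in (0, 180]")
        if self.tolerance_bps < 0:
            raise ValueError("tolerance_bps must be non-negative")

    @property
    def exceeds_warning(self) -> bool:
        """True when the replay duration is above the warning threshold."""
        return self.max_replay_minutes > WARN_MAX_REPLAY_MINUTES


def load_config(raw: str) -> ReplayConfig:
    """Parse a JSON document shaped like the default configuration."""
    return ReplayConfig(**json.loads(raw))


cfg = load_config('{"max_replay_minutes": 60, "tolerance_bps": 5}')
```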

9. Implementation Flow

— not yet authored —

10. Reference Implementation

Pseudocode is language-agnostic. FETCH = read input. EMIT = produce output. IF/THEN/ELSE = decision. Translate directly to TypeScript, Python, Go, or Rust.

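matches = 0
regressions = []
# Replay every recorded envelope in the job window against the current bot revisions.
for env in archive.window(job.window_start_ms, job.window_end_ms):
  out = sandbox.run(current_bots, env.input)  # sandbox has no network access
  if differs(out, env.recorded_output, p.tolerance_bps):
    regressions.append(diff_record(env, out))
  else:
    matches += 1
# The digest carries both counts and the first 10 regression records.
emit('ReplayDigest', job, matches, len(regressions), regressions[:10])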
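
The pseudocode above can be translated into a small runnable Python sketch. It is illustrative only: `Envelope`, `run_replay`, `first_diff` and `current_bot` are hypothetical stand-ins for the archive, sandbox and bot registry, all outputs are assumed to be numeric and already expressed in basis points, and the digest fields follow the example in section 13.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Optional


@dataclass
class Envelope:
    """Hypothetical stand-in for one recorded ReportEnvelope."""
    intent_id: str
    input: Dict[str, float]
    recorded_output: Dict[str, float]


def first_diff(env: Envelope, out: Dict[str, float],
               tolerance_bps: float) -> Optional[dict]:
    """Return the first field that differs beyond tolerance, or None."""
    for field, then in env.recorded_output.items():
        now = out.get(field)
        if now is None or abs(now - then) > tolerance_bps:
            return {"intent_id": env.intent_id, "field": field,
                    "then": then, "now": now}
    return None


def run_replay(envelopes: Iterable[Envelope],
               run_bots: Callable[[Dict[str, float]], Dict[str, float]],
               tolerance_bps: float = 5) -> dict:
    matches = 0
    regressions: List[dict] = []
    for env in envelopes:
        diff = first_diff(env, run_bots(env.input), tolerance_bps)
        if diff is None:
            matches += 1
        else:
            regressions.append(diff)
    return {"kind": "ReplayDigest", "matches": matches,
            "regressions": len(regressions),
            "first_regressions": regressions[:10]}


def current_bot(inp: Dict[str, float]) -> Dict[str, float]:
    # Hypothetical revised bot: adds 6 bps of slippage on large orders.
    return {"slippage_bps": inp["base_bps"] + (6 if inp["size"] > 100 else 0)}


digest = run_replay([
    Envelope("intent_001", {"base_bps": 35, "size": 50}, {"slippage_bps": 35}),
    Envelope("intent_002", {"base_bps": 35, "size": 200}, {"slippage_bps": 35}),
], current_bot)
```

In this toy run the first envelope matches and the second is reported as a regression (35 bps recorded, 41 bps replayed).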

11. Wire Examples

Input — what arrives on the wire

{
  "job_id": "replay_001",
  "window_start_ms": 1715260000000,
  "window_end_ms": 1715263600000
}

Output — what the bot emits

{
  "kind": "ReplayDigest",
  "matches": 1042,
  "regressions": 3
}

12. Decision Logic

APPROVE

A replayed decision counts as a match when all three checks pass: the input matches strictly (same intent_id, same payload), numeric outputs agree within tolerance_bps, and enum/categorical outputs match exactly.
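
A minimal sketch of the output comparison described above, assuming outputs are flat key/value mappings and numeric fields are already in basis points. `outputs_match` and `_is_number` are hypothetical helpers; the strict input match is assumed to have been checked before this point.

```python
from numbers import Real
from typing import Any, Mapping


def _is_number(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, Real) and not isinstance(value, bool)


def outputs_match(recorded: Mapping[str, Any], replayed: Mapping[str, Any],
                  tolerance_bps: float = 5) -> bool:
    """Tolerance comparison on numeric fields, exact match on everything else."""
    if recorded.keys() != replayed.keys():
        return False
    for field, then in recorded.items():
        now = replayed[field]
        if _is_number(then) and _is_number(now):
            if abs(now - then) > tolerance_bps:
                return False
        elif now != then:
            return False
    return True


recorded = {"decision": "APPROVE", "slippage_bps": 35}
within = outputs_match(recorded, {"decision": "APPROVE", "slippage_bps": 38})
beyond = outputs_match(recorded, {"decision": "APPROVE", "slippage_bps": 41})
changed_enum = outputs_match(recorded, {"decision": "REJECT", "slippage_bps": 35})
```

Here `within` is a match (3 bps apart), while `beyond` (6 bps apart) and `changed_enum` (categorical field changed) are not.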

RESHAPE_REQUIRED

This bot does not reshape orders.

REJECT

No reject path defined for this bot — it is observe-only.

WARNING_ONLY

No warn-only path defined.

13. Standard Decision Output

This bot emits an OperationsReport with kind=ReplayDigest (see 22. Dependencies); it is observe-only and does not vote on orders.

{
  "kind": "ReplayDigest",
  "window_start_ms": 1715260000000,
  "window_end_ms": 1715263600000,
  "matches": 1042,
  "regressions": 3,
  "first_regressions": [
    {
      "intent_id": "intent_001",
      "field": "cost_estimate.slippage_bps",
      "then": 35,
      "now": 41
    }
  ]
}

14. Reason Codes

Code	Severity	Meaning	Action	User-facing message
`GOV_REPLAY_MATCH`	P3	Replayed outputs matched the recorded outputs within tolerance.	See decision output and developer log for context.	Replays past activity through the current system to confirm nothing important changed.
`GOV_REPLAY_REGRESSION`	P3	At least one replayed output differed from the recording beyond tolerance_bps, or a categorical output changed.	See decision output and developer log for context.	Replays past activity through the current system to confirm nothing important changed.
`GOV_REPLAY_ABORTED`	P3	The replay was aborted, for example because the sandbox could not guarantee no-network mode or the archive could not be read.	See decision output and developer log for context.	Replays past activity through the current system to confirm nothing important changed.
`GOV_REPLAY_NO_NETWORK_VIOLATION`	P3	A bot attempted a network call from inside the no-network replay sandbox.	See decision output and developer log for context.	Replays past activity through the current system to confirm nothing important changed.

15. Metrics & Logs

Metrics emitted

Metric	Type	Unit	Labels	Meaning
`replay_jobs_total`	counter	event	bot_id	Replay jobs total.
`replay_matches_total`	counter	event	bot_id	Replay matches total.
`replay_regressions_total`	counter	event	bot_id	Replay regressions total.
`replay_aborts_total`	counter	event	bot_id	Replay aborts total.

Dashboards

6.18 overview dashboard

16. Developer Reporting

Per replay: job_id, window, total_inputs, matches, regressions, runtime_ms.

17. Plain-English Reporting

Situation	User-facing explanation
When this bot acts	Replays past activity through the current system to confirm nothing important changed.

18. Failure-Mode Block

main_failure_mode	Replay sandbox accidentally calls a live network endpoint.
false_positive_risk	Time-dependent outputs (anything reading now_ms()) flagged as regressions; mitigation: the replay runtime injects a frozen clock.
false_negative_risk	Bot uses external state not captured in the recording; mitigation: bots that read external state must declare it in `data_freshness.max_external_feed_age_ms` and recordings include it.
safe_fallback	If the sandbox cannot guarantee no-network mode, abort the replay and emit ReplayDigest with status=ABORTED.
required_dependencies	—
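
The frozen-clock mitigation for false positives can be sketched as follows. This assumes bots receive their clock as an injected callable rather than reading wall time directly; `FrozenClock` and `quote_age_ms` are hypothetical names used only for illustration.

```python
from typing import Callable


class FrozenClock:
    """Clock pinned to the timestamp recorded with the original decision."""

    def __init__(self, recorded_ms: int) -> None:
        self._now_ms = recorded_ms

    def now_ms(self) -> int:
        return self._now_ms


def quote_age_ms(quote_ts_ms: int, now_ms: Callable[[], int]) -> int:
    """Example of a time-dependent bot output: the age of a quote."""
    return now_ms() - quote_ts_ms


# Replaying the same input twice yields the same age, however much
# wall-clock time has passed since the recording.
clock = FrozenClock(1715260000500)
first = quote_age_ms(1715260000000, clock.now_ms)
second = quote_age_ms(1715260000000, clock.now_ms)
```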

19. Failure-Injection Recipes

Scenario	How to inject	Expected behaviour	Recovery
`Inject a deliberately wrong output and assert the regression is surfaced`	Inject a deliberately wrong output and assert the regression is surfaced.	Bot detects within its latency budget and emits `GOV_REPLAY_REGRESSION`.	Remove the injected fault; bot returns to healthy state within one debounce window.
`Block the archive read and assert ABORTED status`	Block the archive read and assert ABORTED status.	Bot detects within its latency budget and emits ReplayDigest with status=ABORTED (`GOV_REPLAY_ABORTED`).	Remove the injected fault; bot returns to healthy state within one debounce window.

20. State & Persistence

Replay archive index. Job history. No live state.

State stores

Name	Kind	Key	Value shape	TTL	Durability
`replay_simulator_state`	in-memory + fast KV mirror	bot_id	Replay archive index. Job history. No live state.	24h	crash-safe via KV mirror

Cold-start recovery

Cold-start hydrates from fast KV; missing keys default to safe fallback.

On restart

All in-flight decisions are re-evaluated; no bot decision is trusted across restart without re-emit.

21. Concurrency & Idempotency

Aspect	Specification
Execution model	`Runs in a sandbox process pool. Concurrent jobs allowed; each is isolated.`
Max in-flight	`32`
Idempotency key	`order_intent_id`
Replay-safe	`True`
Deduplication	`By idempotency_key within a 60s window.`
Ordering guarantees	`Per-market_id FIFO; cross-market unordered.`
Per-call timeout (ms)	`250`
Backpressure strategy	`Bounded queue; oldest-dropped with metric increment when full.`
Locking / mutual exclusion	`Per-market_id mutex; no global locks.`
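
The deduplication row ("By idempotency_key within a 60s window") could be implemented along these lines. This is a sketch under assumptions: the window is measured from the first accepted occurrence and is not extended by duplicates, and `Deduplicator` is a hypothetical in-memory helper rather than the KV-backed store described in section 20.

```python
from typing import Dict


class Deduplicator:
    """Drops repeats of an idempotency key seen within window_ms."""

    def __init__(self, window_ms: int = 60_000) -> None:
        self.window_ms = window_ms
        self._accepted_at: Dict[str, int] = {}

    def is_duplicate(self, idempotency_key: str, now_ms: int) -> bool:
        last = self._accepted_at.get(idempotency_key)
        if last is not None and now_ms - last < self.window_ms:
            return True  # inside the window: treat as a repeat
        # First occurrence, or the window has elapsed: accept and remember.
        self._accepted_at[idempotency_key] = now_ms
        return False


dedup = Deduplicator()
results = [
    dedup.is_duplicate("intent_001", 0),       # first sighting
    dedup.is_duplicate("intent_001", 30_000),  # 30s later, inside the window
    dedup.is_duplicate("intent_001", 60_000),  # window has elapsed
]
```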

22. Dependencies

Consumes	`ReportEnvelopeArchive`
Emits	`OperationsReport(kind=ReplayDigest)`
Blocks orders	no

23. Security Surfaces

Sandbox network is fully blocked. Only replay archive read access.
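
As an illustration of what "fully blocked" can look like in a test harness, the sketch below patches `socket.socket.connect` so that any outbound connection attempt raises. `no_network` and `NetworkViolation` are hypothetical names. This is only a process-level guard for tests (it does not cover every socket call); a real sandbox would presumably enforce the block at the OS or container level.

```python
import socket
from contextlib import contextmanager
from unittest import mock


class NetworkViolation(RuntimeError):
    """Raised when replayed code attempts an outbound connection."""


@contextmanager
def no_network():
    def blocked_connect(self, address):
        raise NetworkViolation(f"blocked connect to {address!r}")

    # patch.object restores the original connect method on exit.
    with mock.patch.object(socket.socket, "connect", blocked_connect):
        yield


def try_connect() -> str:
    """Attempt a connection inside the guard and report what happened."""
    with no_network():
        with socket.socket() as sock:
            try:
                sock.connect(("127.0.0.1", 9))
            except NetworkViolation:
                return "NO_NETWORK_VIOLATION"
    return "connected"


outcome = try_connect()
```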

Signing surface

None — bot does not sign or submit.

Mitigations

Rate-limit per source
Audit-log every override
Require role-based authz on admin paths

24. Polymarket V2 Compatibility

Aspect	Value
CLOB version	`V2`
Collateral asset	`pUSD`
EIP-712 Exchange domain version	`2`
Aware of builderCode field	yes
Aware of negative-risk markets	yes
Multi-chain ready	yes
SDK used	`Polymarket CLOB V2 SDK`
Settlement contract	`CTFExchangeV2`
Notes	`Replays against V2 bot revisions only.`

25. Versioning & Migration

Field	Value
current	`0.1.0`
contract_version	`1.0.0`
last_breaking_change	`none`
deprecation_window_days	`30`

26. Acceptance Tests

Unit Tests

Test	Setup	Expected result
A replay against an unchanged bot version reports zero regressions on its golden traces.	Synthetic fixture per template.	Behaviour matches the rule described in the test name.

Integration Tests

Test	Expected result
Replay 1 hour of recorded traffic through a deliberately changed bot and assert regressions are surfaced.	End-to-end behaviour matches the spec without manual intervention.

Property Tests

Property	Required behaviour
Match count + regression count equals total input count for any non-aborted run.	Always true across all generated inputs.
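
The property above can be exercised with randomly generated inputs. The sketch below uses only the standard library and a simplified comparison (a single numeric output per input, in basis points); `replay_counts` and `check_partition_property` are hypothetical names, and a real suite might use a property-testing library instead.

```python
import random
from typing import List, Tuple


def replay_counts(pairs: List[Tuple[int, int]],
                  tolerance_bps: float = 5) -> Tuple[int, int]:
    """Count matches and regressions over (then, now) output pairs."""
    matches = regressions = 0
    for then, now in pairs:
        if abs(now - then) > tolerance_bps:
            regressions += 1
        else:
            matches += 1
    return matches, regressions


def check_partition_property(trials: int = 200, seed: int = 0) -> bool:
    """Every input must land in exactly one of the two buckets."""
    rng = random.Random(seed)
    for _ in range(trials):
        size = rng.randint(0, 50)
        pairs = [(rng.randint(0, 100), rng.randint(0, 100)) for _ in range(size)]
        matches, regressions = replay_counts(pairs)
        if matches + regressions != len(pairs):
            return False
    return True


holds = check_partition_property()
```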

27. Operational Runbook

If a replay fails with GOV_REPLAY_NO_NETWORK_VIOLATION, the offending bot is making an external call from inside the sandbox; file a P2 issue against that bot.

On-call actions

Alert	First step	Diagnosis	Mitigation	Escalate to
`6.18_anomaly`	Open the bot's reporting page and confirm the alert is real (not a metric hiccup).	Inspect developer log entries for the affected market_id over the last 30 minutes.	Force-clear via Admin UI if the rule is clearly stale; otherwise leave engaged and notify owner.	Governance pod

Manual overrides

polytraders bot pause 6.18 — Disables the bot's enforcement layer; downstream consumers fall back to safe defaults.

Healthcheck

GET /healthz/replay_simulator → 200 if last successful evaluation < 60s ago.
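
The healthcheck rule reduces to a small freshness test. A sketch, assuming timestamps in epoch milliseconds; the spec only defines the 200 case, so returning 503 otherwise (including when no evaluation has succeeded yet) is an assumption, and `healthz_status` is a hypothetical helper name.

```python
from typing import Optional


def healthz_status(last_success_ms: Optional[int], now_ms: int,
                   max_age_ms: int = 60_000) -> int:
    """Return 200 if the last successful evaluation is under 60s old."""
    if last_success_ms is None:
        return 503  # assumption: never-evaluated counts as unhealthy
    return 200 if now_ms - last_success_ms < max_age_ms else 503


fresh = healthz_status(last_success_ms=0, now_ms=59_999)
stale = healthz_status(last_success_ms=0, now_ms=60_000)
```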

28. Promotion Gates

A bot does not advance to the next readiness state until every gate below is green. Gates are observable from production data — no subjective sign-off.

Promote to Shadow

Gate	How measured	Threshold
Stub	Golden replay reports zero regressions.	Documented threshold met for the full window.

Promote to Limited live

Gate	How measured	Threshold
Shadow	14 days running on a daily window.	Documented threshold met for the full window.
Advisory	7 days.	Documented threshold met for the full window.

Promote to General live

Gate	How measured	Threshold
Enforced	Every promotion through the modes ladder requires a passing replay digest.	Documented threshold met for the full window.

29. Developer Checklist

Ready-to-ship score: 27/27 sections complete · 100%

Requirement	Status
Purpose defined	✓ done
Required inputs listed	✓ done
Parameters defined	✓ done
Defaults defined	✓ done
Warning thresholds defined	✓ done
Hard thresholds defined	✓ done
Safe fallback defined	✓ done
Structured output defined	✓ done
Developer log defined	✓ done
Plain-English explanation	✓ done
Unit tests defined	✓ done
Integration tests defined	✓ done
Property tests defined	✓ done
Failure-mode block complete	✓ done
Reference implementation pseudocode	✓ done
Wire examples (input + output)	✓ done
Reason codes listed	✓ done
Metrics & logs defined	✓ done
State & persistence defined	✓ done
Concurrency & idempotency defined	✓ done
Dependencies declared	✓ done
Security surfaces declared	✓ done
Polymarket V2 compatibility declared	✓ done
Version & migration history declared	✓ done
Operational runbook defined	✓ done
Promotion gates defined	✓ done
Failure-injection recipes defined	✓ done
