Polytraders Dev Guide
internal

6.18 ReplaySimulator

Governance · Governance · Simulate · PLANNED · Spec ready · capital: Indirect · P7 · Governance & replay · pending stub


v3 readiness

Docs: 27/27 · done
Impl: 0/15 · pending
Backtest: 0/4 · pending
Runtime: 0/8 · pending

A bot is done when all four scores are complete.

1. Bot Identity

Layer: Governance
Bot class: Governance
Authority: Simulate
Status: PLANNED
Readiness: Spec ready
Runs before: -
Runs after: -
Applies to: Continuous
Default mode: shadow
User-visible: Yes
Developer owner: Governance pod

Operational profile

Ownership: Governance pod · on-call gov-oncall · #polytraders-gov · escalates to Head of Governance · P3
Latency budget: 600000 ms (10 minutes)
Modes supported: off · shadow · advisory · enforced
Data freshness: max_market_data_age_ms=0 · max_orderbook_age_ms=0 · on stale → Replay reads only recorded data; live freshness does not apply.
Human override: yes · by Governance on-call · logs GOV_REPLAY_OVERRIDE · time-bound: single job · scope: single replay window · single approver

2. Purpose

Re-runs any historical pipeline trace against the current bot revisions to verify that the outcomes are unchanged, or to surface any differences. Used for regression testing, incident post-mortems, and 'what would have happened' reviews. Runs only on recorded ReportEnvelope streams, never against live state.

3. Why This Bot Matters

  • Regression detection

    When a single bot's revision is bumped, the simplest correctness check is to replay yesterday's traffic and diff the outcomes.

  • Postmortem reproducibility

    An incident review that cannot reproduce the exact decision is just speculation.

  • Promotion gate

    Templates require a passing replay against the canonical fixture set before promoting from shadow to advisory.

No worked examples on this bot yet. Worked examples are optional but strongly recommended — they turn an abstract failure mode into something a developer can verify in a fixture.

4. Required Polymarket Inputs

— not yet authored —

5. Required Internal Inputs

Input | Source | Required? | Use
Recorded ReportEnvelope stream | ReportEnvelope archive | Yes | Source of inputs to replay.
Current bot revisions | Bot registry | Yes | Target system to replay against.

6. Parameter Guide

Parameter | Default | Warning | Hard | What it controls
max_replay_minutes | 60 | 60 | 180 | Maximum length, in minutes, of the recorded window replayed in a single run.
tolerance_bps | 5 | - | - | Tolerance, in basis points, for numeric outputs (slippage, cost) before a diff is flagged as REGRESSION.
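The table lists warning and hard limits but not how they are enforced. A minimal Python sketch, assuming a value strictly above the warning threshold produces a warning and a value above the hard threshold is rejected; `THRESHOLDS`, `ValidatedConfig`, and `validate_config` are hypothetical names.

```python
from dataclasses import dataclass, field

# Limits copied from the Parameter Guide. tolerance_bps lists no warning or
# hard limit, so those entries are None (no check applied).
THRESHOLDS = {
    "max_replay_minutes": {"default": 60, "warning": 60, "hard": 180},
    "tolerance_bps": {"default": 5, "warning": None, "hard": None},
}


@dataclass
class ValidatedConfig:
    values: dict
    warnings: list = field(default_factory=list)


def validate_config(overrides: dict) -> ValidatedConfig:
    """Merge overrides onto the defaults; warn above 'warning', reject above 'hard'."""
    result = ValidatedConfig({name: t["default"] for name, t in THRESHOLDS.items()})
    for name, value in overrides.items():
        if name not in THRESHOLDS:
            raise KeyError(f"unknown parameter: {name}")
        limits = THRESHOLDS[name]
        if limits["hard"] is not None and value > limits["hard"]:
            raise ValueError(f"{name}={value} exceeds hard limit {limits['hard']}")
        if limits["warning"] is not None and value > limits["warning"]:
            result.warnings.append(f"{name}={value} exceeds warning threshold {limits['warning']}")
        result.values[name] = value
    return result
```

Because the default for max_replay_minutes equals its warning threshold, the default configuration passes without warnings under this interpretation.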

7. Detailed Parameter Instructions

max_replay_minutes

What it means

Maximum length, in minutes, of the recorded window replayed in a single run. Longer windows are split into chunks.

Default

{ "max_replay_minutes": 60 }

Why this default matters

60 minutes is enough for a single incident window without overwhelming the simulator.

Threshold logic

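Condition | Action
window ≤ max_replay_minutes (default 60) | Replay in a single run
window > max_replay_minutes (default 60) | chunk() the window into multiple runs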

Developer check

if (replay_window > p.max_replay_minutes) chunk();
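The check above does not specify how `chunk()` splits the window. One plausible reading, sketched in Python under the assumption that windows are half-open `[start_ms, end_ms)` and are split into consecutive, non-overlapping sub-windows; `chunk_window` is a hypothetical helper.

```python
MS_PER_MINUTE = 60_000


def chunk_window(start_ms: int, end_ms: int, max_replay_minutes: int) -> list[tuple[int, int]]:
    """Split [start_ms, end_ms) into consecutive sub-windows no longer than max_replay_minutes."""
    if end_ms <= start_ms:
        raise ValueError("window end must be after window start")
    step = max_replay_minutes * MS_PER_MINUTE
    chunks = []
    cursor = start_ms
    while cursor < end_ms:
        # The last chunk is truncated to the window end.
        chunks.append((cursor, min(cursor + step, end_ms)))
        cursor += step
    return chunks
```

Applied to the Wire Examples window (exactly 60 minutes), this yields a single chunk; a 90-minute window yields a 60-minute chunk followed by a 30-minute chunk.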

User-facing English

(Internal.)

tolerance_bps

What it means

Tolerance in basis points for numeric outputs (slippage, cost) before a diff is flagged as REGRESSION.

Default

{ "tolerance_bps": 5 }

Why this default matters

5 bps is below the normal noise floor on Polymarket but tight enough to catch real changes.

Threshold logic

Condition | Action
≤ 5 bps | MATCH
> 5 bps | REGRESSION

Developer check

if (abs(now - then) > p.tolerance_bps) flag('REGRESSION');
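The check above compares `now - then` directly against `tolerance_bps`, which is only meaningful when both values are already in basis points, as with `cost_estimate.slippage_bps` in the Standard Decision Output. A hedged Python sketch that also handles non-bps numeric outputs (such as a cost) by comparing their relative change; the relative-change rule and the names `numeric_diff_bps` and `classify` are assumptions, not part of this spec.

```python
def numeric_diff_bps(then: float, now: float, already_bps: bool) -> float:
    """Difference between a recorded and a replayed value, expressed in basis points.

    bps-denominated fields use the absolute difference; other numeric fields
    use the relative change scaled to bps (1 bp = 0.01%).
    """
    if already_bps:
        return abs(now - then)
    if then == 0:
        # No base to scale against: any non-zero change is treated as unbounded.
        return 0.0 if now == 0 else float("inf")
    return abs(now - then) / abs(then) * 10_000


def classify(then: float, now: float, tolerance_bps: float, already_bps: bool) -> str:
    """MATCH when the difference is within tolerance_bps (inclusive), else REGRESSION."""
    return "MATCH" if numeric_diff_bps(then, now, already_bps) <= tolerance_bps else "REGRESSION"
```

With the recorded example from the Standard Decision Output (35 → 41 bps), the difference is 6 bps and the result is REGRESSION; a difference of exactly 5 bps is still a MATCH, as in the threshold table.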

User-facing English

(Internal.)

8. Default Configuration

{
  "max_replay_minutes": 60,
  "tolerance_bps": 5
}

9. Implementation Flow

— not yet authored —

10. Reference Implementation

Pseudocode is language-agnostic. FETCH = read input. EMIT = produce output. IF/THEN/ELSE = decision. Translate directly to TypeScript, Python, Go, or Rust.

matches = 0
regressions = []
for env in archive.window(job.window_start_ms, job.window_end_ms):
  out = sandbox.run(current_bots, env.input)
  if differs(out, env.recorded_output, p.tolerance_bps):
    regressions.append(diff_record(env, out))
  else:
    matches += 1
emit('ReplayDigest', job, matches, len(regressions), regressions[:10])  # total count + first 10 samples
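The pseudocode can be exercised end to end with in-memory stand-ins. A minimal Python sketch, assuming the archive is a callable that returns envelopes for a window and raises `OSError` when unreadable (mapped to status ABORTED, as in the safe fallback), and that bots are a pure function of the recorded input. `Envelope`, `read_window`, `run_bots`, `diff_fields`, and the `COMPLETED` status are hypothetical names standing in for `archive.window`, `sandbox.run`, and `differs`; numeric fields are compared by absolute difference, so they are assumed to be bps-denominated.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable


@dataclass
class Envelope:
    intent_id: str
    input: dict
    recorded_output: dict


@dataclass
class ReplayDigest:
    window_start_ms: int
    window_end_ms: int
    status: str = "COMPLETED"  # hypothetical; only ABORTED is named in the spec
    matches: int = 0
    regressions: int = 0
    first_regressions: list = field(default_factory=list)
    kind: str = "ReplayDigest"


def _is_number(v) -> bool:
    # bool is a subclass of int; treat it as categorical, not numeric.
    return isinstance(v, (int, float)) and not isinstance(v, bool)


def diff_fields(intent_id: str, then: dict, now: dict, tolerance_bps: float) -> list:
    """Per-field diffs: numbers compared within tolerance_bps, everything else exactly."""
    diffs = []
    for name in sorted(set(then) | set(now)):
        a, b = then.get(name), now.get(name)
        if _is_number(a) and _is_number(b):
            changed = abs(b - a) > tolerance_bps
        else:
            changed = a != b
        if changed:
            diffs.append({"intent_id": intent_id, "field": name, "then": a, "now": b})
    return diffs


def run_replay(
    job: dict,
    read_window: Callable[[int, int], Iterable[Envelope]],
    run_bots: Callable[[dict], dict],
    tolerance_bps: float,
    max_listed: int = 10,
) -> ReplayDigest:
    digest = ReplayDigest(job["window_start_ms"], job["window_end_ms"])
    try:
        envelopes = list(read_window(job["window_start_ms"], job["window_end_ms"]))
    except OSError:
        # Archive unreadable: report ABORTED instead of a partial result.
        digest.status = "ABORTED"
        return digest
    for env in envelopes:
        replayed = run_bots(env.input)
        diffs = diff_fields(env.intent_id, env.recorded_output, replayed, tolerance_bps)
        if diffs:
            digest.regressions += 1
            if len(digest.first_regressions) < max_listed:
                digest.first_regressions.append(diffs[0])  # first differing field only
        else:
            digest.matches += 1
    return digest
```

Counting one regression per envelope keeps `matches + regressions` equal to the number of inputs, which is the property listed under Acceptance Tests.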

11. Wire Examples

Input — what arrives on the wire

{
  "job_id": "replay_001",
  "window_start_ms": 1715260000000,
  "window_end_ms": 1715263600000
}

Output — what the bot emits

{
  "kind": "ReplayDigest",
  "matches": 1042,
  "regressions": 3
}

12. Decision Logic

APPROVE

Strict input match (same intent_id, same payload); numeric outputs compared within tolerance_bps; enum/categorical outputs must match exactly.
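A hedged Python sketch of these three checks for a single recorded/replayed pair. It assumes each record carries `intent_id`, `input`, and `output` keys, treats an input mismatch as a harness error rather than a regression, and treats numeric outputs as bps-denominated; `is_match` and `canonical` are hypothetical names.

```python
import json
from numbers import Real


def canonical(payload: dict) -> str:
    """Key-order-independent serialisation, so {"a": 1, "b": 2} equals {"b": 2, "a": 1}."""
    return json.dumps(payload, sort_keys=True, separators=(",", ":"))


def _is_number(v) -> bool:
    return isinstance(v, Real) and not isinstance(v, bool)


def is_match(recorded: dict, replayed: dict, tolerance_bps: float) -> bool:
    # 1. Strict input match: a mismatch means the harness fed the wrong input,
    #    so it is raised as an error rather than scored as a regression.
    if recorded["intent_id"] != replayed["intent_id"]:
        raise ValueError("replayed record belongs to a different intent_id")
    if canonical(recorded["input"]) != canonical(replayed["input"]):
        raise ValueError("replay was fed a different payload than was recorded")

    then, now = recorded["output"], replayed["output"]
    if then.keys() != now.keys():
        return False  # an added or missing output field is never a match
    for name, a in then.items():
        b = now[name]
        if _is_number(a) and _is_number(b):
            # 2. Numeric outputs: absolute difference within tolerance (bps-denominated).
            if abs(b - a) > tolerance_bps:
                return False
        elif a != b:
            # 3. Enum / categorical outputs: exact match required.
            return False
    return True
```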

RESHAPE_REQUIRED

This bot does not reshape orders.

REJECT

No reject path defined for this bot — it is observe-only.

WARNING_ONLY

No warn-only path defined.

13. Standard Decision Output

This bot emits an OperationsReport of kind ReplayDigest rather than a RiskVote; it never votes on or blocks orders.

{
  "kind": "ReplayDigest",
  "window_start_ms": 1715260000000,
  "window_end_ms": 1715263600000,
  "matches": 1042,
  "regressions": 3,
  "first_regressions": [
    {
      "intent_id": "intent_001",
      "field": "cost_estimate.slippage_bps",
      "then": 35,
      "now": 41
    }
  ]
}

14. Reason Codes

Code | Severity | Meaning | Action | User-facing message
GOV_REPLAY_MATCH | P3 | Replayed outputs matched the recording within tolerance. | See decision output and developer log for context. | Replays past activity through the current system to confirm nothing important changed.
GOV_REPLAY_REGRESSION | P3 | A replayed numeric output differed from the recording by more than tolerance_bps, or a categorical output changed. | See decision output and developer log for context. | Replays past activity through the current system to confirm nothing important changed.
GOV_REPLAY_ABORTED | P3 | The replay was aborted, e.g. the archive read failed or no-network mode could not be guaranteed. | See decision output and developer log for context. | Replays past activity through the current system to confirm nothing important changed.
GOV_REPLAY_NO_NETWORK_VIOLATION | P3 | A bot attempted a network call inside the replay sandbox. | See decision output and developer log for context. | Replays past activity through the current system to confirm nothing important changed.

15. Metrics & Logs

Metrics emitted

Metric | Type | Unit | Labels | Meaning
replay_jobs_total | counter | event | bot_id | Replay jobs total.
replay_matches_total | counter | event | bot_id | Replay matches total.
replay_regressions_total | counter | event | bot_id | Replay regressions total.
replay_aborts_total | counter | event | bot_id | Replay aborts total.

Dashboards

  • 6.18 overview dashboard

16. Developer Reporting

"Per replay: job_id, window, total_inputs, matches, regressions, runtime_ms."

17. Plain-English Reporting

Situation | User-facing explanation
When this bot acts | Replays past activity through the current system to confirm nothing important changed.

18. Failure-Mode Block

main_failure_mode: Replay sandbox accidentally calls a live network endpoint.
false_positive_risk: Time-dependent outputs (anything reading now_ms()) flagged as regressions; mitigation: the replay runtime injects a frozen clock.
false_negative_risk: Bot uses external state not captured in the recording; mitigation: bots that read external state must declare it in `data_freshness.max_external_feed_age_ms`, and recordings must include it.
safe_fallback: If the sandbox cannot guarantee no-network mode, abort the replay and emit ReplayDigest with status=ABORTED.
required_dependencies: -

19. Failure-Injection Recipes

Scenario | How to inject | Expected behaviour | Recovery
Wrong output | Inject a deliberately wrong output and assert the regression is surfaced. | Bot detects within its latency budget and emits GOV_REPLAY_REGRESSION. | Remove the injected fault; bot returns to healthy state within one debounce window.
Archive unavailable | Block the archive read and assert ABORTED status. | Bot detects within its latency budget and emits GOV_REPLAY_ABORTED. | Remove the injected fault; bot returns to healthy state within one debounce window.

20. State & Persistence

Replay archive index. Job history. No live state.

State stores

Name | Kind | Key | Value shape | TTL | Durability
replay_simulator_state | in-memory + fast KV mirror | bot_id | Replay archive index. Job history. No live state. | 24h | crash-safe via KV mirror

Cold-start recovery

Cold-start hydrates from fast KV; missing keys default to safe fallback.

On restart

All in-flight decisions are re-evaluated; no bot decision is trusted across restart without re-emit.

21. Concurrency & Idempotency

Aspect | Specification
Execution model | Runs in a sandbox process pool. Concurrent jobs allowed; each is isolated.
Max in-flight | 32
Idempotency key | order_intent_id
Replay-safe | true
Deduplication | By idempotency_key within a 60s window.
Ordering guarantees | Per-market_id FIFO; cross-market unordered.
Per-call timeout (ms) | 250
Backpressure strategy | Bounded queue; oldest-dropped with metric increment when full.
Locking / mutual exclusion | Per-market_id mutex; no global locks.
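The deduplication and backpressure rows can be sketched with standard-library structures. A minimal Python sketch; `Deduplicator`, `BoundedQueue`, and the `dropped_total` counter (standing in for the metric increment) are hypothetical, and the queue capacity is a parameter because the spec does not state it.

```python
from collections import deque


class Deduplicator:
    """Rejects a key already accepted within the last window_ms milliseconds."""

    def __init__(self, window_ms: int = 60_000):
        self.window_ms = window_ms
        self._seen: dict = {}

    def accept(self, key: str, now_ms: int) -> bool:
        # Prune keys whose window has expired (O(n) per call; fine for a sketch).
        self._seen = {k: t for k, t in self._seen.items() if now_ms - t < self.window_ms}
        if key in self._seen:
            return False
        self._seen[key] = now_ms
        return True


class BoundedQueue:
    """FIFO with fixed capacity; when full, the oldest item is dropped and counted."""

    def __init__(self, capacity: int):
        self._items = deque(maxlen=capacity)
        self.dropped_total = 0  # stand-in for the metric increment

    def put(self, item) -> None:
        if len(self._items) == self._items.maxlen:
            self.dropped_total += 1  # append() below evicts the oldest item
        self._items.append(item)

    def get(self):
        return self._items.popleft()

    def __len__(self) -> int:
        return len(self._items)
```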

22. Dependencies

Consumes: ReportEnvelopeArchive
Emits: OperationsReport(kind=ReplayDigest)
Blocks orders: no

23. Security Surfaces

Sandbox network is fully blocked. Only replay archive read access.

Signing surface

None — bot does not sign or submit.

Mitigations

  • Rate-limit per source
  • Audit-log every override
  • Require role-based authz on admin paths

24. Polymarket V2 Compatibility

Aspect | Value
CLOB version | V2
Collateral asset | pUSD
EIP-712 Exchange domain version | 2
Aware of builderCode field | yes
Aware of negative-risk markets | yes
Multi-chain ready | yes
SDK used | Polymarket CLOB V2 SDK
Settlement contract | CTFExchangeV2
Notes | Replays against V2 bot revisions only.

25. Versioning & Migration

Field | Value
current | 0.1.0
contract_version | 1.0.0
last_breaking_change | none
deprecation_window_days | 30

26. Acceptance Tests

Unit Tests

Test | Setup | Expected result
A replay against an unchanged bot version reports zero regressions on its golden traces. | Synthetic fixture per template. | Behaviour matches the rule described in the test name.

Integration Tests

Test | Expected result
Replay 1 hour of recorded traffic through a deliberately changed bot and assert regressions are surfaced. | End-to-end behaviour matches the spec without manual intervention.

Property Tests

Property | Required behaviour
Match count + regression count equals total input count for any non-aborted run. | Always true across all generated inputs.
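A hand-rolled property check in Python, assuming a run can be modelled as a list of (recorded, replayed) bps pairs; `count_outcomes` is a toy stand-in for a real replay and `holds_for_random_inputs` is a hypothetical harness. A property-testing library such as Hypothesis could generate the inputs instead.

```python
import random
from typing import Callable, Sequence

Pair = tuple[float, float]  # (recorded value, replayed value), both in bps


def count_outcomes(pairs: Sequence[Pair], tolerance_bps: float) -> tuple[int, int]:
    """Toy replay: classify each pair as MATCH or REGRESSION and count them."""
    matches = regressions = 0
    for then, now in pairs:
        if abs(now - then) <= tolerance_bps:
            matches += 1
        else:
            regressions += 1
    return matches, regressions


def holds_for_random_inputs(run: Callable[[Sequence[Pair], float], tuple[int, int]],
                            trials: int = 500, seed: int = 0) -> bool:
    """Property: matches + regressions == len(inputs) for every generated run."""
    rng = random.Random(seed)
    for _ in range(trials):
        pairs = [(rng.uniform(0, 100), rng.uniform(0, 100)) for _ in range(rng.randint(0, 50))]
        matches, regressions = run(pairs, rng.choice([0, 5, 50]))
        if matches + regressions != len(pairs):
            return False
    return True
```

A run that silently skips an envelope (for example, by swallowing an exception) breaks the property, which is the failure it is meant to catch.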

27. Operational Runbook

If replays fail with GOV_REPLAY_NO_NETWORK_VIOLATION, the offending bot is leaking an external call; file a P2 issue.

On-call actions

Alert: 6.18_anomaly
First step: Open the bot's reporting page and confirm the alert is real (not a metric hiccup).
Diagnosis: Inspect developer log entries for the affected market_id over the last 30 minutes.
Mitigation: Force-clear via Admin UI if the rule is clearly stale; otherwise leave engaged and notify owner.
Escalate to: Governance pod

Manual overrides

  • polytraders bot pause 6.18 — Disables the bot's enforcement layer; downstream consumers fall back to safe defaults.

Healthcheck

GET /healthz/replay_simulator → 200 if last successful evaluation < 60s ago.
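A sketch of the healthcheck rule in Python. The spec only defines the 200 case; returning 503 otherwise, and treating "never succeeded" as unhealthy, are assumptions. `healthz` is a hypothetical name.

```python
from http import HTTPStatus
from typing import Optional, Tuple

STALE_AFTER_MS = 60_000  # "last successful evaluation < 60s ago"


def healthz(last_success_ms: Optional[int], now_ms: int) -> Tuple[int, str]:
    """Status code and body for GET /healthz/replay_simulator."""
    if last_success_ms is not None and now_ms - last_success_ms < STALE_AFTER_MS:
        return HTTPStatus.OK, "ok"
    # Assumption: 503 when stale or never successful; the spec only defines the 200 case.
    return HTTPStatus.SERVICE_UNAVAILABLE, "stale"
```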

28. Promotion Gates

A bot does not advance to the next readiness state until every gate below is green. Gates are observable from production data — no subjective sign-off.

Promote to Shadow

Gate | How measured | Threshold
Stub | Golden replay reports zero regressions. | Documented threshold met for the full window.

Promote to Limited live

Gate | How measured | Threshold
Shadow | 14 days running on a daily window. | Documented threshold met for the full window.
Advisory | 7 days. | Documented threshold met for the full window.

Promote to General live

Gate | How measured | Threshold
Enforced | Every promotion through the modes ladder requires a passing replay digest. | Documented threshold met for the full window.
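A minimal Python sketch of the Enforced gate, assuming the modes ladder is off → shadow → advisory → enforced (from Modes supported), that promotion moves one rung at a time, and that a digest passes when it is not ABORTED and reports zero regressions (as in the Stub gate). `Digest`, `passing`, and `can_promote` are hypothetical names.

```python
from dataclasses import dataclass

LADDER = ["off", "shadow", "advisory", "enforced"]


@dataclass
class Digest:
    status: str  # "ABORTED" fails the gate; other statuses count as completed runs
    regressions: int


def passing(d: Digest) -> bool:
    return d.status != "ABORTED" and d.regressions == 0


def can_promote(current: str, target: str, digest: Digest) -> bool:
    """Allow only a single step up the modes ladder, and only with a passing digest."""
    step_up = LADDER.index(target) == LADDER.index(current) + 1
    return step_up and passing(digest)
```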

29. Developer Checklist

Ready-to-ship score: 27/27 sections complete · 100%

Requirement | Status
Purpose defined | ✓ done
Required inputs listed | ✓ done
Parameters defined | ✓ done
Defaults defined | ✓ done
Warning thresholds defined | ✓ done
Hard thresholds defined | ✓ done
Safe fallback defined | ✓ done
Structured output defined | ✓ done
Developer log defined | ✓ done
Plain-English explanation | ✓ done
Unit tests defined | ✓ done
Integration tests defined | ✓ done
Property tests defined | ✓ done
Failure-mode block complete | ✓ done
Reference implementation pseudocode | ✓ done
Wire examples (input + output) | ✓ done
Reason codes listed | ✓ done
Metrics & logs defined | ✓ done
State & persistence defined | ✓ done
Concurrency & idempotency defined | ✓ done
Dependencies declared | ✓ done
Security surfaces declared | ✓ done
Polymarket V2 compatibility declared | ✓ done
Version & migration history declared | ✓ done
Operational runbook defined | ✓ done
Promotion gates defined | ✓ done
Failure-injection recipes defined | ✓ done