1. Bot Identity
| Field | Value |
|---|---|
| Layer | Governance |
| Bot class | Governance Service |
| Authority | Explain |
| Status | BETA |
| Readiness | Limited live |
| Runs before | Nothing. Backtester is a governance simulation bot; it runs in replay mode against the report archive and never precedes live execution |
| Runs after | Historical report archive is populated; ReportEnvelope records for the target time window are available |
| Applies to | All strategy bots configured for replay; any historical OrderIntent window specified via start_ts/end_ts |
| Default mode | limited_live |
| User-visible | Advanced details only |
| Developer owner | Polytraders core, Governance pod |
2. Purpose
Backtester replays historical CLOB snapshots and the full report archive through the live execution path at tick resolution. It runs in replay mode (mode=replay), consuming archived ObservationReport, DecisionReport, RiskVote, ExecutionReport, and SettlementReport envelopes and re-executing the strategy under test against them. It emits replay-tagged OperationsReport records (and replay-tagged copies of every simulated report kind) to polytraders.reports.operations, partitioned by bot_slug+epoch, retained for 1 year. Backtester is used by the governance team to validate parameter changes, A/B test strategies, and produce audit-quality evidence before any strategy is promoted to live. Backtester never signs orders or touches the live CLOB.
3. Why This Bot Matters
| Failure | Consequence |
|---|---|
| Strategy promoted to live without backtesting | Untested parameter changes or new strategy logic may produce runaway losses or adverse fills on live capital. Backtesting is a mandatory gate before promotion. |
| Backtester uses a different execution path from live | Results are not comparable to live performance. Governance audit evidence is invalid and cannot be used to justify promotion decisions. |
| Replay-tagged reports not emitted | Backtesting runs are not auditable. Governance cannot produce the required evidence trail for strategy promotion gates. |
| Parameter sweep runs with non-deterministic inputs | Results are not reproducible. Successive backtests on the same window may produce different outputs, making comparison and audit impossible. |
No worked examples on this bot yet. Worked examples are optional but strongly recommended — they turn an abstract failure mode into something a developer can verify in a fixture.
6. Parameter Guide
| Parameter | Default | Warning | Hard | What it controls |
|---|---|---|---|---|
| start_ts | now()-7d | None | None | Start timestamp (ISO 8601, UTC) for the replay window. |
| end_ts | now() | None | None | End timestamp (ISO 8601, UTC) for the replay window. Must be greater than start_ts. |
| strategy | sum_to_one_arb | None | None | The strategy bot slug to replay through the backtester execution path. |
| parameter_sweep | [] | 50 | 200 | List of parameter override objects, each defining one variant run. Empty list means single run with defaults. Max 200 variants per backtest job. |
7. Detailed Parameter Instructions
start_ts
What it means
Start timestamp (ISO 8601, UTC) for the replay window.
Default
{ "start_ts": "now()-7d" }
Why this default matters
Defaulting to 7 days ago ensures at least one full trading week is covered by each backtest run.
Threshold logic
| Condition | Action |
|---|---|
| start_ts < end_ts | Accept window and begin replay |
| start_ts >= end_ts | Reject with BACKTESTER_INVALID_WINDOW |
Developer check
if (p.start_ts >= p.end_ts) throw new ConfigError('BACKTESTER_INVALID_WINDOW')
User-facing English
The backtest covers historical data starting from this date and time.
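The now()-7d default is a relative expression, so it must be resolved to a concrete UTC timestamp before the window can be validated. Below is a minimal Python sketch. It assumes the grammar is limited to now(), optionally followed by -<N>d, -<N>h, or -<N>m; the spec only shows now() and now()-7d. resolve_ts is a hypothetical helper name.

```python
import re
from datetime import datetime, timedelta, timezone

# Hypothetical grammar: "now()" optionally followed by "-<N><unit>",
# where unit is d (days), h (hours) or m (minutes).
_RELATIVE = re.compile(r"^now\(\)(?:-(\d+)([dhm]))?$")
_UNITS = {"d": "days", "h": "hours", "m": "minutes"}


def resolve_ts(value: str, now: datetime) -> datetime:
    """Turn a relative default or an ISO 8601 timestamp into an aware UTC datetime."""
    match = _RELATIVE.match(value.strip())
    if match:
        amount, unit = match.groups()
        if amount is None:
            return now
        return now - timedelta(**{_UNITS[unit]: int(amount)})
    # Absolute timestamp; map a trailing "Z" to "+00:00" for older Python versions.
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        raise ValueError(f"timestamp must carry a UTC offset: {value!r}")
    return parsed.astimezone(timezone.utc)
```

The caller passes now explicitly instead of reading the clock inside the helper. This fixes a run's window once it has been resolved.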
end_ts
What it means
End timestamp (ISO 8601, UTC) for the replay window. Must be greater than start_ts.
Default
{ "end_ts": "now()" }
Why this default matters
Defaulting to now() makes fresh backtests cover up to the most recent archived data.
Threshold logic
| Condition | Action |
|---|---|
| end_ts > start_ts AND end_ts <= now() | Accept window |
| end_ts > now() | Clamp to now(); emit BACKTESTER_WINDOW_CLAMPED warn |
Developer check
if (p.end_ts > now()) { p.end_ts = now(); log.warn('BACKTESTER_WINDOW_CLAMPED') }
User-facing English
The backtest covers historical data up to this date and time.
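The start_ts and end_ts threshold tables combine into a single window check. This sketch applies the end_ts clamp before the ordering check, so a window lying entirely in the future is rejected rather than silently inverted. That ordering is a design choice of the sketch. validate_window and Window are hypothetical names; ConfigError mirrors the error raised in the developer checks.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


class ConfigError(Exception):
    """Carries a reason code when a run configuration is rejected."""


@dataclass
class Window:
    start_ts: datetime
    end_ts: datetime
    warnings: List[str] = field(default_factory=list)


def validate_window(start_ts: datetime, end_ts: datetime,
                    now: Optional[datetime] = None) -> Window:
    if now is None:
        now = datetime.now(timezone.utc)
    warnings: List[str] = []
    # end_ts in the future: clamp to now() and record a WARN (end_ts threshold table).
    if end_ts > now:
        end_ts = now
        warnings.append("BACKTESTER_WINDOW_CLAMPED")
    # The ordering check runs after the clamp, so a wholly future window is rejected.
    if start_ts >= end_ts:
        raise ConfigError("BACKTESTER_INVALID_WINDOW")
    return Window(start_ts, end_ts, warnings)
```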
strategy
What it means
The strategy bot slug to replay through the backtester execution path.
Default
{ "strategy": "sum_to_one_arb" }
Why this default matters
sum_to_one_arb is the default because it is the most commonly backtested strategy. Any other strategy must be set explicitly.
Threshold logic
| Condition | Action |
|---|---|
| strategy in registered strategy registry | Load strategy configuration and begin replay |
| strategy not in registry | Reject with BACKTESTER_UNKNOWN_STRATEGY |
Developer check
if (!strategyRegistry.has(p.strategy)) throw new ConfigError('BACKTESTER_UNKNOWN_STRATEGY')
User-facing English
The trading strategy being evaluated in this backtest.
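The security section states that the strategy registry is read-only inside the backtester. In Python, one way to enforce this is to expose the registry through a MappingProxyType view. The registry contents and the load_strategy_defaults helper below are illustrative assumptions, not the real registry API.

```python
from types import MappingProxyType
from typing import Any, Dict, Mapping


class ConfigError(Exception):
    """Carries a reason code when a run configuration is rejected."""


# Illustrative registry contents (hypothetical): strategy slug -> default parameters.
_REGISTRY: Dict[str, Dict[str, Any]] = {
    "sum_to_one_arb": {"min_edge_bps": 12, "max_size_pusd": 1000},
}

# Read-only view handed to the backtester: adding or replacing entries raises TypeError.
strategy_registry: Mapping[str, Mapping[str, Any]] = MappingProxyType(
    {slug: MappingProxyType(dict(params)) for slug, params in _REGISTRY.items()}
)


def load_strategy_defaults(slug: str) -> Dict[str, Any]:
    if slug not in strategy_registry:
        raise ConfigError("BACKTESTER_UNKNOWN_STRATEGY")
    # Hand back a copy so a variant override can never mutate the registry entry.
    return dict(strategy_registry[slug])
```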
parameter_sweep
What it means
List of parameter override objects, each defining one variant run. Empty list means single run with defaults. Max 200 variants per backtest job.
Default
{ "parameter_sweep": [] }
Why this default matters
Empty list (single run) is safe as a default. Sweeps beyond 200 variants can exceed the archive query budget and should require explicit approval.
Threshold logic
| Condition | Action |
|---|---|
| len(parameter_sweep) == 0 | Run single backtest with strategy defaults |
| 1 <= len(parameter_sweep) <= 50 | Run all variants; emit summary OperationsReport per variant |
| 51 <= len(parameter_sweep) <= 200 | WARN BACKTESTER_LARGE_SWEEP; run all variants with resource throttle |
| len(parameter_sweep) > 200 | Reject with PARAMETER_CHANGE_REQUIRES_APPROVAL |
Developer check
if (p.parameter_sweep.length > 200) throw new ConfigError('PARAMETER_CHANGE_REQUIRES_APPROVAL')
else if (p.parameter_sweep.length > 50) log.warn('BACKTESTER_LARGE_SWEEP')
User-facing English
A list of parameter variations to test. Each entry produces a separate backtest result for comparison.
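The four rows of the parameter_sweep threshold table map directly onto a small planning function. This sketch uses the 50 and 200 thresholds from the parameter table and assumes that an empty override object means "strategy defaults". The resource throttle for large sweeps is not modelled. plan_sweep is a hypothetical name.

```python
from typing import Any, Dict, List, Tuple

WARN_VARIANTS = 50   # "Warning" column for parameter_sweep
HARD_VARIANTS = 200  # "Hard" column for parameter_sweep


class ConfigError(Exception):
    """Carries a reason code when a run configuration is rejected."""


def plan_sweep(parameter_sweep: List[Dict[str, Any]]) -> Tuple[List[Dict[str, Any]], List[str]]:
    """Return (variants to run, warning codes) per the parameter_sweep threshold table."""
    n = len(parameter_sweep)
    if n > HARD_VARIANTS:
        raise ConfigError("PARAMETER_CHANGE_REQUIRES_APPROVAL")
    if n == 0:
        # Empty sweep: a single run with no overrides, i.e. the strategy defaults.
        return [{}], []
    warnings = ["BACKTESTER_LARGE_SWEEP"] if n > WARN_VARIANTS else []
    return list(parameter_sweep), warnings
```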
8. Default Configuration
{
"bot_id": "gov.backtester",
"version": "2.0.0",
"mode": "replay",
"defaults": {
"start_ts": "now()-7d",
"end_ts": "now()",
"strategy": "sum_to_one_arb",
"parameter_sweep": []
},
"locked": {
"mode": {
"immutable": true,
"value": "replay"
}
}
}
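The locked block pins mode to replay. The sketch below shows how a loader might overlay per-run overrides on the defaults while refusing any change to an immutable field. effective_config and the LOCKED_FIELD_OVERRIDE code are hypothetical; the spec names no reason code for this case.

```python
import copy
from typing import Any, Dict


class ConfigError(Exception):
    """Carries a reason code when a run configuration is rejected."""


def effective_config(base: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Overlay run overrides on base['defaults'], then pin every immutable locked field."""
    locked = base.get("locked", {})
    for key, lock in locked.items():
        if lock.get("immutable") and key in overrides and overrides[key] != lock["value"]:
            # Hypothetical reason code; the spec does not name one for this case.
            raise ConfigError(f"LOCKED_FIELD_OVERRIDE:{key}")
    merged = copy.deepcopy(base["defaults"])  # never mutate the stored defaults
    merged.update(copy.deepcopy(overrides))
    for key, lock in locked.items():
        if lock.get("immutable"):
            merged[key] = lock["value"]
    return merged
```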
9. Implementation Flow
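- On run start, clamp end_ts to now() if it lies in the future (BACKTESTER_WINDOW_CLAMPED). Then validate that start_ts < end_ts, that strategy is in the registry, and that parameter_sweep has at most 200 variants. Otherwise reject with BACKTESTER_INVALID_WINDOW, BACKTESTER_UNKNOWN_STRATEGY, or PARAMETER_CHANGE_REQUIRES_APPROVAL.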
- Assign a replay_run_id (ULID) for this run; all emitted reports for this run carry replay_run_id and mode=replay.
- Fetch archived ObservationReport, DecisionReport, RiskVote, ExecutionReport, and SettlementReport envelopes for the window [start_ts, end_ts] from the report archive.
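- For each tick in the replay window (tick = snapshot boundary from historical CLOB snapshots), reconstruct the order book state and available signals at that moment. If the archive has no record for a tick, emit BACKTESTER_TICK_SKIPPED and continue from the next available tick; never synthesise the missing state.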
- Feed reconstructed state and archived signals into the strategy under test using the same execution path as live (same guardrails, same execution logic).
- For each simulated OrderIntent, run the full risk guardrail stack (replay mode — votes are simulated, not live); record each vote outcome.
- Simulate fill outcomes using the historical order book snapshot at the matching tick; compute simulated fill price, slippage, and fee using pUSD denomination.
- Emit replay-tagged OperationsReport after each simulated fill: includes replay_run_id, original_trace_id, simulated verdict, simulated P&L in pUSD, and comparison against original outcome.
- Emit replay-tagged copies of simulated DecisionReport and ExecutionReport envelopes with mode=replay and replay_run_id for archive and audit.
- At run completion, emit an aggregate OperationsReport summarising: total ticks replayed, simulated fills, simulated P&L, parameter variant results if sweep was configured, and comparison deltas vs original outcomes.
- Retain all replay-tagged reports in polytraders.reports.operations for 1 year, partitioned by bot_slug+epoch.
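The fill-simulation step above is the part of the flow most sensitive to modelling choices. Below is a minimal buy-side sketch. It assumes the snapshot exposes ask levels as (price, shares) pairs and that the intent carries a pUSD budget and a limit price. Slippage is measured against the best ask at that tick, and the fee follows the size_pusd * fee_bps / 10_000 formula from the reference implementation. simulate_buy and SimFill are hypothetical names. Sells and queue position are not modelled.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class SimFill:
    shares: float        # outcome shares bought
    size_pusd: float     # notional spent, in pUSD
    avg_price: float     # volume-weighted average fill price
    slippage_bps: float  # avg_price relative to the best ask at this tick
    fee_pusd: float      # size_pusd * fee_bps / 10_000


def simulate_buy(asks: List[Tuple[float, float]], budget_pusd: float,
                 limit_price: float, fee_bps: float) -> Optional[SimFill]:
    """Walk archived ask levels (price, shares) from best to worst within budget and limit."""
    levels = sorted(asks)  # lowest ask first
    if budget_pusd <= 0 or not levels or levels[0][0] > limit_price:
        return None  # nothing marketable at this tick
    shares = spent = 0.0
    for price, available in levels:
        if price > limit_price or spent >= budget_pusd:
            break
        take = min(available, (budget_pusd - spent) / price)
        shares += take
        spent += take * price
    if shares == 0:
        return None  # marketable levels had no size
    best = levels[0][0]
    avg = spent / shares
    return SimFill(shares, spent, avg, (avg - best) / best * 10_000,
                   spent * fee_bps / 10_000)
```

Walking the book level by level, rather than filling at the best price, is what makes a larger max_size_pusd variant pay visible slippage in a sweep.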
10. Reference Implementation
Loads historical CLOB snapshots and archived report envelopes for the replay window, reconstructs the execution path tick by tick, emits replay-tagged OperationsReport per simulated fill and an aggregate report at run end. All outputs carry mode=replay and replay_run_id.
Pseudocode is language-agnostic. FETCH = read input. EMIT = produce output. Translate to TS/Python/Go/Rust.
// ---- STARTUP ----
FUNCTION initReplay(config):
    // Clamp first so a window lying wholly in the future is rejected, not inverted
    IF config.end_ts > now():
        config.end_ts = now()
        alerting.emit('BACKTESTER_WINDOW_CLAMPED')
    IF config.start_ts >= config.end_ts:
        RAISE ConfigError('BACKTESTER_INVALID_WINDOW')
    IF NOT strategyRegistry.has(config.strategy):
        RAISE ConfigError('BACKTESTER_UNKNOWN_STRATEGY')
    IF len(config.parameter_sweep) > 200:
        RAISE ConfigError('PARAMETER_CHANGE_REQUIRES_APPROVAL')
    IF len(config.parameter_sweep) > 50:
        alerting.emit('BACKTESTER_LARGE_SWEEP')   // continue with resource throttle
    replay_run_id = generateULID()   // e.g. 'replay_01HX9KZQ7E8VR5'
    // An empty sweep means one run with no overrides, i.e. the strategy defaults
    variants = config.parameter_sweep IF len(config.parameter_sweep) > 0 ELSE [{}]
    // Archive inputs are identical for every variant, so fetch them once
    // Fetch archived signals for the full window
    archive = FETCH internal.reportArchive.GET({
        window: [config.start_ts, config.end_ts],
        kinds: ['ObservationReport', 'DecisionReport', 'RiskVote',
                'ExecutionReport', 'SettlementReport']
    })
    // Fetch archived CLOB snapshots (tick-level)
    snapshots = FETCH internal.clobArchive.GET({
        window: [config.start_ts, config.end_ts],
        resolution: 'tick'
    })
    // ---- PER-VARIANT LOOP ----
    FOR variant_index, variant IN enumerate(variants):
        strategy = strategyRegistry.load(config.strategy, params=variant)
        sim_fills = []
        sim_pnl_pusd = 0.0
        FOR tick IN snapshots:
            // Reconstruct order book at this tick
            book = reconstructBook(tick)
            // Replay archived signals at this tick timestamp
            signals = archive.signalsAt(tick.ts_ms)
            // Run strategy in replay mode
            intent = strategy.evaluate(book, signals, mode='replay')
            IF intent IS NULL:
                CONTINUE
            // Simulate risk guardrail stack (replay mode)
            votes = riskStack.simulate(intent, context={
                book: book, signals: signals, mode: 'replay'
            })
            shaped_intent = applyVotes(intent, votes)
            IF shaped_intent IS NULL:
                EMIT OperationsReport(event_type='BACKTESTER_INTENT_REJECTED',
                                      replay_run_id=replay_run_id, mode='replay', ...)
                CONTINUE
            // Simulate fill against historical book
            sim_fill = simulateFill(shaped_intent, book)
            sim_fill.size_pusd = toPusdUnits(sim_fill.size_usd)
            sim_fill.fee_pusd = sim_fill.size_pusd * sim_fill.fee_bps / 10_000
            sim_fill.pnl_pusd = computeRealisedPnL(sim_fill, costBasis='FIFO')
            sim_fills.append(sim_fill)
            sim_pnl_pusd += sim_fill.pnl_pusd
            // Emit per-fill replay OperationsReport; variant_index keeps IDs unique across a sweep
            EMIT OperationsReport({
                report_id: 'ops_backtester_' + replay_run_id + '_v' + variant_index + '_' + tick.ts_ms,
                bot_id: 'gov.backtester',
                event_type: 'BACKTESTER_TICK_PROCESSED',
                replay_run_id: replay_run_id,
                mode: 'replay',
                tick_ts_ms: tick.ts_ms,
                market_id: shaped_intent.market_id,
                simulated_fill: sim_fill,
                risk_votes: votes,
                topic: 'polytraders.reports.operations',
                partition: 'backtester+' + epochBucket(tick.ts_ms)
            })
        // Aggregate summary report at the end of each variant run
        EMIT OperationsReport({
            event_type: 'BACKTESTER_REPLAY_COMPLETE',
            replay_run_id: replay_run_id,
            strategy: config.strategy,
            parameter_variant: variant,
            mode: 'replay',
            window_start: config.start_ts,
            window_end: config.end_ts,
            ticks_replayed: len(snapshots),
            simulated_fills: len(sim_fills),
            simulated_volume_pusd: SUM(f.size_pusd FOR f IN sim_fills),
            simulated_pnl_pusd: sim_pnl_pusd,
            simulated_net_fees_pusd: SUM(f.fee_pusd FOR f IN sim_fills),
            topic: 'polytraders.reports.operations',
            partition: 'backtester+' + epochBucket(config.end_ts),
            retained_until: config.end_ts + 1y
        })
SDK calls used
- internal.reportArchive.GET({ window, kinds })
- internal.clobArchive.GET({ window, resolution: 'tick' })
- strategyRegistry.load(slug, params)
- riskStack.simulate(intent, context)
- toPusdUnits(raw_usd)
- alerting.emit('BACKTESTER_WINDOW_CLAMPED', metadata)
Complexity: O(T * V) where T = ticks in replay window, V = parameter_sweep variants
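The reference implementation calls computeRealisedPnL(sim_fill, costBasis='FIFO') without defining it. Below is a sketch of FIFO lot matching for one market. It assumes long-only positions and treats fees separately, since each fill already carries fee_pusd. FifoPosition is a hypothetical name.

```python
from collections import deque
from typing import Deque, List


class FifoPosition:
    """FIFO lot ledger for one market; P&L is realised only when a sell closes lots."""

    def __init__(self) -> None:
        self.lots: Deque[List[float]] = deque()  # each lot: [shares, unit cost in pUSD]

    def buy(self, shares: float, price: float) -> float:
        self.lots.append([shares, price])
        return 0.0  # opening or adding to a position realises nothing

    def sell(self, shares: float, price: float) -> float:
        realised = 0.0
        remaining = shares
        while remaining > 0:
            if not self.lots:
                # This sketch assumes long-only replay; short sales are rejected.
                raise ValueError("sell exceeds open position")
            lot = self.lots[0]
            qty = min(remaining, lot[0])
            realised += qty * (price - lot[1])  # oldest lot's cost basis first
            lot[0] -= qty
            remaining -= qty
            if lot[0] == 0:
                self.lots.popleft()
        return realised
```

The ledger stays per market, and its state is threaded through the tick loop. This keeps each variant's P&L independent of every other variant in a sweep.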
11. Wire Examples
Input — what arrives on the wire
{
"label": "Backtester run configuration",
"source": "config_store",
"payload": {
"bot_id": "gov.backtester",
"mode": "replay",
"strategy": "sum_to_one_arb",
"start_ts": "2026-05-01T00:00:00Z",
"end_ts": "2026-05-08T00:00:00Z",
"parameter_sweep": [
{
"min_edge_bps": 12,
"max_size_pusd": 1000
},
{
"min_edge_bps": 15,
"max_size_pusd": 1500
}
]
}
}
Output — what the bot emits
{
"label": "OperationsReport — BACKTESTER_REPLAY_COMPLETE (2-variant sweep)",
"payload": {
"report_id": "ops_backtester_replay_01HX9KZQ7E8VR5",
"bot_id": "gov.backtester",
"event_type": "BACKTESTER_REPLAY_COMPLETE",
"replay_run_id": "replay_01HX9KZQ7E8VR5",
"strategy": "sum_to_one_arb",
"mode": "replay",
"window_start": "2026-05-01T00:00:00Z",
"window_end": "2026-05-08T00:00:00Z",
"ticks_replayed": 483120,
"parameter_variants_run": 2,
"variants": [
{
"variant_id": 0,
"params": {
"min_edge_bps": 12,
"max_size_pusd": 1000
},
"simulated_fills": 214,
"simulated_volume_pusd": 92450.0,
"simulated_pnl_pusd": 1380.5,
"simulated_net_fees_pusd": 231.1
},
{
"variant_id": 1,
"params": {
"min_edge_bps": 15,
"max_size_pusd": 1500
},
"simulated_fills": 178,
"simulated_volume_pusd": 104200.0,
"simulated_pnl_pusd": 1890.2,
"simulated_net_fees_pusd": 260.5
}
],
"report_kind": "OperationsReport",
"topic": "polytraders.reports.operations",
"partition": "backtester+2026-05-08T00:00Z",
"retained_until": "2027-05-08"
}
}
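Each entry of the variants array in the output above is a fold over that variant's simulated fills. The sketch assumes each fill is a dict carrying size_pusd, pnl_pusd, and fee_pusd, as in the reference implementation. summarize_variant is a hypothetical name.

```python
import math
from typing import Any, Dict, List


def summarize_variant(variant_id: int, params: Dict[str, Any],
                      fills: List[Dict[str, float]]) -> Dict[str, Any]:
    """Build one entry of the variants array from that variant's simulated fills."""
    return {
        "variant_id": variant_id,
        "params": params,
        "simulated_fills": len(fills),
        # math.fsum is correctly rounded, so totals do not depend on fill order.
        "simulated_volume_pusd": math.fsum(f["size_pusd"] for f in fills),
        "simulated_pnl_pusd": math.fsum(f["pnl_pusd"] for f in fills),
        "simulated_net_fees_pusd": math.fsum(f["fee_pusd"] for f in fills),
    }
```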
12. Decision Logic
APPROVE
Not applicable — Backtester is a simulation bot in replay mode. It never approves live orders.
RESHAPE_REQUIRED
Not applicable.
REJECT
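Not applicable as a live trading decision. Backtester rejects a run configuration whose window is invalid, whose strategy is unknown, or whose parameter_sweep exceeds 200 variants. It aborts a run if the report archive is entirely unavailable. Gaps in the archive skip individual ticks (BACKTESTER_TICK_SKIPPED) rather than aborting the run.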
WARNING_ONLY
Parameter sweeps above 50 variants emit BACKTESTER_LARGE_SWEEP. Replay windows that extend beyond now() are clamped with BACKTESTER_WINDOW_CLAMPED warn.
13. Standard Decision Output
This bot returns an OperationsReport object. See OperationsReport schema.
{
"report_id": "ops_backtester_replay_01HX9KZQ7E8VR5",
"bot_id": "gov.backtester",
"event_type": "BACKTESTER_REPLAY_COMPLETE",
"replay_run_id": "replay_01HX9KZQ7E8VR5",
"strategy": "sum_to_one_arb",
"mode": "replay",
"window_start": "2026-05-01T00:00:00Z",
"window_end": "2026-05-08T00:00:00Z",
"ticks_replayed": 483120,
"simulated_fills": 214,
"simulated_volume_pusd": 92450.0,
"simulated_pnl_pusd": 1380.5,
"simulated_net_fees_pusd": 231.1,
"vs_original_pnl_delta_pusd": 42.0,
"parameter_variants_run": 1,
"report_kind": "OperationsReport",
"topic": "polytraders.reports.operations",
"partition": "backtester+2026-05-08T00:00Z",
"retained_until": "2027-05-08"
}
14. Reason Codes
| Code | Severity | Meaning | Action | User-facing message |
|---|---|---|---|---|
| BACKTESTER_REPLAY_COMPLETE | INFO | A replay run completed successfully; aggregate OperationsReport emitted. | No action — routine completion. | The backtest finished. Results are available in the governance audit report. |
| BACKTESTER_TICK_PROCESSED | INFO | A single tick was processed during replay; per-tick OperationsReport emitted. | No action — operational heartbeat. | |
| BACKTESTER_INVALID_WINDOW | HARD_REJECT | start_ts >= end_ts; the replay window is invalid. | Reject the run configuration; emit alert. | The backtest start date must be before the end date. |
| BACKTESTER_UNKNOWN_STRATEGY | HARD_REJECT | The strategy slug specified in the config is not in the strategy registry. | Reject the run; emit alert. | The requested strategy is not available for backtesting. |
| BACKTESTER_WINDOW_CLAMPED | WARN | end_ts was in the future; clamped to now(). | Clamp end_ts; emit WARN; continue replay. | The backtest end date was set to the most recent available data. |
| BACKTESTER_TICK_SKIPPED | WARN | A tick was missing from the archive; the replay skipped that tick. | Log WARN; skip tick; continue from next available tick. | |
| BACKTESTER_ARCHIVE_UNAVAILABLE | HARD_REJECT | The report archive is entirely unavailable; the replay cannot start or has stalled. | Abort the run; emit alert. | |
| BACKTESTER_LARGE_SWEEP | WARN | Parameter sweep has more than 50 variants; resource throttle applied. | Emit WARN; apply run throttle; continue sweep. | |
| KILL_SWITCH_ACTIVE | WARN | KillSwitch is active; new backtesting run launches are suppressed. | Suppress new run; emit WARN; in-progress runs continue. | New backtests are paused while the system kill switch is active. |
15. Metrics & Logs
Metrics emitted
| Metric | Type | Unit | Labels | Meaning |
|---|---|---|---|---|
| polytraders_gov_backtester_runs_total | counter | count | strategy, status | Total backtester runs launched, labelled by strategy slug and completion status (complete/aborted). |
| polytraders_gov_backtester_ticks_processed_total | counter | count | strategy, replay_run_id | Total ticks processed across all replay runs. |
| polytraders_gov_backtester_simulated_fills_total | counter | count | strategy | Total simulated fills produced across all runs. |
| polytraders_gov_backtester_simulated_pnl_pusd | gauge | pUSD | strategy, replay_run_id | Simulated P&L in pUSD for the most recent completed replay run per strategy. |
| polytraders_gov_backtester_ticks_skipped_total | counter | count | strategy | Ticks skipped due to archive gaps. Should remain near zero. |
| polytraders_gov_backtester_run_duration_ms | histogram | ms | strategy | Wall-clock duration of a complete backtest run. |
Alerts
| Alert | Condition | Severity | Runbook |
|---|---|---|---|
| BacktesterArchiveUnavailable | rate(polytraders_gov_backtester_runs_total{status='aborted'}[15m]) > 0 | page | #runbook-backtester-archive-unavailable |
| BacktesterHighTickSkipRate | sum by (strategy) (rate(polytraders_gov_backtester_ticks_skipped_total[10m])) / sum by (strategy) (rate(polytraders_gov_backtester_ticks_processed_total[10m])) > 0.01 | warn | #runbook-backtester-tick-skip |
| BacktesterNoRunsIn24h | rate(polytraders_gov_backtester_runs_total[24h]) == 0 | warn | #runbook-backtester-no-runs |
| BacktesterRunDurationHigh | histogram_quantile(0.99, sum by (le) (rate(polytraders_gov_backtester_run_duration_ms_bucket[1h]))) > 300000 | warn | #runbook-backtester-slow-run |
Dashboards
- Grafana — Governance / Backtester run history and simulated P&L by strategy
- Grafana — Governance / Parameter sweep results comparison
16. Developer Reporting
{
"bot_id": "gov.backtester",
"event_type": "BACKTESTER_TICK_PROCESSED",
"replay_run_id": "replay_01HX9KZQ7E8VR5",
"tick_ts_ms": 1746792060000,
"market_id": "0x9b0c1d2e3f4a5b6c7d8e9f0a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a9b0c",
"simulated_intent_emitted": true,
"simulated_fill_price": 0.621,
"simulated_fill_size_pusd": 430.0,
"simulated_slippage_bps": 0.8,
"risk_votes": [
{
"bot": "liquidityguard",
"vote": "pass"
},
{
"bot": "portfolioguard",
"vote": "reshape",
"new_size_pusd": 430.0
}
],
"mode": "replay"
}
17. Plain-English Reporting
| Situation | User-facing explanation |
|---|---|
| Backtest run completed successfully | The strategy was replayed over the selected historical period. Results include simulated trades, P&L estimates, and a comparison against what actually happened. |
| Parameter sweep completed | Multiple parameter variants were tested over the same historical data. A summary of results for each variant is available. |
| Replay window clamped to present | The requested end date is in the future. The backtest was run up to the most recent available data. |
| Archive data incomplete for window | Some historical data is unavailable for the requested window. The backtest covered only the periods with complete records. |
18. Failure-Mode Block
| Field | Value |
|---|---|
| main_failure_mode | The report archive is unavailable or has gaps for the requested replay window, causing the backtester to skip ticks and produce incomplete simulated results. |
| false_positive_risk | Simulated fills are computed against a historical order book snapshot that may not reflect the true fill price due to latency artifacts in the archive, inflating apparent P&L. |
| false_negative_risk | A strategy change that only manifests at rare market conditions (e.g. extreme spread) may not be detected if the replay window does not include those events. |
| safe_fallback | If archive data is incomplete for a tick, skip that tick and emit BACKTESTER_TICK_SKIPPED warn; continue replay from the next available tick. Never extrapolate or fill gaps synthetically. If the archive is entirely unavailable, abort the run and emit BACKTESTER_ARCHIVE_UNAVAILABLE. |
| required_dependencies | Historical report archive (ObservationReport, DecisionReport, RiskVote, ExecutionReport, SettlementReport), Historical CLOB snapshots (order book + trades), Strategy registry (strategy configuration under test), Internal message bus (OperationsReport emission) |
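The safe_fallback row separates two cases: a missing tick is skipped and reported, while an unreachable archive aborts the run. Below is a sketch of both. It assumes the expected tick timestamps are known up front (expected_ts, a hypothetical input) and that transport failures surface as OSError. load_window, replay_ticks, and ArchiveUnavailable are hypothetical names.

```python
from typing import Any, Callable, Iterable, Iterator, Mapping, Tuple


class ArchiveUnavailable(Exception):
    """The archive cannot serve the window at all; the run must abort."""


def load_window(fetch: Callable[[], Mapping[int, Any]]) -> Mapping[int, Any]:
    try:
        return fetch()
    except OSError as exc:  # assumption: transport failures surface as OSError
        raise ArchiveUnavailable("BACKTESTER_ARCHIVE_UNAVAILABLE") from exc


def replay_ticks(expected_ts: Iterable[int], snapshots: Mapping[int, Any],
                 emit: Callable[[str, int], None]) -> Iterator[Tuple[int, Any]]:
    """Yield (ts_ms, snapshot) for ticks present in the archive; report and skip the rest."""
    for ts in expected_ts:
        snap = snapshots.get(ts)
        if snap is None:
            emit("BACKTESTER_TICK_SKIPPED", ts)
            continue  # never interpolate from neighbouring ticks
        yield ts, snap
```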
19. Failure-Injection Recipes
| Scenario | How to inject | Expected behaviour | Recovery |
|---|---|---|---|
| ARCHIVE_UNAVAILABLE | Block all reads from internal.reportArchive and internal.clobArchive | BACKTESTER_ARCHIVE_UNAVAILABLE raised; run aborted; OperationsReport with status=aborted emitted | Once archive is reachable, launch a new replay run. No partial state needs clearing. |
| TICK_GAPS_IN_ARCHIVE | Delete 5% of tick records from clobArchive for the target window | BACKTESTER_TICK_SKIPPED emitted per missing tick; replay continues; aggregate report notes skipped ticks | Automatic; skipped ticks are noted in the aggregate report. |
| INVALID_WINDOW | Submit run with start_ts > end_ts | BACKTESTER_INVALID_WINDOW raised; run rejected immediately | Correct the window configuration and resubmit. |
| LARGE_PARAMETER_SWEEP | Submit parameter_sweep with 201 variants | PARAMETER_CHANGE_REQUIRES_APPROVAL raised; run rejected | Reduce sweep size to <= 200 or request approval. |
| KILL_SWITCH_DURING_LAUNCH | Activate KillSwitch; submit a new backtester run | KILL_SWITCH_ACTIVE logged; new run not started; in-progress runs continue to completion | Deactivate KillSwitch; resubmit run. |
| STRATEGY_NOT_IN_REGISTRY | Submit run with strategy='unknown_strategy' | BACKTESTER_UNKNOWN_STRATEGY raised; run rejected | Use a registered strategy slug. |
20. State & Persistence
Cold-start recovery
On restart, any in-progress replay run is marked as interrupted. A new run must be launched manually. No partial replay state is held in memory.
21. Concurrency & Idempotency
| Aspect | Specification |
|---|---|
| Execution model | single-threaded event loop per replay run; multiple runs may execute in parallel up to max_in_flight |
| Max in-flight | 5 |
| Idempotency key | replay_run_id |
| Per-call timeout (ms) | 300000 |
| Backpressure strategy | reject new run launches beyond max_in_flight; emit BACKTESTER_RUN_QUEUE_FULL warn |
| Locking / mutual exclusion | Postgres unique constraint on replay_run_id; in-run state is per-goroutine |
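The concurrency table combines three rules: at most max_in_flight runs, rejection (not queueing) beyond that limit, and replay_run_id as the idempotency key. Below is a single-process sketch. An in-memory set stands in for the Postgres unique constraint, and the KillSwitch state is passed in as a flag. RunAdmission and the ADMITTED/DUPLICATE statuses are hypothetical.

```python
import threading
from typing import Set


class RunAdmission:
    """Admits replay runs up to max_in_flight; a repeated replay_run_id never starts twice."""

    def __init__(self, max_in_flight: int = 5) -> None:
        self.max_in_flight = max_in_flight
        self._lock = threading.Lock()
        self._in_flight: Set[str] = set()
        self._seen: Set[str] = set()  # stands in for the Postgres unique constraint

    def try_admit(self, replay_run_id: str, kill_switch_active: bool) -> str:
        with self._lock:
            if replay_run_id in self._seen:
                return "DUPLICATE"  # idempotent: same key is not launched again
            if kill_switch_active:
                return "KILL_SWITCH_ACTIVE"
            if len(self._in_flight) >= self.max_in_flight:
                return "BACKTESTER_RUN_QUEUE_FULL"  # reject, do not queue
            self._seen.add(replay_run_id)
            self._in_flight.add(replay_run_id)
            return "ADMITTED"

    def finish(self, replay_run_id: str) -> None:
        with self._lock:
            self._in_flight.discard(replay_run_id)
```

A run rejected for a full queue or an active KillSwitch is not recorded as seen, so the same replay_run_id can be resubmitted later.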
22. Dependencies
Depends on (must run first)
| Bot | Why | Contract |
|---|---|---|
| internal.report_archive | All historical ObservationReport, DecisionReport, RiskVote, ExecutionReport, and SettlementReport envelopes are fetched from the report archive for replay reconstruction. | |
| internal.clob_archive | Tick-level CLOB snapshots (order book + trades) are the primary replay data source for simulated fill computation. | |
Emits to (downstream consumers)
| Bot | Why | Contract |
|---|---|---|
| internal.governance_audit | | |
Sibling bots (same OrderIntent)
Used by (auto-aggregated)
External services
| Service | Endpoint | SLA assumed | On failure |
|---|---|---|---|
| Internal report archive | | 99.9% (internal SRE target) | |
| Internal CLOB snapshot archive | | 99.9% (internal SRE target) | |
23. Security Surfaces
Abuse vectors considered
- Submitting a large parameter_sweep (>200 variants) to exhaust archive query budget
- Injecting a crafted replay window that targets gaps in the archive to produce misleading P&L results
- Manipulating strategy registry to replay a strategy not approved for backtesting
Mitigations
- parameter_sweep hard limit of 200 enforced at config load; excess rejected with PARAMETER_CHANGE_REQUIRES_APPROVAL
- Backtester is locked to mode=replay; live CLOB and onchain surfaces are never accessed
- Strategy registry is read-only within the backtester; only registered strategies can be loaded
- All replay runs are audit-logged with replay_run_id and emitted to the governance report bus
24. Polymarket V2 Compatibility
| Aspect | Value |
|---|---|
| CLOB version | v2 |
| Collateral asset | pUSD |
| EIP-712 Exchange domain version | 2 |
| Aware of builderCode field | no |
| Aware of negative-risk markets | yes |
| Multi-chain ready | no |
| SDK used | py-clob-client-v2 |
| Settlement contract | CTFExchangeV2 on Polygon |
| Notes | Backtester consumes archived V2 ReportEnvelope records; all simulated fills are denominated in pUSD. Backtester is replay-only and never makes live CLOB calls. NegRisk market payoffs are replayed using the NegRiskAdapter path stored in the archived SettlementReport. |
API surfaces declared
internal
Networks supported
polygon
25. Versioning & Migration
| Field | Value |
|---|---|
| spec | 2.0.0 |
| implementation | 2.1.0 |
| schema | 2 |
| released | 2026-04-28 |
Migration history
| Date | From | To | Reason | Action taken |
|---|---|---|---|---|
| 2026-04-28 | v1 | v2 | CLOB V2 cutover | Updated replay pipeline to consume V2 ReportEnvelope format (pUSD denomination, mode field, replay_run_id). Removed USDC.e references from simulated fill output. Replay now ingests archived OperationsReport, DecisionReport, ExecutionReport, and SettlementReport in V2 schema. All simulated fill sizes and P&L now denominated in pUSD. Switched to py-clob-client-v2 archive reader. |
26. Acceptance Tests
Unit Tests
| Test | Setup | Expected result |
|---|---|---|
| Replay window validation rejects start_ts >= end_ts | start_ts='2026-05-09', end_ts='2026-05-01' | BACKTESTER_INVALID_WINDOW ConfigError |
| Unknown strategy slug raises error | strategy='nonexistent_strategy' | BACKTESTER_UNKNOWN_STRATEGY ConfigError |
| Parameter sweep above hard limit (200) is rejected | parameter_sweep = list of 201 variants | PARAMETER_CHANGE_REQUIRES_APPROVAL ConfigError |
| Replay assigns mode=replay on all emitted reports | Run single-variant replay; inspect all emitted OperationsReport payloads | All reports carry mode='replay' and replay_run_id |
| Simulated fill denominated in pUSD (not USDC.e) | Replay a fill event from historical archive | simulated_fill output contains size_pusd field; no USDC.e references |
| Archive gap causes tick skip, not crash | Inject missing tick in archive at t=T; continue replay | BACKTESTER_TICK_SKIPPED emitted; replay continues from T+1 |
Integration Tests
| Test | Expected result |
|---|---|
| End-to-end: 7-day replay of sum_to_one_arb produces aggregate OperationsReport | Aggregate OperationsReport with ticks_replayed > 0, simulated_fills >= 0, mode=replay, retained_until = now+1y |
| Parameter sweep of 3 variants produces 3 separate OperationsReport summaries | 3 reports, each with distinct parameter_variant field and independent simulated P&L |
| KillSwitch active suppresses new run launches | KILL_SWITCH_ACTIVE logged; backtest run not started; in-progress runs continue |
Property Tests
| Property | Required behaviour |
|---|---|
| All emitted reports carry mode=replay and replay_run_id | Always true — replay mode is locked immutable in default_config |
| No live CLOB calls are made during replay | Always true — all inputs come from the archive; no clob_auth or onchain surfaces are accessed |
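| Replay is deterministic: the same archive window, strategy, and parameters produce the same simulated fills and P&L (replay_run_id and report IDs aside) | Always true. The simulation path uses no randomness, and archive inputs are immutable |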
27. Operational Runbook
Backtester incidents involve archive unavailability (run aborts), high tick-skip rates (data quality issue), or runs stuck in progress (goroutine leak or resource exhaustion). All incidents are low-urgency unless they block a promotion gate decision.
On-call actions
| Alert | First step | Diagnosis | Mitigation | Escalate to |
|---|---|---|---|---|
| BacktesterArchiveUnavailable | Check internal report archive and CLOB archive health. Verify storage backend is reachable. | | | Governance pod lead if archive is down for > 30 minutes |
| BacktesterHighTickSkipRate | Check clobArchive completeness for the affected window. May indicate a historical data ingestion failure. | | | Governance pod lead; data engineering if archive ingestion is the root cause |
| BacktesterNoRunsIn24h | Check if any promotion-gate backtests are pending. If yes, investigate whether run launches are being suppressed by KillSwitch. | | | Governance pod lead |
| BacktesterRunDurationHigh | Check archive query latency and concurrent run count. Reduce max_in_flight if resource contention is the cause. | | | SRE on-call if duration > 10 minutes consistently |
Manual overrides
polytraders gov backtest abort --replay-run-id <id>: aborts a stuck replay run. Use it when a replay run is stuck in progress and blocking the run queue, and only after investigating the root cause.
Healthcheck
/internal/health/backtester → 200 when the last run completed < 24h ago, the archive is reachable, max_in_flight is not saturated, and ticks_skipped_rate < 0.01. Red if the archive is unreachable, all runs are aborting, or the run queue has been saturated for > 5 minutes.
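The healthcheck names a healthy (200) state and a red state but no middle state. Below is a sketch of the evaluation, assuming the inputs have already been collected from the metrics above. HealthInputs, backtester_health, and the "degraded" state are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class HealthInputs:
    seconds_since_last_completed_run: float
    archive_reachable: bool
    in_flight: int
    max_in_flight: int
    ticks_skipped_rate: float
    all_recent_runs_aborted: bool
    seconds_queue_saturated: float  # 0 while the run queue has free slots


def backtester_health(h: HealthInputs) -> str:
    # Any single red condition from the healthcheck line is enough.
    if (not h.archive_reachable or h.all_recent_runs_aborted
            or h.seconds_queue_saturated > 5 * 60):
        return "red"
    # Green (HTTP 200) needs every healthy condition at once.
    if (h.seconds_since_last_completed_run < 24 * 3600
            and h.in_flight < h.max_in_flight
            and h.ticks_skipped_rate < 0.01):
        return "green"
    return "degraded"  # hypothetical middle state; the spec names only green and red
```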
29. Developer Checklist
Ready-to-ship score: 27/27 sections complete · 100%
| Requirement | Status |
|---|---|
| Purpose defined | ✓ done |
| Required inputs listed | ✓ done |
| Parameters defined | ✓ done |
| Defaults defined | ✓ done |
| Warning thresholds defined | ✓ done |
| Hard thresholds defined | ✓ done |
| Safe fallback defined | ✓ done |
| Structured output defined | ✓ done |
| Developer log defined | ✓ done |
| Plain-English explanation | ✓ done |
| Unit tests defined | ✓ done |
| Integration tests defined | ✓ done |
| Property tests defined | ✓ done |
| Failure-mode block complete | ✓ done |
| Reference implementation pseudocode | ✓ done |
| Wire examples (input + output) | ✓ done |
| Reason codes listed | ✓ done |
| Metrics & logs defined | ✓ done |
| State & persistence defined | ✓ done |
| Concurrency & idempotency defined | ✓ done |
| Dependencies declared | ✓ done |
| Security surfaces declared | ✓ done |
| Polymarket V2 compatibility declared | ✓ done |
| Version & migration history declared | ✓ done |
| Operational runbook defined | ✓ done |
| Promotion gates defined | ✓ done |
| Failure-injection recipes defined | ✓ done |