6.4 Backtester

Governance · Governance Service · Explain · BETA · Limited live · capital · Indirect · P7 · Governance & replay · pending stub

Backtester replays historical CLOB snapshots and the full report archive through the live execution path at tick resolution. It runs in replay mode (mode=replay), consuming archived ObservationReport, DecisionReport, RiskVote, ExecutionReport, and SettlementReport envelopes and re-executing the strategy under test against them. It emits replay-tagged OperationsReport records (and replay-tagged copies of every simulated report kind) to polytraders.reports.operations, partitioned by bot_slug+epoch, retained for 1 year. Backtester is used by the governance team to validate parameter changes, A/B test strategies, and produce audit-quality evidence before any strategy is promoted to live. Backtester never signs orders or touches the live CLOB.

v3 readiness

Score	Progress	Status
Docs	27/27	done
Impl	0/15	pending
Backtest	0/4	pending
Runtime	0/8	pending

A bot is done when all four scores are.

1. Bot Identity

Layer	Governance
Bot class	Governance Service
Authority	Explain
Status	BETA
Readiness	Limited live
Runs before	Nothing — Backtester is a governance simulation bot; it runs in replay mode against the report archive and never precedes live execution
Runs after	Historical report archive is populated; ReportEnvelope records for the target time window are available
Applies to	All strategy bots configured for replay; any historical OrderIntent window specified via start_ts/end_ts
Default mode	`limited_live`
User-visible	Advanced details only
Developer owner	Polytraders core — Governance pod

2. Purpose

3. Why This Bot Matters

Strategy promoted to live without backtesting
Untested parameter changes or new strategy logic may produce runaway losses or adverse fills on live capital. Backtesting is a mandatory gate before promotion.
Backtester uses different execution path than live
Results are not comparable to live performance. Governance audit evidence is invalid and cannot be used to justify promotion decisions.
Replay-tagged reports not emitted
Backtesting runs are not auditable. Governance cannot produce the required evidence trail for strategy promotion gates.
Parameter sweep runs with non-deterministic inputs
Results are not reproducible. Successive backtests on the same window may produce different outputs, making comparison and audit impossible.

4. Required Polymarket Inputs

Input	Source	Required?	Use
Historical CLOB snapshots (order book + trades, tick-level)	`internal`	Yes	Primary replay data source. Backtester reconstructs the order book state at each tick from the archived snapshot stream.
Archived ObservationReport envelopes	`internal`	Yes	Replay pre-trade intelligence signals as they existed at the original observation time.
Archived DecisionReport envelopes	`internal`	Yes	Reconstruct the original strategy decisions for comparison against replayed decisions.
Archived RiskVote envelopes	`internal`	Yes	Reconstruct risk guardrail votes that were in effect during the replay window.
Archived ExecutionReport envelopes	`internal`	No	Compare simulated fill outcomes against actual historical fills for slippage and fill-quality analysis.
Archived SettlementReport envelopes	`internal`	No	Replay post-trade P&L settlement to reconstruct the full governance evidence trail.

5. Required Internal Inputs

Input	Source	Required?	Use
Strategy configuration under test	`Config store`	Yes	Parameters (or parameter sweep grid) for the strategy being backtested.
KillSwitch active flag	`KillSwitch`	No	When KillSwitch is active, suppress new backtesting run launches. Runs already in progress may continue.

6. Parameter Guide

Parameter	Default	Warning	Hard	What it controls
start_ts	`now()-7d`	`None`	`None`	Start timestamp (ISO 8601, UTC) for the replay window.
end_ts	`now()`	`None`	`None`	End timestamp (ISO 8601, UTC) for the replay window. Must be greater than start_ts.
strategy	`sum_to_one_arb`	`None`	`None`	The strategy bot slug to replay through the backtester execution path.
parameter_sweep	`[]`	`50`	`200`	List of parameter override objects, each defining one variant run. Empty list means single run with defaults. Max 200 variants per backtest job.

7. Detailed Parameter Instructions

start_ts

What it means

Start timestamp (ISO 8601, UTC) for the replay window.

Default

{ "start_ts": "now()-7d" }

Why this default matters

Defaulting to 7 days ago ensures at least one full trading week is covered by each backtest run.

Threshold logic

Condition	Action
start_ts < end_ts	Accept window and begin replay
start_ts >= end_ts	Reject with BACKTESTER_INVALID_WINDOW

Developer check

if (p.start_ts >= p.end_ts) throw ConfigError('BACKTESTER_INVALID_WINDOW')

User-facing English

The backtest covers historical data starting from this date and time.

end_ts

What it means

End timestamp (ISO 8601, UTC) for the replay window. Must be greater than start_ts.

Default

{ "end_ts": "now()" }

Why this default matters

Defaulting to now() makes fresh backtests cover up to the most recent archived data.

Threshold logic

Condition	Action
end_ts > start_ts AND end_ts <= now()	Accept window
end_ts > now()	Clamp to now(); emit BACKTESTER_WINDOW_CLAMPED warn

Developer check

if (p.end_ts > now()) { p.end_ts = now(); log.warn('BACKTESTER_WINDOW_CLAMPED') }

User-facing English

The backtest covers historical data up to this date and time.
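
The two window checks (start_ts and end_ts) interact, so it can help to see them applied together. The following is a minimal sketch, not the shipped validator: `resolve_window` and the local `ConfigError` class are hypothetical names, and it assumes the clamp is applied before the ordering check so that a window lying entirely in the future is rejected rather than silently inverted.

```python
from datetime import datetime, timedelta, timezone


class ConfigError(ValueError):
    """Stand-in for the ConfigError used in the developer checks."""


def resolve_window(start_ts, end_ts, now):
    """Return (start, end, warnings) for a replay window.

    Assumption: a future end_ts is clamped to `now` first, then the
    start < end rule is enforced on the clamped value.
    """
    warnings = []
    if end_ts > now:
        end_ts = now
        warnings.append("BACKTESTER_WINDOW_CLAMPED")
    if start_ts >= end_ts:
        raise ConfigError("BACKTESTER_INVALID_WINDOW")
    return start_ts, end_ts, warnings


now = datetime(2026, 5, 8, tzinfo=timezone.utc)
# Default-style window (7 days back) with an end date one day in the future.
start, end, warns = resolve_window(now - timedelta(days=7), now + timedelta(days=1), now)
print(start.isoformat(), end.isoformat(), warns)
```

Passing `now` in explicitly, instead of reading the clock inside the function, keeps the check reproducible in fixtures.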

strategy

What it means

The strategy bot slug to replay through the backtester execution path.

Default

{ "strategy": "sum_to_one_arb" }

Why this default matters

Defaults to sum_to_one_arb, the most commonly backtested strategy. Any other strategy must be set explicitly.

Threshold logic

Condition	Action
strategy in registered strategy registry	Load strategy configuration and begin replay
strategy not in registry	Reject with BACKTESTER_UNKNOWN_STRATEGY

Developer check

if (!strategyRegistry.has(p.strategy)) throw ConfigError('BACKTESTER_UNKNOWN_STRATEGY')

User-facing English

The trading strategy being evaluated in this backtest.

parameter_sweep

What it means

List of parameter override objects, each defining one variant run. Empty list means single run with defaults. Max 200 variants per backtest job.

Default

{ "parameter_sweep": [] }

Why this default matters

Empty list (single run) is safe as a default. Sweeps beyond 200 variants can exceed the archive query budget and should require explicit approval.

Threshold logic

Condition	Action
len(parameter_sweep) == 0	Run single backtest with strategy defaults
1 <= len(parameter_sweep) <= 50	Run all variants; emit summary OperationsReport per variant
51 <= len(parameter_sweep) <= 200	WARN BACKTESTER_LARGE_SWEEP; run all variants with resource throttle
len(parameter_sweep) > 200	Reject with PARAMETER_CHANGE_REQUIRES_APPROVAL

Developer check

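if (p.parameter_sweep.length > 200) throw ConfigError('PARAMETER_CHANGE_REQUIRES_APPROVAL')  // hard limit
if (p.parameter_sweep.length > 50) log.warn('BACKTESTER_LARGE_SWEEP')  // warning threshold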

User-facing English

A list of parameter variations to test. Each entry produces a separate backtest result for comparison.
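
The four rows of the parameter_sweep threshold table can be expressed as one small function. This is a sketch under stated assumptions: `plan_sweep` is a hypothetical helper, the limits are hard-coded from the parameter guide (50 and 200), and an empty sweep is represented as a single empty override object meaning "strategy defaults".

```python
WARN_LIMIT = 50    # warning threshold from the parameter guide
HARD_LIMIT = 200   # hard threshold from the parameter guide


class ConfigError(ValueError):
    """Stand-in for the ConfigError used in the developer checks."""


def plan_sweep(parameter_sweep):
    """Return (variants, warnings) following the threshold table."""
    n = len(parameter_sweep)
    if n > HARD_LIMIT:
        raise ConfigError("PARAMETER_CHANGE_REQUIRES_APPROVAL")
    if n == 0:
        # Single run with strategy defaults: one variant with no overrides.
        return [{}], []
    warnings = ["BACKTESTER_LARGE_SWEEP"] if n > WARN_LIMIT else []
    return list(parameter_sweep), warnings


variants, warnings = plan_sweep([{"min_edge_bps": 12}, {"min_edge_bps": 15}])
print(len(variants), warnings)
```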

8. Default Configuration

{
  "bot_id": "gov.backtester",
  "version": "2.0.0",
  "mode": "replay",
  "defaults": {
    "start_ts": "now()-7d",
    "end_ts": "now()",
    "strategy": "sum_to_one_arb",
    "parameter_sweep": []
  },
  "locked": {
    "mode": {
      "immutable": true,
      "value": "replay"
    }
  }
}
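
One way to read the `locked` block is that per-run overrides are merged onto `defaults`, while any attempt to change an immutable field is refused. The sketch below illustrates that reading; `build_run_config` and `LockedFieldError` are hypothetical names, and the real config loader may behave differently.

```python
import copy

# Mirrors the default configuration above.
DEFAULT_CONFIG = {
    "bot_id": "gov.backtester",
    "version": "2.0.0",
    "mode": "replay",
    "defaults": {
        "start_ts": "now()-7d",
        "end_ts": "now()",
        "strategy": "sum_to_one_arb",
        "parameter_sweep": [],
    },
    "locked": {"mode": {"immutable": True, "value": "replay"}},
}


class LockedFieldError(ValueError):
    """Hypothetical error for attempts to override an immutable field."""


def build_run_config(overrides):
    """Merge per-run overrides onto the defaults, honouring locked fields."""
    cfg = copy.deepcopy(DEFAULT_CONFIG["defaults"])
    cfg["bot_id"] = DEFAULT_CONFIG["bot_id"]
    cfg["mode"] = DEFAULT_CONFIG["mode"]
    for key, value in overrides.items():
        lock = DEFAULT_CONFIG["locked"].get(key)
        if lock and lock["immutable"] and value != lock["value"]:
            raise LockedFieldError(f"{key} is locked to {lock['value']!r}")
        cfg[key] = value
    return cfg


print(build_run_config({"start_ts": "2026-05-01T00:00:00Z"}))
```

The deep copy keeps one run's overrides from leaking into the shared defaults, which matters when several runs are in flight.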

9. Implementation Flow

On run start, validate start_ts < end_ts and strategy is in the registry; reject with BACKTESTER_INVALID_WINDOW or BACKTESTER_UNKNOWN_STRATEGY otherwise.
Assign a replay_run_id (ULID) for this run; all emitted reports for this run carry replay_run_id and mode=replay.
Fetch archived ObservationReport, DecisionReport, RiskVote, ExecutionReport, and SettlementReport envelopes for the window [start_ts, end_ts] from the report archive.
For each tick in the replay window (tick = snapshot boundary from historical CLOB snapshots), reconstruct the order book state and available signals at that moment.
Feed reconstructed state and archived signals into the strategy under test using the same execution path as live (same guardrails, same execution logic).
For each simulated OrderIntent, run the full risk guardrail stack (replay mode — votes are simulated, not live); record each vote outcome.
Simulate fill outcomes using the historical order book snapshot at the matching tick; compute simulated fill price, slippage, and fee using pUSD denomination.
Emit replay-tagged OperationsReport after each simulated fill: includes replay_run_id, original_trace_id, simulated verdict, simulated P&L in pUSD, and comparison against original outcome.
Emit replay-tagged copies of simulated DecisionReport and ExecutionReport envelopes with mode=replay and replay_run_id for archive and audit.
At run completion, emit an aggregate OperationsReport summarising: total ticks replayed, simulated fills, simulated P&L, parameter variant results if sweep was configured, and comparison deltas vs original outcomes.
Retain all replay-tagged reports in polytraders.reports.operations for 1 year, partitioned by bot_slug+epoch.
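
The partitioning and retention rule in the last step can be sketched as follows. The helper names are hypothetical. The daily UTC bucket is an assumption inferred from the partition value shown in the wire example (`backtester+2026-05-08T00:00Z`), since the page does not define the bucket size.

```python
from datetime import datetime, timezone


def epoch_bucket(ts):
    """Assumed bucket: the UTC day containing ts, formatted as in the wire example."""
    day = ts.astimezone(timezone.utc).replace(hour=0, minute=0, second=0, microsecond=0)
    return day.strftime("%Y-%m-%dT%H:%MZ")


def partition_key(bot_slug, ts):
    """bot_slug+epoch partition key."""
    return f"{bot_slug}+{epoch_bucket(ts)}"


def retained_until(window_end):
    """One-year retention measured from the window end."""
    try:
        return window_end.replace(year=window_end.year + 1).date()
    except ValueError:
        # 29 February has no counterpart next year; fall back to 28 February.
        return window_end.replace(year=window_end.year + 1, day=28).date()


end = datetime(2026, 5, 8, tzinfo=timezone.utc)
print(partition_key("backtester", end), retained_until(end).isoformat())
```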

10. Reference Implementation

Loads historical CLOB snapshots and archived report envelopes for the replay window, reconstructs the execution path tick by tick, emits replay-tagged OperationsReport per simulated fill and an aggregate report at run end. All outputs carry mode=replay and replay_run_id.

Pseudocode is language-agnostic. FETCH = read input. EMIT = produce output. Translate to TS/Python/Go/Rust.

// ---- STARTUP ----
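FUNCTION initReplay(config):
  // Clamp first so a window that lies entirely in the future is rejected below
  IF config.end_ts > now():
    config.end_ts = now()
    alerting.emit('BACKTESTER_WINDOW_CLAMPED')
  IF config.start_ts >= config.end_ts:
    RAISE ConfigError('BACKTESTER_INVALID_WINDOW')
  IF NOT strategyRegistry.has(config.strategy):
    RAISE ConfigError('BACKTESTER_UNKNOWN_STRATEGY')
  IF len(config.parameter_sweep) > 200:
    RAISE ConfigError('PARAMETER_CHANGE_REQUIRES_APPROVAL')
  IF len(config.parameter_sweep) > 50:
    alerting.emit('BACKTESTER_LARGE_SWEEP')
  replay_run_id = generateULID()  // e.g. 'replay_01HX9KZQ7E8VR5'
  // An empty sweep means one run with the strategy defaults (no overrides)
  variants = config.parameter_sweep IF len(config.parameter_sweep) > 0 ELSE [{}]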

// ---- PER-VARIANT LOOP ----
FOR variant IN variants:
  strategy = strategyRegistry.load(config.strategy, params=variant)

  // Fetch archived signals for the full window
  archive = FETCH internal.reportArchive.GET({
    window:   [config.start_ts, config.end_ts],
    kinds:    ['ObservationReport', 'DecisionReport', 'RiskVote',
               'ExecutionReport', 'SettlementReport']
  })

  // Fetch archived CLOB snapshots (tick-level)
  snapshots = FETCH internal.clobArchive.GET({
    window:   [config.start_ts, config.end_ts],
    resolution: 'tick'
  })

  sim_fills = []
  sim_pnl_pusd = 0.0

  FOR tick IN snapshots:
    // Reconstruct order book at this tick
    book = reconstructBook(tick)

    // Replay archived signals at this tick timestamp
    signals = archive.signalsAt(tick.ts_ms)

    // Run strategy in replay mode
    intent = strategy.evaluate(book, signals, mode='replay')
    IF intent IS NULL:
      CONTINUE

    // Simulate risk guardrail stack (replay mode)
    votes = riskStack.simulate(intent, context={
      book: book, signals: signals, mode: 'replay'
    })
    shaped_intent = applyVotes(intent, votes)
    IF shaped_intent IS NULL:
      EMIT OperationsReport(event_type='BACKTESTER_INTENT_REJECTED',
                            replay_run_id=replay_run_id, mode='replay', ...)
      CONTINUE

    // Simulate fill against historical book
    sim_fill = simulateFill(shaped_intent, book)
    sim_fill.size_pusd   = toPusdUnits(sim_fill.size_usd)
    sim_fill.fee_pusd    = sim_fill.size_pusd * sim_fill.fee_bps / 10_000
    sim_fill.pnl_pusd    = computeRealisedPnL(sim_fill, costBasis='FIFO')
    sim_fills.append(sim_fill)
    sim_pnl_pusd += sim_fill.pnl_pusd

    // Emit per-fill replay OperationsReport
    EMIT OperationsReport({
      report_id:       'ops_backtester_' + replay_run_id + '_' + tick.ts_ms,
      bot_id:          'gov.backtester',
      event_type:      'BACKTESTER_TICK_PROCESSED',
      replay_run_id:   replay_run_id,
      mode:            'replay',
      tick_ts_ms:      tick.ts_ms,
      market_id:       shaped_intent.market_id,
      simulated_fill:  sim_fill,
      risk_votes:      votes,
      topic:           'polytraders.reports.operations',
      partition:       'backtester+' + epochBucket(tick.ts_ms)
    })

  // Aggregate summary report at run end
  EMIT OperationsReport({
    event_type:              'BACKTESTER_REPLAY_COMPLETE',
    replay_run_id:           replay_run_id,
    strategy:                config.strategy,
    parameter_variant:       variant,
    mode:                    'replay',
    window_start:            config.start_ts,
    window_end:              config.end_ts,
    ticks_replayed:          len(snapshots),
    simulated_fills:         len(sim_fills),
    simulated_volume_pusd:   SUM(f.size_pusd FOR f IN sim_fills),
    simulated_pnl_pusd:      sim_pnl_pusd,
    simulated_net_fees_pusd: SUM(f.fee_pusd FOR f IN sim_fills),
    topic:                   'polytraders.reports.operations',
    partition:               'backtester+' + epochBucket(config.end_ts),
    retained_until:          config.end_ts + 1y
  })

SDK calls used

internal.reportArchive.GET({ window, kinds })
internal.clobArchive.GET({ window, resolution: 'tick' })
strategyRegistry.load(slug, params)
riskStack.simulate(intent, context)
toPusdUnits(raw_usd)
alerting.emit('BACKTESTER_WINDOW_CLAMPED', metadata)

Complexity: O(T * V) where T = ticks in replay window, V = parameter_sweep variants
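
The pseudocode leaves `simulateFill` abstract. As an illustration only, the sketch below fills a buy intent by walking ask levels from best to worst and applies the same fee formula as the pseudocode (`size_pusd * fee_bps / 10_000`). The book shape (a list of `(price, available_pusd)` tuples with positive prices), the function name, and the slippage definition relative to the best ask are all assumptions, not the backtester's actual fill model.

```python
def simulate_buy_fill(asks, size_pusd, fee_bps):
    """Walk ask levels from best to worst until size_pusd is spent.

    asks: list of (price, available_pusd) tuples; a hypothetical book shape.
    Returns None when nothing can be filled.
    """
    if not asks or size_pusd <= 0:
        return None
    levels = sorted(asks)                  # lowest ask first
    best_price = levels[0][0]
    remaining, spent, shares = size_pusd, 0.0, 0.0
    for price, available_pusd in levels:
        take = min(remaining, available_pusd)
        spent += take
        shares += take / price
        remaining -= take
        if remaining <= 0:
            break
    if spent == 0:
        return None
    avg_price = spent / shares
    return {
        "size_pusd": spent,
        "avg_price": avg_price,
        "slippage_bps": (avg_price - best_price) / best_price * 10_000,
        "fee_pusd": spent * fee_bps / 10_000,   # same formula as the pseudocode
        "unfilled_pusd": remaining,
    }


fill = simulate_buy_fill([(0.52, 100.0), (0.50, 100.0)], size_pusd=150.0, fee_bps=25)
print(fill)
```

A partially filled intent is reported through `unfilled_pusd` rather than padded with synthetic liquidity, in line with the rule that gaps are never filled synthetically.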

11. Wire Examples

Input — what arrives on the wire

{
  "label": "Backtester run configuration",
  "source": "config_store",
  "payload": {
    "bot_id": "gov.backtester",
    "mode": "replay",
    "strategy": "sum_to_one_arb",
    "start_ts": "2026-05-01T00:00:00Z",
    "end_ts": "2026-05-08T00:00:00Z",
    "parameter_sweep": [
      {
        "min_edge_bps": 12,
        "max_size_pusd": 1000
      },
      {
        "min_edge_bps": 15,
        "max_size_pusd": 1500
      }
    ]
  }
}

Output — what the bot emits

{
  "label": "OperationsReport — BACKTESTER_REPLAY_COMPLETE (2-variant sweep)",
  "payload": {
    "report_id": "ops_backtester_replay_01HX9KZQ7E8VR5",
    "bot_id": "gov.backtester",
    "event_type": "BACKTESTER_REPLAY_COMPLETE",
    "replay_run_id": "replay_01HX9KZQ7E8VR5",
    "strategy": "sum_to_one_arb",
    "mode": "replay",
    "window_start": "2026-05-01T00:00:00Z",
    "window_end": "2026-05-08T00:00:00Z",
    "ticks_replayed": 483120,
    "parameter_variants_run": 2,
    "variants": [
      {
        "variant_id": 0,
        "params": {
          "min_edge_bps": 12,
          "max_size_pusd": 1000
        },
        "simulated_fills": 214,
        "simulated_volume_pusd": 92450.0,
        "simulated_pnl_pusd": 1380.5,
        "simulated_net_fees_pusd": 231.1
      },
      {
        "variant_id": 1,
        "params": {
          "min_edge_bps": 15,
          "max_size_pusd": 1500
        },
        "simulated_fills": 178,
        "simulated_volume_pusd": 104200.0,
        "simulated_pnl_pusd": 1890.2,
        "simulated_net_fees_pusd": 260.5
      }
    ],
    "report_kind": "OperationsReport",
    "topic": "polytraders.reports.operations",
    "partition": "backtester+2026-05-08T00:00Z",
    "retained_until": "2027-05-08"
  }
}
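
A consumer of the sweep report will usually want to compare variants side by side. The sketch below takes the `variants` array from the output example and derives P&L and fees per unit of volume, in basis points. Ranking by `simulated_pnl_pusd` is an assumption made for illustration; the page does not say how governance ranks variants, and `summarise_variants` is a hypothetical helper.

```python
# Variant figures copied from the output example above.
report = {
    "event_type": "BACKTESTER_REPLAY_COMPLETE",
    "variants": [
        {"variant_id": 0, "simulated_fills": 214, "simulated_volume_pusd": 92450.0,
         "simulated_pnl_pusd": 1380.5, "simulated_net_fees_pusd": 231.1},
        {"variant_id": 1, "simulated_fills": 178, "simulated_volume_pusd": 104200.0,
         "simulated_pnl_pusd": 1890.2, "simulated_net_fees_pusd": 260.5},
    ],
}


def summarise_variants(report):
    """Return one row per variant, best simulated P&L first."""
    rows = []
    for v in report["variants"]:
        volume = v["simulated_volume_pusd"]
        rows.append({
            "variant_id": v["variant_id"],
            "pnl_pusd": v["simulated_pnl_pusd"],
            "pnl_bps_of_volume": round(v["simulated_pnl_pusd"] / volume * 10_000, 1) if volume else 0.0,
            "fee_bps_of_volume": round(v["simulated_net_fees_pusd"] / volume * 10_000, 1) if volume else 0.0,
        })
    return sorted(rows, key=lambda r: r["pnl_pusd"], reverse=True)


for row in summarise_variants(report):
    print(row)
```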

12. Decision Logic

APPROVE

Not applicable — Backtester is a simulation bot in replay mode. It never approves live orders.

RESHAPE_REQUIRED

Not applicable.

REJECT

Not applicable as a live trading decision. Backtester will abort a replay run if the archive data is incomplete or the strategy is unknown.

WARNING_ONLY

Parameter sweeps above 50 variants emit BACKTESTER_LARGE_SWEEP. Replay windows that extend beyond now() are clamped with BACKTESTER_WINDOW_CLAMPED warn.

13. Standard Decision Output

This bot returns an OperationsReport object. See OperationsReport schema.

{
  "report_id": "ops_backtester_replay_01HX9KZQ7E8VR5",
  "bot_id": "gov.backtester",
  "event_type": "BACKTESTER_REPLAY_COMPLETE",
  "replay_run_id": "replay_01HX9KZQ7E8VR5",
  "strategy": "sum_to_one_arb",
  "mode": "replay",
  "window_start": "2026-05-01T00:00:00Z",
  "window_end": "2026-05-08T00:00:00Z",
  "ticks_replayed": 483120,
  "simulated_fills": 214,
  "simulated_volume_pusd": 92450.0,
  "simulated_pnl_pusd": 1380.5,
  "simulated_net_fees_pusd": 231.1,
  "vs_original_pnl_delta_pusd": 42.0,
  "parameter_variants_run": 1,
  "report_kind": "OperationsReport",
  "topic": "polytraders.reports.operations",
  "partition": "backtester+2026-05-08T00:00Z",
  "retained_until": "2027-05-08"
}

14. Reason Codes

Code	Severity	Meaning	Action	User-facing message
`BACKTESTER_REPLAY_COMPLETE`	INFO	A replay run completed successfully; aggregate OperationsReport emitted.	No action — routine completion.	The backtest finished. Results are available in the governance audit report.
`BACKTESTER_TICK_PROCESSED`	INFO	A single tick was processed during replay; per-tick OperationsReport emitted.	No action — operational heartbeat.
`BACKTESTER_INVALID_WINDOW`	HARD_REJECT	start_ts >= end_ts; the replay window is invalid.	Reject the run configuration; emit alert.	The backtest start date must be before the end date.
`BACKTESTER_UNKNOWN_STRATEGY`	HARD_REJECT	The strategy slug specified in the config is not in the strategy registry.	Reject the run; emit alert.	The requested strategy is not available for backtesting.
`BACKTESTER_WINDOW_CLAMPED`	WARN	end_ts was in the future; clamped to now().	Clamp end_ts; emit WARN; continue replay.	The backtest end date was set to the most recent available data.
`BACKTESTER_TICK_SKIPPED`	WARN	A tick was missing from the archive; the replay skipped that tick.	Log WARN; skip tick; continue from next available tick.
`BACKTESTER_ARCHIVE_UNAVAILABLE`	HARD_REJECT	The report archive is entirely unavailable; the replay cannot start or has stalled.	Abort the run; emit alert.
`BACKTESTER_LARGE_SWEEP`	WARN	Parameter sweep has more than 50 variants; resource throttle applied.	Emit WARN; apply run throttle; continue sweep.
`KILL_SWITCH_ACTIVE`	WARN	KillSwitch is active; new backtesting run launches are suppressed.	Suppress new run; emit WARN; in-progress runs continue.	New backtests are paused while the system kill switch is active.

15. Metrics & Logs

Metrics emitted

Metric	Type	Unit	Labels	Meaning
`polytraders_gov_backtester_runs_total`	counter	count	strategy, status	Total backtester runs launched, labelled by strategy slug and completion status (complete/aborted).
`polytraders_gov_backtester_ticks_processed_total`	counter	count	strategy, replay_run_id	Total ticks processed across all replay runs.
`polytraders_gov_backtester_simulated_fills_total`	counter	count	strategy	Total simulated fills produced across all runs.
`polytraders_gov_backtester_simulated_pnl_pusd`	gauge	pusd	strategy, replay_run_id	Simulated P&L in pUSD for the most recent completed replay run per strategy.
`polytraders_gov_backtester_ticks_skipped_total`	counter	count	strategy	Ticks skipped due to archive gaps. Should remain near zero.
`polytraders_gov_backtester_run_duration_ms`	histogram	ms	strategy	Wall-clock duration of a complete backtest run.

Alerts

Alert	Condition	Severity	Runbook
`BacktesterArchiveUnavailable`	`rate(polytraders_gov_backtester_runs_total{status='aborted'}[15m]) > 0`	page	#runbook-backtester-archive-unavailable
`BacktesterHighTickSkipRate`	`rate(polytraders_gov_backtester_ticks_skipped_total[10m]) / rate(polytraders_gov_backtester_ticks_processed_total[10m]) > 0.01`	warn	#runbook-backtester-tick-skip
`BacktesterNoRunsIn24h`	`rate(polytraders_gov_backtester_runs_total[24h]) == 0`	warn	#runbook-backtester-no-runs
`BacktesterRunDurationHigh`	`histogram_quantile(0.99, sum by (le) (rate(polytraders_gov_backtester_run_duration_ms_bucket[1h]))) > 300000`	warn	#runbook-backtester-slow-run

Dashboards

Grafana — Governance / Backtester run history and simulated P&L by strategy
Grafana — Governance / Parameter sweep results comparison

16. Developer Reporting

{
  "bot_id": "gov.backtester",
  "event_type": "BACKTESTER_TICK_PROCESSED",
  "replay_run_id": "replay_01HX9KZQ7E8VR5",
  "tick_ts_ms": 1746792060000,
  "market_id": "0x9b0c1d2e3f4a5b6c7d8e9f0a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a9b0c",
  "simulated_intent_emitted": true,
  "simulated_fill_price": 0.621,
  "simulated_fill_size_pusd": 430.0,
  "simulated_slippage_bps": 0.8,
  "risk_votes": [
    {
      "bot": "liquidityguard",
      "vote": "pass"
    },
    {
      "bot": "portfolioguard",
      "vote": "reshape",
      "new_size_pusd": 430.0
    }
  ],
  "mode": "replay"
}

17. Plain-English Reporting

Situation	User-facing explanation
Backtest run completed successfully	The strategy was replayed over the selected historical period. Results include simulated trades, P&L estimates, and a comparison against what actually happened.
Parameter sweep completed	Multiple parameter variants were tested over the same historical data. A summary of results for each variant is available.
Replay window clamped to present	The requested end date is in the future. The backtest was run up to the most recent available data.
Archive data incomplete for window	Some historical data is unavailable for the requested window. The backtest covered only the periods with complete records.

18. Failure-Mode Block

main_failure_mode	The report archive is unavailable or has gaps for the requested replay window, causing the backtester to skip ticks and produce incomplete simulated results.
false_positive_risk	Simulated fills are computed against a historical order book snapshot that may not reflect the true fill price due to latency artifacts in the archive, inflating apparent P&L.
false_negative_risk	A strategy change that only manifests at rare market conditions (e.g. extreme spread) may not be detected if the replay window does not include those events.
safe_fallback	If archive data is incomplete for a tick, skip that tick and emit BACKTESTER_TICK_SKIPPED warn; continue replay from the next available tick. Never extrapolate or fill gaps synthetically. If the archive is entirely unavailable, abort the run and emit BACKTESTER_ARCHIVE_UNAVAILABLE.
required_dependencies	Historical report archive (ObservationReport, DecisionReport, RiskVote, ExecutionReport, SettlementReport), Historical CLOB snapshots (order book + trades), Strategy registry (strategy configuration under test), Internal message bus (OperationsReport emission)
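
The safe fallback amounts to "skip and record, never interpolate". A minimal sketch of that loop follows. It assumes the replay knows which tick timestamps to expect (for example from an archive index) and that a missing snapshot simply has no entry in the lookup. `replay_ticks`, `ArchiveUnavailable`, and the use of `None` to model an unreachable archive are hypothetical.

```python
class ArchiveUnavailable(RuntimeError):
    """Stand-in for aborting a run with BACKTESTER_ARCHIVE_UNAVAILABLE."""


def replay_ticks(expected_ts, snapshots_by_ts, process):
    """Process every available tick; skip and record the missing ones.

    expected_ts: ordered tick timestamps the window should contain (assumed input).
    snapshots_by_ts: dict of ts -> snapshot, or None if the archive is unreachable.
    """
    if snapshots_by_ts is None:
        raise ArchiveUnavailable("BACKTESTER_ARCHIVE_UNAVAILABLE")
    processed, skipped, events = 0, 0, []
    for ts in expected_ts:
        snapshot = snapshots_by_ts.get(ts)
        if snapshot is None:
            # Never extrapolate: record the gap and move to the next tick.
            skipped += 1
            events.append(("BACKTESTER_TICK_SKIPPED", ts))
            continue
        process(snapshot)
        processed += 1
    return processed, skipped, events


seen = []
processed, skipped, events = replay_ticks([1, 2, 3, 4], {1: "a", 2: "b", 4: "d"}, seen.append)
print(processed, skipped, events)
```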

19. Failure-Injection Recipes

Scenario	How to inject	Expected behaviour	Recovery
`ARCHIVE_UNAVAILABLE`	Block all reads from internal.reportArchive and internal.clobArchive	BACKTESTER_ARCHIVE_UNAVAILABLE raised; run aborted; OperationsReport with status=aborted emitted	Once archive is reachable, launch a new replay run. No partial state needs clearing.
`TICK_GAPS_IN_ARCHIVE`	Delete 5% of tick records from clobArchive for the target window	BACKTESTER_TICK_SKIPPED emitted per missing tick; replay continues; aggregate report notes skipped ticks	Automatic; skipped ticks are noted in the aggregate report.
`INVALID_WINDOW`	Submit run with start_ts > end_ts	BACKTESTER_INVALID_WINDOW raised; run rejected immediately	Correct the window configuration and resubmit.
`LARGE_PARAMETER_SWEEP`	Submit parameter_sweep with 201 variants	PARAMETER_CHANGE_REQUIRES_APPROVAL raised; run rejected	Reduce sweep size to <= 200 or request approval.
`KILL_SWITCH_DURING_LAUNCH`	Activate KillSwitch; submit a new backtester run	KILL_SWITCH_ACTIVE logged; new run not started; in-progress runs continue to completion	Deactivate KillSwitch; resubmit run.
`STRATEGY_NOT_IN_REGISTRY`	Submit run with strategy='unknown_strategy'	BACKTESTER_UNKNOWN_STRATEGY raised; run rejected	Use a registered strategy slug.

20. State & Persistence

Cold-start recovery

On restart, any in-progress replay run is marked as interrupted and is not resumed; a new run must be launched manually. No partial replay state is carried across restarts.

21. Concurrency & Idempotency

Aspect	Specification
Execution model	`single-threaded event loop per replay run; multiple runs may execute in parallel up to max_in_flight`
Max in-flight	`5`
Idempotency key	`replay_run_id`
Per-call timeout (ms)	`300000`
Backpressure strategy	`reject new run launches beyond max_in_flight; emit BACKTESTER_RUN_QUEUE_FULL warn`
Locking / mutual exclusion	`Postgres unique constraint on replay_run_id; in-run state is per-goroutine`
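
The admission rules in the table (idempotency on `replay_run_id`, a cap of 5 in-flight runs, and KillSwitch suppression of new launches) can be sketched with an in-memory registry. This is illustrative only: the page specifies a Postgres unique constraint, for which the `_seen` set stands in, and `RunRegistry` plus the `DUPLICATE` and `STARTED` status strings are hypothetical.

```python
import threading


class RunRegistry:
    """In-memory stand-in for the run table and its unique constraint."""

    def __init__(self, max_in_flight=5):
        self.max_in_flight = max_in_flight
        self._lock = threading.Lock()
        self._seen = set()     # every replay_run_id ever accepted
        self._active = set()   # runs currently in flight

    def try_start(self, replay_run_id, kill_switch_active=False):
        with self._lock:
            if replay_run_id in self._seen:
                return "DUPLICATE"                  # idempotent re-submit
            if kill_switch_active:
                return "KILL_SWITCH_ACTIVE"         # new launches suppressed
            if len(self._active) >= self.max_in_flight:
                return "BACKTESTER_RUN_QUEUE_FULL"  # backpressure: reject
            self._seen.add(replay_run_id)
            self._active.add(replay_run_id)
            return "STARTED"

    def finish(self, replay_run_id):
        with self._lock:
            self._active.discard(replay_run_id)


registry = RunRegistry(max_in_flight=2)
print([registry.try_start(run_id) for run_id in ("r1", "r1", "r2", "r3")])
```

A rejected run is not added to `_seen`, so the same `replay_run_id` can be resubmitted once capacity frees up.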

22. Dependencies

Depends on (must run first)

Bot	Why	Contract
`internal.report_archive`	All historical ObservationReport, DecisionReport, RiskVote, ExecutionReport, and SettlementReport envelopes are fetched from the report archive for replay reconstruction.
`internal.clob_archive`	Tick-level CLOB snapshots (order book + trades) are the primary replay data source for simulated fill computation.

Emits to (downstream consumers)

Bot	Why	Contract
`internal.governance_audit`

Sibling bots (same OrderIntent)

Bot	Why	Contract
gov.paper-trade-runner	Paper-Trade Runner uses live data in paper mode; Backtester uses archived data in replay mode. Both emit OperationsReport to the same topic.

Used by (auto-aggregated)

6.8

External services

Service	Endpoint	SLA assumed	On failure
Internal report archive		99.9% (internal SRE target)
Internal CLOB snapshot archive		99.9% (internal SRE target)

23. Security Surfaces

Abuse vectors considered

Submitting a large parameter_sweep (>200 variants) to exhaust archive query budget
Injecting a crafted replay window that targets gaps in the archive to produce misleading P&L results
Manipulating strategy registry to replay a strategy not approved for backtesting

Mitigations

parameter_sweep hard limit of 200 enforced at config load; excess rejected with PARAMETER_CHANGE_REQUIRES_APPROVAL
Backtester is locked to mode=replay; live CLOB and onchain surfaces are never accessed
Strategy registry is read-only within the backtester; only registered strategies can be loaded
All replay runs are audit-logged with replay_run_id and emitted to the governance report bus

24. Polymarket V2 Compatibility

Aspect	Value
CLOB version	`v2`
Collateral asset	`pUSD`
EIP-712 Exchange domain version	`2`
Aware of builderCode field	no
Aware of negative-risk markets	yes
Multi-chain ready	no
SDK used	`py-clob-client-v2`
Settlement contract	`CTFExchangeV2 on Polygon`
Notes	`Backtester consumes archived V2 ReportEnvelope records; all simulated fills are denominated in pUSD. Backtester is replay-only and never makes live CLOB calls. NegRisk market payoffs are replayed using the NegRiskAdapter path stored in the archived SettlementReport.`

API surfaces declared

internal

Networks supported

polygon

25. Versioning & Migration

Field	Value
spec	`2.0.0`
implementation	`2.1.0`
schema	`2`
released	`2026-04-28`

Migration history

Date	From	To	Reason	Action taken
2026-04-28	v1	v2	CLOB V2 cutover	Updated replay pipeline to consume V2 ReportEnvelope format (pUSD denomination, mode field, replay_run_id). Removed USDC.e references from simulated fill output. Replay now ingests archived ObservationReport, DecisionReport, ExecutionReport, and SettlementReport in V2 schema. All simulated fill sizes and P&L now denominated in pUSD. Switched to py-clob-client-v2 archive reader.

26. Acceptance Tests

Unit Tests

Test	Setup	Expected result
Replay window validation rejects start_ts >= end_ts	start_ts='2026-05-09', end_ts='2026-05-01'	BACKTESTER_INVALID_WINDOW ConfigError
Unknown strategy slug raises error	strategy='nonexistent_strategy'	BACKTESTER_UNKNOWN_STRATEGY ConfigError
Parameter sweep above hard limit (200) is rejected	parameter_sweep = list of 201 variants	PARAMETER_CHANGE_REQUIRES_APPROVAL ConfigError
Replay assigns mode=replay on all emitted reports	Run single-variant replay; inspect all emitted OperationsReport payloads	All reports carry mode='replay' and replay_run_id
Simulated fill denominated in pUSD (not USDC.e)	Replay a fill event from historical archive	simulated_fill output contains size_pusd field; no USDC.e references
Archive gap causes tick skip, not crash	Inject missing tick in archive at t=T; continue replay	BACKTESTER_TICK_SKIPPED emitted; replay continues from T+1

Integration Tests

Test	Expected result
End-to-end: 7-day replay of sum_to_one_arb produces aggregate OperationsReport	Aggregate OperationsReport with ticks_replayed > 0, simulated_fills >= 0, mode=replay, retained_until = now+1y
Parameter sweep of 3 variants produces 3 separate OperationsReport summaries	3 reports, each with distinct parameter_variant field and independent simulated P&L
KillSwitch active suppresses new run launches	KILL_SWITCH_ACTIVE logged; backtest run not started; in-progress runs continue

Property Tests

Property	Required behaviour
All emitted reports carry mode=replay and replay_run_id	Always true — replay mode is locked immutable in default_config
No live CLOB calls are made during replay	Always true — all inputs come from the archive; no clob_auth or onchain surfaces are accessed
Replay is deterministic: same inputs produce same outputs	Always true — no randomness; archive inputs are immutable
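
The determinism property can be checked by fingerprinting the reports from two runs over the same window and comparing the digests. The sketch below assumes that run-specific identifiers (`replay_run_id`, `report_id`) legitimately differ between runs and must be excluded before hashing. `report_digest` is a hypothetical helper, not part of the test suite described here.

```python
import hashlib
import json

RUN_SPECIFIC_FIELDS = ("replay_run_id", "report_id")  # assumed to differ per run


def report_digest(reports, ignore=RUN_SPECIFIC_FIELDS):
    """SHA-256 over a canonical JSON encoding of the reports."""
    canonical = [{k: v for k, v in r.items() if k not in ignore} for r in reports]
    blob = json.dumps(canonical, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


run_a = [{"replay_run_id": "run_a", "mode": "replay", "simulated_pnl_pusd": 1380.5}]
run_b = [{"replay_run_id": "run_b", "mode": "replay", "simulated_pnl_pusd": 1380.5}]
print(report_digest(run_a) == report_digest(run_b))
```

Sorting keys and fixing the separators makes the encoding independent of dictionary ordering, so only a change in report content changes the digest.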

27. Operational Runbook

Backtester incidents involve archive unavailability (run aborts), high tick-skip rates (data quality issue), or runs stuck in progress (goroutine leak or resource exhaustion). All incidents are low-urgency unless they block a promotion gate decision.

On-call actions

Alert	First step	Escalate to
`BacktesterArchiveUnavailable`	Check internal report archive and CLOB archive health. Verify storage backend is reachable.	Governance pod lead if archive is down for > 30 minutes
`BacktesterHighTickSkipRate`	Check clobArchive completeness for the affected window. May indicate a historical data ingestion failure.	Governance pod lead; data engineering if archive ingestion is the root cause
`BacktesterNoRunsIn24h`	Check if any promotion-gate backtests are pending. If yes, investigate whether run launches are being suppressed by KillSwitch.	Governance pod lead
`BacktesterRunDurationHigh`	Check archive query latency and concurrent run count. Reduce max_in_flight if resource contention is the cause.	SRE on-call if duration > 10 minutes consistently

Manual overrides

polytraders gov backtest abort --replay-run-id <id>: aborts a stuck replay run. Use when a replay run is stuck in progress and blocking the run queue, and only after investigating the root cause.

Healthcheck

/internal/health/backtester → 200 when: last run completed < 24h ago; archive reachable; max_in_flight not saturated; ticks_skipped_rate < 0.01. Red if: archive unreachable; all runs aborting; run queue saturated for > 5 minutes.

28. Promotion Gates

A bot does not advance to the next readiness state until every gate below is green. Gates are observable from production data — no subjective sign-off.

Promote to Shadow

Gate	How measured	Threshold
Unit tests pass for window validation, strategy loading, and archive gap handling	CI test run	100% pass
Postgres replay_runs schema migration verified	Integration test	Pass

Promote to Limited live

Gate	How measured	Threshold
7-day replay of sum_to_one_arb completes in < 5 minutes with < 0.1% tick skip rate	polytraders_gov_backtester_run_duration_ms + ticks_skipped_rate	< 5 min, < 0.1% skips
All emitted OperationsReport records carry mode=replay and replay_run_id	Report bus audit; sample 100 records	100% compliance

Promote to General live

Gate	How measured	Threshold
Parameter sweep of 5 variants completes deterministically: identical inputs produce identical outputs on two successive runs	Determinism test: run same sweep twice; compare simulated_pnl_pusd per variant	0 delta between runs
Promotion-gate backtest evidence accepted for one strategy live promotion decision	Governance pod review	Pass

29. Developer Checklist

Ready-to-ship score: 27/27 sections complete · 100%

Requirement	Status
Purpose defined	✓ done
Required inputs listed	✓ done
Parameters defined	✓ done
Defaults defined	✓ done
Warning thresholds defined	✓ done
Hard thresholds defined	✓ done
Safe fallback defined	✓ done
Structured output defined	✓ done
Developer log defined	✓ done
Plain-English explanation	✓ done
Unit tests defined	✓ done
Integration tests defined	✓ done
Property tests defined	✓ done
Failure-mode block complete	✓ done
Reference implementation pseudocode	✓ done
Wire examples (input + output)	✓ done
Reason codes listed	✓ done
Metrics & logs defined	✓ done
State & persistence defined	✓ done
Concurrency & idempotency defined	✓ done
Dependencies declared	✓ done
Security surfaces declared	✓ done
Polymarket V2 compatibility declared	✓ done
Version & migration history declared	✓ done
Operational runbook defined	✓ done
Promotion gates defined	✓ done
Failure-injection recipes defined	✓ done
