Polytraders Dev Guide
internal

6.4 Backtester

Governance Governance Service Explain BETA Limited live capital · Indirect P7 · Governance & replay pending stub

v3 readiness

Docs: 27/27 (done)
Impl: 0/15 (pending)
Backtest: 0/4 (pending)
Runtime: 0/8 (pending)

A bot is done when all four scores are complete.

1. Bot Identity

Layer: Governance
Bot class: Governance Service
Authority: Explain
Status: BETA
Readiness: Limited live
Runs before: Nothing. Backtester is a governance simulation bot; it runs in replay mode against the report archive and never precedes live execution.
Runs after: Historical report archive is populated; ReportEnvelope records for the target time window are available.
Applies to: All strategy bots configured for replay; any historical OrderIntent window specified via start_ts/end_ts.
Default mode: limited_live
User-visible: Advanced details only
Developer owner: Polytraders core, Governance pod

2. Purpose

Backtester replays historical CLOB snapshots and the full report archive through the live execution path at tick resolution. It runs in replay mode (mode=replay), consuming archived ObservationReport, DecisionReport, RiskVote, ExecutionReport, and SettlementReport envelopes and re-executing the strategy under test against them. It emits replay-tagged OperationsReport records (and replay-tagged copies of every simulated report kind) to polytraders.reports.operations, partitioned by bot_slug+epoch, retained for 1 year. Backtester is used by the governance team to validate parameter changes, A/B test strategies, and produce audit-quality evidence before any strategy is promoted to live. Backtester never signs orders or touches the live CLOB.

3. Why This Bot Matters

  • Strategy promoted to live without backtesting

    Untested parameter changes or new strategy logic may produce runaway losses or adverse fills on live capital. Backtesting is a mandatory gate before promotion.

  • Backtester uses different execution path than live

    Results are not comparable to live performance. Governance audit evidence is invalid and cannot be used to justify promotion decisions.

  • Replay-tagged reports not emitted

    Backtesting runs are not auditable. Governance cannot produce the required evidence trail for strategy promotion gates.

  • Parameter sweep runs with non-deterministic inputs

    Results are not reproducible. Successive backtests on the same window may produce different outputs, making comparison and audit impossible.

No worked examples on this bot yet. Worked examples are optional but strongly recommended — they turn an abstract failure mode into something a developer can verify in a fixture.

4. Required Polymarket Inputs

Input | Source | Required? | Use
Historical CLOB snapshots (order book + trades, tick-level) | internal | Yes | Primary replay data source. Backtester reconstructs the order book state at each tick from the archived snapshot stream.
Archived ObservationReport envelopes | internal | Yes | Replay pre-trade intelligence signals as they existed at the original observation time.
Archived DecisionReport envelopes | internal | Yes | Reconstruct the original strategy decisions for comparison against replayed decisions.
Archived RiskVote envelopes | internal | Yes | Reconstruct risk guardrail votes that were in effect during the replay window.
Archived ExecutionReport envelopes | internal | No | Compare simulated fill outcomes against actual historical fills for slippage and fill-quality analysis.
Archived SettlementReport envelopes | internal | No | Replay post-trade P&L settlement to reconstruct the full governance evidence trail.

5. Required Internal Inputs

Input | Source | Required? | Use
Strategy configuration under test | Config store | Yes | Parameters (or parameter sweep grid) for the strategy being backtested.
KillSwitch active flag | KillSwitch | No | When KillSwitch is active, suppress new backtesting run launches. Runs already in progress may continue.

6. Parameter Guide

Parameter | Default | Warning | Hard | What it controls
start_ts | now()-7d | None | None | Start timestamp (ISO 8601, UTC) for the replay window.
end_ts | now() | None | None | End timestamp (ISO 8601, UTC) for the replay window. Must be greater than start_ts.
strategy | sum_to_one_arb | None | None | The strategy bot slug to replay through the backtester execution path.
parameter_sweep | [] | 50 | 200 | List of parameter override objects, each defining one variant run. Empty list means single run with defaults. Max 200 variants per backtest job.

7. Detailed Parameter Instructions

start_ts

What it means

Start timestamp (ISO 8601, UTC) for the replay window.

Default

{ "start_ts": "now()-7d" }

Why this default matters

Defaulting to 7 days ago ensures at least one full trading week is covered by each backtest run.

Threshold logic

Condition | Action
start_ts < end_ts | Accept window and begin replay
start_ts >= end_ts | Reject with BACKTESTER_INVALID_WINDOW

Developer check

if (p.start_ts >= p.end_ts) throw ConfigError('BACKTESTER_INVALID_WINDOW')

User-facing English

The backtest covers historical data starting from this date and time.

end_ts

What it means

End timestamp (ISO 8601, UTC) for the replay window. Must be greater than start_ts.

Default

{ "end_ts": "now()" }

Why this default matters

Defaulting to now() makes fresh backtests cover up to the most recent archived data.

Threshold logic

Condition | Action
end_ts > start_ts AND end_ts <= now() | Accept window
end_ts > now() | Clamp to now(); emit BACKTESTER_WINDOW_CLAMPED warn

Developer check

if (p.end_ts > now()) { p.end_ts = now(); log.warn('BACKTESTER_WINDOW_CLAMPED') }

User-facing English

The backtest covers historical data up to this date and time.
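
The two window checks interact: if end_ts is clamped to now() after start_ts has already been compared against the unclamped value, a start_ts in the future can slip through. The following is a minimal Python sketch of one way to order them, clamping first and validating second. The names parse_utc and resolve_window, the Python ConfigError class, and the choice to treat timestamps without an offset as UTC are assumptions made for illustration, not part of the Backtester codebase.

```python
from datetime import datetime, timezone


class ConfigError(ValueError):
    """Hypothetical error type carrying a Backtester reason code."""


def parse_utc(ts: str) -> datetime:
    # Accept the trailing "Z" used in the wire examples.
    parsed = datetime.fromisoformat(ts.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        # Assumption: timestamps without an offset are already UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


def resolve_window(start_ts: str, end_ts: str, now: datetime):
    """Return (start, end, warnings) after clamping, then validating."""
    start, end = parse_utc(start_ts), parse_utc(end_ts)
    warnings = []
    if end > now:
        # A future end_ts is clamped, not rejected.
        end = now
        warnings.append("BACKTESTER_WINDOW_CLAMPED")
    if start >= end:
        # Checked against the effective (possibly clamped) end.
        raise ConfigError("BACKTESTER_INVALID_WINDOW")
    return start, end, warnings
```

Passing now in as an argument rather than reading the clock inside the function keeps the check deterministic in tests.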

strategy

What it means

The strategy bot slug to replay through the backtester execution path.

Default

{ "strategy": "sum_to_one_arb" }

Why this default matters

Defaults to sum_to_one_arb, the most commonly backtested strategy. Any other strategy must be set explicitly.

Threshold logic

Condition | Action
strategy is in the strategy registry | Load strategy configuration and begin replay
strategy not in registry | Reject with BACKTESTER_UNKNOWN_STRATEGY

Developer check

if (!strategyRegistry.has(p.strategy)) throw ConfigError('BACKTESTER_UNKNOWN_STRATEGY')

User-facing English

The trading strategy being evaluated in this backtest.

parameter_sweep

What it means

List of parameter override objects, each defining one variant run. Empty list means single run with defaults. Max 200 variants per backtest job.

Default

{ "parameter_sweep": [] }

Why this default matters

Empty list (single run) is safe as a default. Sweeps beyond 200 variants can exceed the archive query budget and should require explicit approval.

Threshold logic

Condition | Action
len(parameter_sweep) == 0 | Run single backtest with strategy defaults
1 <= len(parameter_sweep) <= 50 | Run all variants; emit summary OperationsReport per variant
51 <= len(parameter_sweep) <= 200 | WARN BACKTESTER_LARGE_SWEEP; run all variants with resource throttle
len(parameter_sweep) > 200 | Reject with PARAMETER_CHANGE_REQUIRES_APPROVAL

Developer check

if (p.parameter_sweep.length > p.hard) throw ConfigError('PARAMETER_CHANGE_REQUIRES_APPROVAL')

User-facing English

A list of parameter variations to test. Each entry produces a separate backtest result for comparison.
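
The four threshold bands for parameter_sweep can be expressed as one small planning step. This Python sketch is illustrative only: plan_sweep, the returned dictionary shape, and the use of an empty override object to stand for "strategy defaults" are assumptions, while the limits (50 and 200) and the reason codes come from the tables in this guide.

```python
WARN_LIMIT = 50    # Warning threshold from the parameter guide
HARD_LIMIT = 200   # Hard threshold from the parameter guide


class ConfigError(ValueError):
    """Hypothetical error type carrying a Backtester reason code."""


def plan_sweep(parameter_sweep: list) -> dict:
    """Classify a sweep by size and decide which variants to run."""
    n = len(parameter_sweep)
    if n > HARD_LIMIT:
        raise ConfigError("PARAMETER_CHANGE_REQUIRES_APPROVAL")
    large = n > WARN_LIMIT
    return {
        # Assumption: an empty override object means "run with strategy defaults".
        "variants": list(parameter_sweep) if n > 0 else [{}],
        "throttled": large,
        "warnings": ["BACKTESTER_LARGE_SWEEP"] if large else [],
    }
```

For example, a sweep of 51 variants is accepted but flagged for throttling, and a sweep of 201 variants is rejected before any archive query is issued.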

8. Default Configuration

{
  "bot_id": "gov.backtester",
  "version": "2.0.0",
  "mode": "replay",
  "defaults": {
    "start_ts": "now()-7d",
    "end_ts": "now()",
    "strategy": "sum_to_one_arb",
    "parameter_sweep": []
  },
  "locked": {
    "mode": {
      "immutable": true,
      "value": "replay"
    }
  }
}
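
To illustrate how the locked block might be enforced when a run configuration is merged onto these defaults, here is a hedged Python sketch. The function name effective_config and the merge order are assumptions; the only behaviour taken from this guide is that mode is immutable and fixed to replay.

```python
import copy

DEFAULT_CONFIG = {
    "bot_id": "gov.backtester",
    "version": "2.0.0",
    "mode": "replay",
    "defaults": {
        "start_ts": "now()-7d",
        "end_ts": "now()",
        "strategy": "sum_to_one_arb",
        "parameter_sweep": [],
    },
    "locked": {"mode": {"immutable": True, "value": "replay"}},
}


def effective_config(overrides: dict, base: dict = DEFAULT_CONFIG) -> dict:
    """Merge run overrides onto the defaults, refusing changes to locked keys."""
    for key, lock in base["locked"].items():
        if lock["immutable"] and overrides.get(key, lock["value"]) != lock["value"]:
            raise ValueError(f"{key} is locked to {lock['value']!r}")
    # Deep copy so callers cannot mutate the shared defaults.
    merged = copy.deepcopy(base["defaults"])
    merged.update(overrides)
    # Locked values are written last so they always win.
    for key, lock in base["locked"].items():
        merged[key] = lock["value"]
    return merged
```

A request that tries to set mode to anything other than replay fails fast, before any archive read is attempted.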

9. Implementation Flow

  1. On run start, validate start_ts < end_ts and strategy is in the registry; reject with BACKTESTER_INVALID_WINDOW or BACKTESTER_UNKNOWN_STRATEGY otherwise.
  2. Assign a replay_run_id (ULID) for this run; all emitted reports for this run carry replay_run_id and mode=replay.
  3. Fetch archived ObservationReport, DecisionReport, RiskVote, ExecutionReport, and SettlementReport envelopes for the window [start_ts, end_ts] from the report archive.
  4. For each tick in the replay window (tick = snapshot boundary from historical CLOB snapshots), reconstruct the order book state and available signals at that moment.
  5. Feed reconstructed state and archived signals into the strategy under test using the same execution path as live (same guardrails, same execution logic).
  6. For each simulated OrderIntent, run the full risk guardrail stack (replay mode — votes are simulated, not live); record each vote outcome.
  7. Simulate fill outcomes using the historical order book snapshot at the matching tick; compute simulated fill price, slippage, and fee using pUSD denomination.
  8. Emit replay-tagged OperationsReport after each simulated fill: includes replay_run_id, original_trace_id, simulated verdict, simulated P&L in pUSD, and comparison against original outcome.
  9. Emit replay-tagged copies of simulated DecisionReport and ExecutionReport envelopes with mode=replay and replay_run_id for archive and audit.
  10. At run completion, emit an aggregate OperationsReport summarising: total ticks replayed, simulated fills, simulated P&L, parameter variant results if sweep was configured, and comparison deltas vs original outcomes.
  11. Retain all replay-tagged reports in polytraders.reports.operations for 1 year, partitioned by bot_slug+epoch.
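
Steps 2, 9, and 11 all come down to stamping each emitted report with the same replay metadata. The Python sketch below shows one way to do that. The helper names tag_replay and retained_until are hypothetical, and the daily bucket format is inferred from the partition value shown in the wire example (backtester+2026-05-08T00:00Z); the real epochBucket granularity may differ.

```python
from datetime import datetime, timezone


def retained_until(ts: datetime) -> str:
    """Date one calendar year after ts, as YYYY-MM-DD."""
    try:
        expiry = ts.replace(year=ts.year + 1)
    except ValueError:
        # 29 February has no counterpart in the following year.
        expiry = ts.replace(year=ts.year + 1, day=28)
    return expiry.date().isoformat()


def tag_replay(report: dict, replay_run_id: str, ts: datetime) -> dict:
    """Return a copy of report carrying the replay tags and routing fields."""
    ts_utc = ts.astimezone(timezone.utc)
    # Assumption: one partition bucket per UTC day.
    bucket = ts_utc.strftime("%Y-%m-%dT00:00Z")
    return {
        **report,
        "mode": "replay",
        "replay_run_id": replay_run_id,
        "topic": "polytraders.reports.operations",
        "partition": f"backtester+{bucket}",
        "retained_until": retained_until(ts_utc),
    }
```

Routing every emission through a single tagging function makes the "all reports carry mode=replay and replay_run_id" property easy to audit.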

10. Reference Implementation

Loads historical CLOB snapshots and archived report envelopes for the replay window, reconstructs the execution path tick by tick, emits replay-tagged OperationsReport per simulated fill and an aggregate report at run end. All outputs carry mode=replay and replay_run_id.

Pseudocode is language-agnostic. FETCH = read input. EMIT = produce output. Translate to TS/Python/Go/Rust.

// ---- STARTUP ----
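FUNCTION initReplay(config):
  // Clamp first so the window check below sees the effective end_ts
  IF config.end_ts > now():
    config.end_ts = now()
    alerting.emit('BACKTESTER_WINDOW_CLAMPED')
  IF config.start_ts >= config.end_ts:
    RAISE ConfigError('BACKTESTER_INVALID_WINDOW')
  IF NOT strategyRegistry.has(config.strategy):
    RAISE ConfigError('BACKTESTER_UNKNOWN_STRATEGY')
  IF len(config.parameter_sweep) > 200:
    RAISE ConfigError('PARAMETER_CHANGE_REQUIRES_APPROVAL')
  IF len(config.parameter_sweep) > 50:
    alerting.emit('BACKTESTER_LARGE_SWEEP')
  replay_run_id = generateULID()  // e.g. 'replay_01HX9KZQ7E8VR5'
  // Empty sweep = single run; an empty override object keeps the strategy defaults
  variants = config.parameter_sweep IF len(config.parameter_sweep) > 0 ELSE [{}]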

// ---- PER-VARIANT LOOP ----
FOR variant IN variants:
  strategy = strategyRegistry.load(config.strategy, params=variant)

  // Fetch archived signals for the full window
  archive = FETCH internal.reportArchive.GET({
    window:   [config.start_ts, config.end_ts],
    kinds:    ['ObservationReport', 'DecisionReport', 'RiskVote',
               'ExecutionReport', 'SettlementReport']
  })

  // Fetch archived CLOB snapshots (tick-level)
  snapshots = FETCH internal.clobArchive.GET({
    window:   [config.start_ts, config.end_ts],
    resolution: 'tick'
  })

  sim_fills = []
  sim_pnl_pusd = 0.0

  FOR tick IN snapshots:
    // Reconstruct order book at this tick
    book = reconstructBook(tick)

    // Replay archived signals at this tick timestamp
    signals = archive.signalsAt(tick.ts_ms)

    // Run strategy in replay mode
    intent = strategy.evaluate(book, signals, mode='replay')
    IF intent IS NULL:
      CONTINUE

    // Simulate risk guardrail stack (replay mode)
    votes = riskStack.simulate(intent, context={
      book: book, signals: signals, mode: 'replay'
    })
    shaped_intent = applyVotes(intent, votes)
    IF shaped_intent IS NULL:
      EMIT OperationsReport(event_type='BACKTESTER_INTENT_REJECTED',
                            replay_run_id=replay_run_id, mode='replay', ...)
      CONTINUE

    // Simulate fill against historical book
    sim_fill = simulateFill(shaped_intent, book)
    sim_fill.size_pusd   = toPusdUnits(sim_fill.size_usd)
    sim_fill.fee_pusd    = sim_fill.size_pusd * sim_fill.fee_bps / 10_000
    sim_fill.pnl_pusd    = computeRealisedPnL(sim_fill, costBasis='FIFO')
    sim_fills.append(sim_fill)
    sim_pnl_pusd += sim_fill.pnl_pusd

    // Emit per-fill replay OperationsReport
    EMIT OperationsReport({
      report_id:       'ops_backtester_' + replay_run_id + '_' + tick.ts_ms,
      bot_id:          'gov.backtester',
      event_type:      'BACKTESTER_TICK_PROCESSED',
      replay_run_id:   replay_run_id,
      mode:            'replay',
      tick_ts_ms:      tick.ts_ms,
      market_id:       shaped_intent.market_id,
      simulated_fill:  sim_fill,
      risk_votes:      votes,
      topic:           'polytraders.reports.operations',
      partition:       'backtester+' + epochBucket(tick.ts_ms)
    })

  // Aggregate summary report at run end
  EMIT OperationsReport({
    event_type:              'BACKTESTER_REPLAY_COMPLETE',
    replay_run_id:           replay_run_id,
    strategy:                config.strategy,
    parameter_variant:       variant,
    mode:                    'replay',
    window_start:            config.start_ts,
    window_end:              config.end_ts,
    ticks_replayed:          len(snapshots),
    simulated_fills:         len(sim_fills),
    simulated_volume_pusd:   SUM(f.size_pusd FOR f IN sim_fills),
    simulated_pnl_pusd:      sim_pnl_pusd,
    simulated_net_fees_pusd: SUM(f.fee_pusd FOR f IN sim_fills),
    topic:                   'polytraders.reports.operations',
    partition:               'backtester+' + epochBucket(config.end_ts),
    retained_until:          config.end_ts + 1y
  })

SDK calls used

  • internal.reportArchive.GET({ window, kinds })
  • internal.clobArchive.GET({ window, resolution: 'tick' })
  • strategyRegistry.load(slug, params)
  • riskStack.simulate(intent, context)
  • toPusdUnits(raw_usd)
  • alerting.emit('BACKTESTER_WINDOW_CLAMPED', metadata)

Complexity: O(T * V) where T = ticks in replay window, V = parameter_sweep variants
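
The pseudocode leaves simulateFill unspecified. As a hedged illustration of what simulating a fill against a historical book can look like, the Python sketch below walks the ask side for a buy order, then derives the average price, slippage against the best ask, and the fee using the same size * fee_bps / 10_000 formula as the pseudocode. The Level type, the simulate_buy name, the partial-fill behaviour, and the fee_bps value in the usage note are assumptions, not the Backtester's actual fill model.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Level:
    price: float  # price per share, between 0 and 1
    size: float   # shares available at this price


def simulate_buy(asks: List[Level], shares: float, fee_bps: float) -> Optional[dict]:
    """Fill a buy against ask levels, cheapest first; partial if depth runs out."""
    remaining, cost = shares, 0.0
    for level in sorted(asks, key=lambda lvl: lvl.price):
        take = min(remaining, level.size)
        cost += take * level.price
        remaining -= take
        if remaining <= 0:
            break
    filled = shares - remaining
    if filled <= 0:
        return None  # empty book or zero-size order: nothing to report
    avg_price = cost / filled
    best_ask = min(lvl.price for lvl in asks)
    return {
        "filled_shares": filled,
        "avg_price": avg_price,
        "size_pusd": cost,
        "slippage_bps": (avg_price - best_ask) / best_ask * 10_000,
        "fee_pusd": cost * fee_bps / 10_000,
    }
```

With asks of 100 shares at 0.62 and 200 shares at 0.63, a 200-share buy costs 125 pUSD at an average price of 0.625, so an illustrative fee of 25 bps comes to 0.3125 pUSD.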

11. Wire Examples

Input — what arrives on the wire

{
  "label": "Backtester run configuration",
  "source": "config_store",
  "payload": {
    "bot_id": "gov.backtester",
    "mode": "replay",
    "strategy": "sum_to_one_arb",
    "start_ts": "2026-05-01T00:00:00Z",
    "end_ts": "2026-05-08T00:00:00Z",
    "parameter_sweep": [
      {
        "min_edge_bps": 12,
        "max_size_pusd": 1000
      },
      {
        "min_edge_bps": 15,
        "max_size_pusd": 1500
      }
    ]
  }
}

Output — what the bot emits

{
  "label": "OperationsReport — BACKTESTER_REPLAY_COMPLETE (2-variant sweep)",
  "payload": {
    "report_id": "ops_backtester_replay_01HX9KZQ7E8VR5",
    "bot_id": "gov.backtester",
    "event_type": "BACKTESTER_REPLAY_COMPLETE",
    "replay_run_id": "replay_01HX9KZQ7E8VR5",
    "strategy": "sum_to_one_arb",
    "mode": "replay",
    "window_start": "2026-05-01T00:00:00Z",
    "window_end": "2026-05-08T00:00:00Z",
    "ticks_replayed": 483120,
    "parameter_variants_run": 2,
    "variants": [
      {
        "variant_id": 0,
        "params": {
          "min_edge_bps": 12,
          "max_size_pusd": 1000
        },
        "simulated_fills": 214,
        "simulated_volume_pusd": 92450.0,
        "simulated_pnl_pusd": 1380.5,
        "simulated_net_fees_pusd": 231.1
      },
      {
        "variant_id": 1,
        "params": {
          "min_edge_bps": 15,
          "max_size_pusd": 1500
        },
        "simulated_fills": 178,
        "simulated_volume_pusd": 104200.0,
        "simulated_pnl_pusd": 1890.2,
        "simulated_net_fees_pusd": 260.5
      }
    ],
    "report_kind": "OperationsReport",
    "topic": "polytraders.reports.operations",
    "partition": "backtester+2026-05-08T00:00Z",
    "retained_until": "2027-05-08"
  }
}
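
When comparing sweep variants it helps to normalise the raw totals. The Python sketch below derives three ratios from the variants array in the output above. The field names are taken from the payload; the summarise helper and the derived metric names are hypothetical and are not part of the OperationsReport schema.

```python
variants = [
    {"variant_id": 0, "simulated_fills": 214, "simulated_volume_pusd": 92450.0,
     "simulated_pnl_pusd": 1380.5, "simulated_net_fees_pusd": 231.1},
    {"variant_id": 1, "simulated_fills": 178, "simulated_volume_pusd": 104200.0,
     "simulated_pnl_pusd": 1890.2, "simulated_net_fees_pusd": 260.5},
]


def summarise(variant: dict) -> dict:
    """Normalise one variant's totals so variants can be compared side by side."""
    volume = variant["simulated_volume_pusd"]
    pnl = variant["simulated_pnl_pusd"]
    return {
        "variant_id": variant["variant_id"],
        # Fees as a fraction of traded volume, in basis points.
        "implied_fee_bps": variant["simulated_net_fees_pusd"] / volume * 10_000,
        # Average simulated P&L per fill, in pUSD.
        "pnl_per_fill_pusd": pnl / variant["simulated_fills"],
        # Simulated P&L as a fraction of traded volume, in basis points.
        "pnl_bps_of_volume": pnl / volume * 10_000,
    }


summaries = [summarise(v) for v in variants]
```

On these example figures both variants imply a fee of roughly 25 bps of volume, so the difference in simulated P&L comes from the strategy parameters rather than from fee treatment.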

12. Decision Logic

APPROVE

Not applicable — Backtester is a simulation bot in replay mode. It never approves live orders.

RESHAPE_REQUIRED

Not applicable.

REJECT

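Not applicable as a live trading decision. Backtester rejects a run configuration if the window is invalid or the strategy is unknown, and aborts a replay run if the archive is entirely unavailable. Gaps in the archive cause individual ticks to be skipped; they do not abort the run.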

WARNING_ONLY

Parameter sweeps above 50 variants emit BACKTESTER_LARGE_SWEEP. Replay windows that extend beyond now() are clamped to now() and emit a BACKTESTER_WINDOW_CLAMPED warning.

13. Standard Decision Output

This bot returns an OperationsReport object. See OperationsReport schema.

{
  "report_id": "ops_backtester_replay_01HX9KZQ7E8VR5",
  "bot_id": "gov.backtester",
  "event_type": "BACKTESTER_REPLAY_COMPLETE",
  "replay_run_id": "replay_01HX9KZQ7E8VR5",
  "strategy": "sum_to_one_arb",
  "mode": "replay",
  "window_start": "2026-05-01T00:00:00Z",
  "window_end": "2026-05-08T00:00:00Z",
  "ticks_replayed": 483120,
  "simulated_fills": 214,
  "simulated_volume_pusd": 92450.0,
  "simulated_pnl_pusd": 1380.5,
  "simulated_net_fees_pusd": 231.1,
  "vs_original_pnl_delta_pusd": 42.0,
  "parameter_variants_run": 1,
  "report_kind": "OperationsReport",
  "topic": "polytraders.reports.operations",
  "partition": "backtester+2026-05-08T00:00Z",
  "retained_until": "2027-05-08"
}

14. Reason Codes

Code | Severity | Meaning | Action | User-facing message
BACKTESTER_REPLAY_COMPLETE | INFO | A replay run completed successfully; aggregate OperationsReport emitted. | No action; routine completion. | The backtest finished. Results are available in the governance audit report.
BACKTESTER_TICK_PROCESSED | INFO | A single tick was processed during replay; per-tick OperationsReport emitted. | No action; operational heartbeat. | (not specified)
BACKTESTER_INVALID_WINDOW | HARD_REJECT | start_ts >= end_ts; the replay window is invalid. | Reject the run configuration; emit alert. | The backtest start date must be before the end date.
BACKTESTER_UNKNOWN_STRATEGY | HARD_REJECT | The strategy slug specified in the config is not in the strategy registry. | Reject the run; emit alert. | The requested strategy is not available for backtesting.
BACKTESTER_WINDOW_CLAMPED | WARN | end_ts was in the future; clamped to now(). | Clamp end_ts; emit WARN; continue replay. | The backtest end date was set to the most recent available data.
BACKTESTER_TICK_SKIPPED | WARN | A tick was missing from the archive; the replay skipped that tick. | Log WARN; skip tick; continue from next available tick. | (not specified)
BACKTESTER_ARCHIVE_UNAVAILABLE | HARD_REJECT | The report archive is entirely unavailable; the replay cannot start or has stalled. | Abort the run; emit alert. | (not specified)
BACKTESTER_LARGE_SWEEP | WARN | Parameter sweep has more than 50 variants; resource throttle applied. | Emit WARN; apply run throttle; continue sweep. | (not specified)
KILL_SWITCH_ACTIVE | WARN | KillSwitch is active; new backtesting run launches are suppressed. | Suppress new run; emit WARN; in-progress runs continue. | New backtests are paused while the system kill switch is active.

15. Metrics & Logs

Metrics emitted

Metric | Type | Unit | Labels | Meaning
polytraders_gov_backtester_runs_total | counter | count | strategy, status | Total backtester runs launched, labelled by strategy slug and completion status (complete/aborted).
polytraders_gov_backtester_ticks_processed_total | counter | count | strategy, replay_run_id | Total ticks processed across all replay runs.
polytraders_gov_backtester_simulated_fills_total | counter | count | strategy | Total simulated fills produced across all runs.
polytraders_gov_backtester_simulated_pnl_pusd | gauge | pUSD | strategy, replay_run_id | Simulated P&L in pUSD for the most recent completed replay run per strategy.
polytraders_gov_backtester_ticks_skipped_total | counter | count | strategy | Ticks skipped due to archive gaps. Should remain near zero.
polytraders_gov_backtester_run_duration_ms | histogram | ms | strategy | Wall-clock duration of a complete backtest run.

Alerts

Alert | Condition | Severity | Runbook
BacktesterArchiveUnavailable | rate(polytraders_gov_backtester_runs_total{status='aborted'}[15m]) > 0 | page | #runbook-backtester-archive-unavailable
BacktesterHighTickSkipRate | rate(polytraders_gov_backtester_ticks_skipped_total[10m]) / rate(polytraders_gov_backtester_ticks_processed_total[10m]) > 0.01 | warn | #runbook-backtester-tick-skip
BacktesterNoRunsIn24h | rate(polytraders_gov_backtester_runs_total[24h]) == 0 | warn | #runbook-backtester-no-runs
BacktesterRunDurationHigh | histogram_quantile(0.99, sum by (le) (rate(polytraders_gov_backtester_run_duration_ms_bucket[15m]))) > 300000 | warn | #runbook-backtester-slow-run

Dashboards

  • Grafana — Governance / Backtester run history and simulated P&L by strategy
  • Grafana — Governance / Parameter sweep results comparison

16. Developer Reporting

{
  "bot_id": "gov.backtester",
  "event_type": "BACKTESTER_TICK_PROCESSED",
  "replay_run_id": "replay_01HX9KZQ7E8VR5",
  "tick_ts_ms": 1746792060000,
  "market_id": "0x9b0c1d2e3f4a5b6c7d8e9f0a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a9b0c",
  "simulated_intent_emitted": true,
  "simulated_fill_price": 0.621,
  "simulated_fill_size_pusd": 430.0,
  "simulated_slippage_bps": 0.8,
  "risk_votes": [
    {
      "bot": "liquidityguard",
      "vote": "pass"
    },
    {
      "bot": "portfolioguard",
      "vote": "reshape",
      "new_size_pusd": 430.0
    }
  ],
  "mode": "replay"
}

17. Plain-English Reporting

Situation | User-facing explanation
Backtest run completed successfully | The strategy was replayed over the selected historical period. Results include simulated trades, P&L estimates, and a comparison against what actually happened.
Parameter sweep completed | Multiple parameter variants were tested over the same historical data. A summary of results for each variant is available.
Replay window clamped to present | The requested end date is in the future. The backtest was run up to the most recent available data.
Archive data incomplete for window | Some historical data is unavailable for the requested window. The backtest covered only the periods with complete records.

18. Failure-Mode Block

main_failure_mode: The report archive is unavailable or has gaps for the requested replay window, causing the backtester to skip ticks and produce incomplete simulated results.
false_positive_risk: Simulated fills are computed against a historical order book snapshot that may not reflect the true fill price due to latency artifacts in the archive, inflating apparent P&L.
false_negative_risk: A strategy change that only manifests at rare market conditions (e.g. extreme spread) may not be detected if the replay window does not include those events.
safe_fallback: If archive data is incomplete for a tick, skip that tick and emit BACKTESTER_TICK_SKIPPED warn; continue replay from the next available tick. Never extrapolate or fill gaps synthetically. If the archive is entirely unavailable, abort the run and emit BACKTESTER_ARCHIVE_UNAVAILABLE.
required_dependencies: Historical report archive (ObservationReport, DecisionReport, RiskVote, ExecutionReport, SettlementReport), Historical CLOB snapshots (order book + trades), Strategy registry (strategy configuration under test), Internal message bus (OperationsReport emission)
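
The safe fallback above distinguishes two cases: a missing tick is skipped, while a wholly unavailable archive aborts the run. A minimal Python sketch of that control flow follows. It assumes a fetch function that returns None for a missing tick and raises a hypothetical ArchiveUnavailable exception when the archive cannot be reached; neither convention is confirmed by the archive client.

```python
class ArchiveUnavailable(Exception):
    """Hypothetical signal that the archive cannot be reached at all."""


def replay_ticks(tick_ids, fetch_tick, process_tick) -> dict:
    """Replay ticks in order, skipping gaps and aborting on archive outage."""
    events, processed, skipped = [], 0, 0
    for tick_id in tick_ids:
        try:
            snapshot = fetch_tick(tick_id)
        except ArchiveUnavailable:
            events.append(("BACKTESTER_ARCHIVE_UNAVAILABLE", tick_id))
            return {"status": "aborted", "processed": processed,
                    "skipped": skipped, "events": events}
        if snapshot is None:
            # Never synthesise a replacement for a missing tick; record and move on.
            events.append(("BACKTESTER_TICK_SKIPPED", tick_id))
            skipped += 1
            continue
        process_tick(snapshot)
        processed += 1
    return {"status": "complete", "processed": processed,
            "skipped": skipped, "events": events}
```

Returning the skipped count alongside the processed count lets the aggregate report note how much of the window was actually covered.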

19. Failure-Injection Recipes

Scenario | How to inject | Expected behaviour | Recovery
ARCHIVE_UNAVAILABLE | Block all reads from internal.reportArchive and internal.clobArchive | BACKTESTER_ARCHIVE_UNAVAILABLE raised; run aborted; OperationsReport with status=aborted emitted | Once archive is reachable, launch a new replay run. No partial state needs clearing.
TICK_GAPS_IN_ARCHIVE | Delete 5% of tick records from clobArchive for the target window | BACKTESTER_TICK_SKIPPED emitted per missing tick; replay continues; aggregate report notes skipped ticks | Automatic; skipped ticks are noted in the aggregate report.
INVALID_WINDOW | Submit run with start_ts > end_ts | BACKTESTER_INVALID_WINDOW raised; run rejected immediately | Correct the window configuration and resubmit.
LARGE_PARAMETER_SWEEP | Submit parameter_sweep with 201 variants | PARAMETER_CHANGE_REQUIRES_APPROVAL raised; run rejected | Reduce sweep size to <= 200 or request approval.
KILL_SWITCH_DURING_LAUNCH | Activate KillSwitch; submit a new backtester run | KILL_SWITCH_ACTIVE logged; new run not started; in-progress runs continue to completion | Deactivate KillSwitch; resubmit run.
STRATEGY_NOT_IN_REGISTRY | Submit run with strategy='unknown_strategy' | BACKTESTER_UNKNOWN_STRATEGY raised; run rejected | Use a registered strategy slug.

20. State & Persistence

Cold-start recovery

On restart, any in-progress replay run is marked as interrupted. A new run must be launched manually. No partial replay state is held in memory.
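
As a rough illustration of the cold-start rule, the Python sketch below marks every run that was still in progress as interrupted and reports which ones were affected. The status strings in_progress and interrupted and the in-memory dictionary standing in for the run store are assumptions for the example.

```python
def recover_on_start(runs: dict) -> list:
    """Mark in-progress runs as interrupted; return their replay_run_ids."""
    interrupted = []
    for replay_run_id, run in runs.items():
        if run["status"] == "in_progress":
            # Nothing is resumed automatically; a new run must be launched manually.
            run["status"] = "interrupted"
            interrupted.append(replay_run_id)
    return interrupted
```

Completed and aborted runs are left untouched, so the recovery step is safe to repeat.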

21. Concurrency & Idempotency

Aspect | Specification
Execution model | single-threaded event loop per replay run; multiple runs may execute in parallel up to max_in_flight
Max in-flight | 5
Idempotency key | replay_run_id
Per-call timeout (ms) | 300000
Backpressure strategy | reject new run launches beyond max_in_flight; emit BACKTESTER_RUN_QUEUE_FULL warn
Locking / mutual exclusion | Postgres unique constraint on replay_run_id; in-run state is per-goroutine
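
The admission rules in this table (idempotency on replay_run_id, a cap of 5 runs in flight, rejection beyond the cap) can be sketched as a small gate. This Python version keeps its state in memory purely for illustration; the specification above places the uniqueness check in a Postgres constraint. The RunAdmission class and its return values are hypothetical.

```python
class RunAdmission:
    """In-memory stand-in for the run-launch gate described in the table."""

    def __init__(self, max_in_flight: int = 5):
        self.max_in_flight = max_in_flight
        self.in_flight = set()
        self.seen = set()  # stands in for the unique constraint on replay_run_id

    def try_start(self, replay_run_id: str) -> str:
        if replay_run_id in self.seen:
            return "duplicate"  # idempotent: the same id never launches twice
        if len(self.in_flight) >= self.max_in_flight:
            return "BACKTESTER_RUN_QUEUE_FULL"  # rejected, not queued
        self.seen.add(replay_run_id)
        self.in_flight.add(replay_run_id)
        return "started"

    def finish(self, replay_run_id: str) -> None:
        self.in_flight.discard(replay_run_id)
```

A rejected launch is not recorded as seen, so the same replay_run_id can be resubmitted once a slot frees up.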

22. Dependencies

Depends on (must run first)

Bot | Why | Contract
internal.report_archive | All historical ObservationReport, DecisionReport, RiskVote, ExecutionReport, and SettlementReport envelopes are fetched from the report archive for replay reconstruction. | (not specified)
internal.clob_archive | Tick-level CLOB snapshots (order book + trades) are the primary replay data source for simulated fill computation. | (not specified)

Emits to (downstream consumers)

Bot | Why | Contract
internal.governance_audit | (not specified) | (not specified)

Sibling bots (same OrderIntent)

Bot | Why | Contract
gov.paper-trade-runner | Paper-Trade Runner uses live data in paper mode; Backtester uses archived data in replay mode. Both emit OperationsReport to the same topic. | (not specified)

Used by (auto-aggregated)

6.8

External services

Service | Endpoint | SLA assumed | On failure
Internal report archive | (not specified) | 99.9% (internal SRE target) | (not specified)
Internal CLOB snapshot archive | (not specified) | 99.9% (internal SRE target) | (not specified)

23. Security Surfaces

Abuse vectors considered

  • Submitting a large parameter_sweep (>200 variants) to exhaust archive query budget
  • Injecting a crafted replay window that targets gaps in the archive to produce misleading P&L results
  • Manipulating strategy registry to replay a strategy not approved for backtesting

Mitigations

  • parameter_sweep hard limit of 200 enforced at config load; excess rejected with PARAMETER_CHANGE_REQUIRES_APPROVAL
  • Backtester is locked to mode=replay; live CLOB and onchain surfaces are never accessed
  • Strategy registry is read-only within the backtester; only registered strategies can be loaded
  • All replay runs are audit-logged with replay_run_id and emitted to the governance report bus

24. Polymarket V2 Compatibility

Aspect | Value
CLOB version | v2
Collateral asset | pUSD
EIP-712 Exchange domain version | 2
Aware of builderCode field | no
Aware of negative-risk markets | yes
Multi-chain ready | no
SDK used | py-clob-client-v2
Settlement contract | CTFExchangeV2 on Polygon
Notes | Backtester consumes archived V2 ReportEnvelope records; all simulated fills are denominated in pUSD. Backtester is replay-only and never makes live CLOB calls. NegRisk market payoffs are replayed using the NegRiskAdapter path stored in the archived SettlementReport.

API surfaces declared

internal

Networks supported

polygon

25. Versioning & Migration

Field | Value
spec | 2.0.0
implementation | 2.1.0
schema | 2
released | 2026-04-28

Migration history

Date | From | To | Reason | Action taken
2026-04-28 | v1 | v2 | CLOB V2 cutover | Updated replay pipeline to consume V2 ReportEnvelope format (pUSD denomination, mode field, replay_run_id). Removed USDC.e references from simulated fill output. Replay now ingests archived ObservationReport, DecisionReport, ExecutionReport, and SettlementReport in V2 schema. All simulated fill sizes and P&L now denominated in pUSD. Switched to py-clob-client-v2 archive reader.

26. Acceptance Tests

Unit Tests

Test | Setup | Expected result
Replay window validation rejects start_ts >= end_ts | start_ts='2026-05-09', end_ts='2026-05-01' | BACKTESTER_INVALID_WINDOW ConfigError
Unknown strategy slug raises error | strategy='nonexistent_strategy' | BACKTESTER_UNKNOWN_STRATEGY ConfigError
Parameter sweep above hard limit (200) is rejected | parameter_sweep = list of 201 variants | PARAMETER_CHANGE_REQUIRES_APPROVAL ConfigError
Replay assigns mode=replay on all emitted reports | Run single-variant replay; inspect all emitted OperationsReport payloads | All reports carry mode='replay' and replay_run_id
Simulated fill denominated in pUSD (not USDC.e) | Replay a fill event from historical archive | simulated_fill output contains size_pusd field; no USDC.e references
Archive gap causes tick skip, not crash | Inject missing tick in archive at t=T; continue replay | BACKTESTER_TICK_SKIPPED emitted; replay continues from T+1

Integration Tests

Test | Expected result
End-to-end: 7-day replay of sum_to_one_arb produces aggregate OperationsReport | Aggregate OperationsReport with ticks_replayed > 0, simulated_fills >= 0, mode=replay, retained_until = now+1y
Parameter sweep of 3 variants produces 3 separate OperationsReport summaries | 3 reports, each with distinct parameter_variant field and independent simulated P&L
KillSwitch active suppresses new run launches | KILL_SWITCH_ACTIVE logged; backtest run not started; in-progress runs continue

Property Tests

Property | Required behaviour
All emitted reports carry mode=replay and replay_run_id | Always true: replay mode is locked immutable in default_config
No live CLOB calls are made during replay | Always true: all inputs come from the archive; no clob_auth or onchain surfaces are accessed
Replay is deterministic: same inputs produce same outputs | Always true: no randomness; archive inputs are immutable
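
One way to test the determinism property is to fingerprint the reports from two runs over the same inputs and compare the digests. The Python sketch below does this with a canonical JSON encoding. It assumes that replay_run_id and report_id are the only fields expected to differ between otherwise identical runs; the fingerprint helper is hypothetical and not part of the test suite described here.

```python
import hashlib
import json


def fingerprint(reports: list, volatile=("replay_run_id", "report_id")) -> str:
    """SHA-256 over a canonical encoding of reports, ignoring per-run ids."""
    stripped = [
        {key: value for key, value in report.items() if key not in volatile}
        for report in reports
    ]
    # Sorted keys and fixed separators make the encoding independent of dict order.
    canonical = json.dumps(stripped, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Two successive runs of the same sweep should produce equal fingerprints; any difference in simulated_pnl_pusd or fill counts changes the digest.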

27. Operational Runbook

Backtester incidents involve archive unavailability (run aborts), high tick-skip rates (data quality issue), or runs stuck in progress (goroutine leak or resource exhaustion). All incidents are low-urgency unless they block a promotion gate decision.

On-call actions

Alert | First step | Diagnosis | Mitigation | Escalate to
BacktesterArchiveUnavailable | Check internal report archive and CLOB archive health. Verify storage backend is reachable. | (not specified) | (not specified) | Governance pod lead if archive is down for > 30 minutes
BacktesterHighTickSkipRate | Check clobArchive completeness for the affected window. May indicate a historical data ingestion failure. | (not specified) | (not specified) | Governance pod lead; data engineering if archive ingestion is the root cause
BacktesterNoRunsIn24h | Check if any promotion-gate backtests are pending. If yes, investigate whether run launches are being suppressed by KillSwitch. | (not specified) | (not specified) | Governance pod lead
BacktesterRunDurationHigh | Check archive query latency and concurrent run count. Reduce max_in_flight if resource contention is the cause. | (not specified) | (not specified) | SRE on-call if duration > 10 minutes consistently

Manual overrides

  • polytraders gov backtest abort --replay-run-id <id>: aborts a stuck replay run. Use when a replay run is stuck in progress and blocking the run queue, after investigating the root cause.

Healthcheck

/internal/health/backtester → 200 when the last run completed < 24h ago, the archive is reachable, max_in_flight is not saturated, and ticks_skipped_rate < 0.01. Red if the archive is unreachable, all runs are aborting, or the run queue has been saturated for > 5 minutes.

28. Promotion Gates

A bot does not advance to the next readiness state until every gate below is green. Gates are observable from production data — no subjective sign-off.

Promote to Shadow

Gate | How measured | Threshold
Unit tests pass for window validation, strategy loading, and archive gap handling | CI test run | 100% pass
Postgres replay_runs schema migration verified | Integration test | Pass

Promote to Limited live

Gate | How measured | Threshold
7-day replay of sum_to_one_arb completes in < 5 minutes with < 0.1% tick skip rate | polytraders_gov_backtester_run_duration_ms + ticks_skipped_rate | < 5 min, < 0.1% skips
All emitted OperationsReport records carry mode=replay and replay_run_id | Report bus audit; sample 100 records | 100% compliance

Promote to General live

Gate | How measured | Threshold
Parameter sweep of 5 variants completes deterministically: identical inputs produce identical outputs on two successive runs | Determinism test: run same sweep twice; compare simulated_pnl_pusd per variant | 0 delta between runs
Promotion-gate backtest evidence accepted for one strategy live promotion decision | Governance pod review | Pass

29. Developer Checklist

Ready-to-ship score: 27/27 sections complete · 100%

Requirement | Status
Purpose defined | ✓ done
Required inputs listed | ✓ done
Parameters defined | ✓ done
Defaults defined | ✓ done
Warning thresholds defined | ✓ done
Hard thresholds defined | ✓ done
Safe fallback defined | ✓ done
Structured output defined | ✓ done
Developer log defined | ✓ done
Plain-English explanation | ✓ done
Unit tests defined | ✓ done
Integration tests defined | ✓ done
Property tests defined | ✓ done
Failure-mode block complete | ✓ done
Reference implementation pseudocode | ✓ done
Wire examples (input + output) | ✓ done
Reason codes listed | ✓ done
Metrics & logs defined | ✓ done
State & persistence defined | ✓ done
Concurrency & idempotency defined | ✓ done
Dependencies declared | ✓ done
Security surfaces declared | ✓ done
Polymarket V2 compatibility declared | ✓ done
Version & migration history declared | ✓ done
Operational runbook defined | ✓ done
Promotion gates defined | ✓ done
Failure-injection recipes defined | ✓ done