Polytraders Dev Guide
internal

0.6 DuplicateMarketDetector

Discovery · Signal Service · Read-only, Recommend · PLANNED · Spec started · capital · Indirect P2 · Data normalisation pending stub

v3 readiness

Docs: 27/27 (done)
Impl: 0/15 (pending)
Backtest: 0/4 (pending)
Runtime: 0/8 (pending)

A bot is done when all four scores are complete.

1. Bot Identity

Layer: Discovery
Bot class: Signal Service
Authority: Read-only, Recommend
Status: PLANNED
Readiness: Spec started
Runs before: Strategy OrderIntent generation
Runs after: MarketScanner and MarketQualityRanker
Applies to: All active Polymarket markets with similar question text or resolution criteria
Default mode: shadow_only
User-visible: Advanced details only
Developer owner: Polytraders core, Intelligence pod

2. Purpose

Detect semantically identical or dangerously overlapping Polymarket markets to prevent accidental correlated exposure and to surface cross-market arbitrage opportunities. Emits ObservationReports tagging each duplicate cluster with a similarity score.

3. Why This Bot Matters

  • Duplicate markets not detected

    A strategy may take independent positions on two semantically identical markets, creating unintended double exposure that bypasses position-size limits.

  • Near-duplicate neg-risk bundles missed

    Neg-risk outcome tokens across overlapping events can create correlated risk that is not visible from individual market inspection alone.

No worked examples on this bot yet. Worked examples are optional but strongly recommended — they turn an abstract failure mode into something a developer can verify in a fixture.

4. Required Polymarket Inputs

Input | Source | Required? | Use
Market title, rules text, resolution source, and resolution date | Gamma API | Yes | Primary inputs for NLP-based semantic similarity computation.
Condition_id metadata and outcome-token list | Gamma API | Yes | Identify neg-risk bundles that share outcome tokens across events.
Neg-risk flag and enableNegRisk status | Gamma API | No | Apply enhanced overlap detection for neg-risk market groups.

5. Required Internal Inputs

Input | Source | Required? | Use
MarketScanner candidate list | disc.marketscanner | Yes | Scope duplicate detection to tradable markets only.
KillSwitch active flag | risk.kill_switch | Yes | Suppress emissions when KillSwitch is active.

6. Parameter Guide

Parameter | Default | Warning | Hard | What it controls
similarity_threshold | 0.85 | 0.75 | 0.6 | Minimum cosine similarity score between market embeddings for a pair to be flagged as a duplicate.
max_cluster_size | 10 | 20 | 50 | Maximum number of markets allowed in a single duplicate cluster before emitting a LARGE_CLUSTER_WARN.

7. Detailed Parameter Instructions

similarity_threshold

What it means

Minimum cosine similarity score between market embeddings for a pair to be flagged as a duplicate.

Default

{ "similarity_threshold": 0.85 }

Why this default matters

0.85 catches near-identical phrasings while avoiding false positives on related-but-distinct markets.

Threshold logic

Condition | Action
>= 0.85 | Flag as duplicate; emit ObservationReport
0.75 to < 0.85 | Flag as potential overlap with WARN annotation
< 0.6 | Ignore: too dissimilar

Developer check

if (score < params.hard) skip_pair();

User-facing English

Markets are only flagged as overlapping when their questions and resolution criteria are highly similar.
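
The threshold table above can be read as a small classifier. The following is a minimal sketch, not the bot's implementation: the class and function names are hypothetical, and because the table does not say what happens to scores between the hard floor (0.6) and the warning level (0.75), the sketch returns an explicit `UNSPECIFIED` marker for that band rather than guessing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimilarityThreshold:
    """Default / warning / hard values from the Parameter Guide."""
    default: float = 0.85  # at or above: flagged as duplicate
    warning: float = 0.75  # at or above, but below default: potential overlap
    hard: float = 0.60     # below: pair is ignored


def classify_pair(score: float, t: SimilarityThreshold = SimilarityThreshold()) -> str:
    """Map a cosine similarity score to the action in the threshold table."""
    if score >= t.default:
        return "DUPLICATE_DETECTED"
    if score >= t.warning:
        return "POTENTIAL_OVERLAP"
    if score < t.hard:
        return "IGNORE"
    # Assumption: the band hard <= score < warning is not defined by the table,
    # so this sketch surfaces it instead of picking a behaviour.
    return "UNSPECIFIED"


for s in (0.93, 0.80, 0.70, 0.55):
    print(s, classify_pair(s))
```

Keeping the undefined band visible makes it easy to spot in a fixture until the spec settles how those pairs should be treated.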

max_cluster_size

What it means

Maximum number of markets allowed in a single duplicate cluster before emitting a LARGE_CLUSTER_WARN.

Default

{ "max_cluster_size": 10 }

Why this default matters

Clusters larger than 10 often indicate a data quality issue rather than genuine duplicates.

Threshold logic

Condition | Action
<= 10 | Normal cluster
11–20 | Large cluster: WARN
> 50 | LARGE_CLUSTER: hard flag; escalate for manual review

Developer check

if (cluster.size > params.hard) emit(LARGE_CLUSTER_WARN);

User-facing English

When many similar markets are found, the system flags the group for review to ensure quality.
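
As a rough illustration of the size bands above, the sketch below maps a cluster size to a band using the default (10), warning (20), and hard (50) values from the Parameter Guide. The function name and the returned labels are hypothetical, and sizes 21 to 50 are reported as `UNSPECIFIED` because the threshold table does not cover them.

```python
def classify_cluster_size(size: int, default: int = 10, warning: int = 20, hard: int = 50) -> str:
    """Return the size band for a duplicate cluster (labels are illustrative)."""
    if size < 2:
        # Singleton clusters are never emitted, so they are not classified.
        raise ValueError("a duplicate cluster needs at least 2 markets")
    if size <= default:
        return "NORMAL"
    if size > hard:
        return "HARD_FLAG"
    if size <= warning:
        return "WARN"
    # Assumption: the table leaves sizes between warning and hard undefined.
    return "UNSPECIFIED"


for n in (2, 10, 11, 20, 35, 55):
    print(n, classify_cluster_size(n))
```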

8. Default Configuration

{
  "bot_id": "disc.duplicate_market_detector",
  "version": "0.1.0",
  "mode": "shadow_only",
  "defaults": {
    "similarity_threshold": 0.85,
    "require_manual_review": false,
    "publish_to": [
      "disc.opportunityqueue"
    ],
    "max_cluster_size": 10
  }
}
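
A loader for this configuration might validate the defaults before the bot starts. The sketch below is one possible shape, assuming the JSON above is supplied as a string; `load_defaults` and its specific checks (threshold within 0 to 1, cluster size of at least 2, non-empty `publish_to`) are illustrative assumptions rather than documented rules.

```python
import json

RAW_CONFIG = """
{
  "bot_id": "disc.duplicate_market_detector",
  "version": "0.1.0",
  "mode": "shadow_only",
  "defaults": {
    "similarity_threshold": 0.85,
    "require_manual_review": false,
    "publish_to": ["disc.opportunityqueue"],
    "max_cluster_size": 10
  }
}
"""


def load_defaults(raw: str) -> dict:
    """Parse the bot config and sanity-check the default parameters."""
    cfg = json.loads(raw)
    defaults = cfg["defaults"]
    # Assumed check: thresholds are expressed as a similarity in [0, 1].
    if not 0.0 <= defaults["similarity_threshold"] <= 1.0:
        raise ValueError("similarity_threshold must be within [0, 1]")
    # Assumed check: a cluster always holds at least two markets.
    if defaults["max_cluster_size"] < 2:
        raise ValueError("max_cluster_size must be at least 2")
    if not defaults["publish_to"]:
        raise ValueError("publish_to must name at least one consumer")
    return defaults


defaults = load_defaults(RAW_CONFIG)
print(defaults["similarity_threshold"], defaults["max_cluster_size"])
```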

9. Implementation Flow

  1. On each detection cycle, fetch all active markets from Gamma API.
  2. Check KillSwitch; if active, halt emissions.
  3. Compute sentence embeddings for each market's (title + rules_text) using local embedding model.
  4. Build a pairwise cosine similarity matrix across all candidate markets.
  5. Cluster pairs with similarity >= similarity_threshold using Union-Find.
  6. For clusters with size > max_cluster_size, emit LARGE_CLUSTER_WARN and flag for manual review if require_manual_review=true.
  7. Emit one ObservationReport per duplicate cluster with market_ids, similarity_scores, and cluster_type (identical/overlap/negrisk_bundle).
  8. Log cycle summary: total_pairs_evaluated, clusters_found, large_clusters.

10. Reference Implementation

Pseudocode is language-agnostic. FETCH = read input. EMIT = produce output. IF/THEN/ELSE = decision. Translate directly to TypeScript, Python, Go, or Rust.

FUNCTION detectionCycle():
  ks = FETCH internal.killswitch.status
  IF ks.active: RETURN   // KILL_SWITCH_ACTIVE: no emissions

  candidates = FETCH disc.marketscanner.latest_candidates()
  markets = FETCH gamma.GET('/markets?ids=' + join(candidates.ids, ','))
  IF markets IS NULL:
    LOG ERROR 'Gamma API unavailable; halting detection cycle'
    RETURN

  // Compute one embedding per market, keyed by condition_id
  embeddings = {}
  FOR market IN markets:
    text = market.question + ' ' + (market.rules_text OR '')
    embeddings[market.condition_id] = embed_model.encode(text)

  // Pairwise similarity + clustering over condition_ids
  ids = keys(embeddings)
  uf = UnionFind(ids)
  FOR i, j IN pairs(ids):
    score = cosine(embeddings[i], embeddings[j])
    IF score >= params.similarity_threshold.hard:
      uf.union(i, j)

  // Each cluster is a list of condition_ids
  FOR cluster IN uf.clusters(min_size=2):
    max_score = MAX(cosine(embeddings[a], embeddings[b]) FOR a, b IN pairs(cluster))
    cluster_type = 'identical' IF max_score >= params.similarity_threshold.default ELSE 'overlap'
    warnings = []
    IF len(cluster) > params.max_cluster_size.default:
      warnings.append('LARGE_CLUSTER_WARN')
    IF max_score < params.similarity_threshold.default:
      warnings.append('POTENTIAL_OVERLAP')
    EMIT ObservationReport(cluster_id, cluster_type, cluster, max_score, warnings)

  LOG detection cycle summary

SDK calls used

  • gamma.GET('/markets?ids=<condition_id_list>')
  • embed_model.encode(text)
  • cosine(embedding_a, embedding_b)

Complexity: O(M²) where M = number of candidate markets; accelerated with FAISS ANN for large M
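
The clustering step of the reference implementation could be translated to Python roughly as follows. This is a sketch under stated assumptions: embeddings are passed in as plain lists of floats keyed by condition_id (the embedding model itself is out of scope), cosine similarity is computed in pure Python rather than with FAISS, and `detect_clusters` and the returned dict keys are illustrative names.

```python
import math
from itertools import combinations


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class UnionFind:
    def __init__(self, items):
        self.parent = {item: item for item in items}

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra

    def clusters(self, min_size=2):
        groups = {}
        for item in self.parent:
            groups.setdefault(self.find(item), []).append(item)
        return [sorted(g) for g in groups.values() if len(g) >= min_size]


def detect_clusters(embeddings, default=0.85, hard=0.6, max_cluster_size=10):
    """Cluster markets whose pairwise similarity reaches the hard floor."""
    uf = UnionFind(embeddings)
    scores = {}
    for a, b in combinations(sorted(embeddings), 2):  # O(M^2) pairs
        scores[(a, b)] = cosine(embeddings[a], embeddings[b])
        if scores[(a, b)] >= hard:
            uf.union(a, b)

    reports = []
    for ids in uf.clusters(min_size=2):
        max_score = max(scores[pair] for pair in combinations(ids, 2))
        warnings = []
        if len(ids) > max_cluster_size:
            warnings.append("LARGE_CLUSTER_WARN")
        if max_score < default:
            warnings.append("POTENTIAL_OVERLAP")
        reports.append({
            "cluster_type": "identical" if max_score >= default else "overlap",
            "market_ids": ids,
            "similarity_score": max_score,
            "warnings": warnings,
        })
    return reports


# Toy 2-d vectors standing in for real sentence embeddings.
toy = {"m1": [1.0, 0.0], "m2": [0.99, 0.1], "m3": [0.0, 1.0]}
print(detect_clusters(toy))
```

Because Union-Find merges transitively, two markets can land in the same cluster without being directly similar to each other, which is one reason the large-cluster warning is useful.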

11. Wire Examples

Input — what arrives on the wire

Two near-duplicate markets from Gamma API (source: gamma_api)

{
  "markets": [
    {
      "condition_id": "0x7f8a9b0c1d2e3f4a5b6c7d8e9f0a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a",
      "question": "Will Candidate A win the Senate race?",
      "rules_text": "Resolves YES if Candidate A wins the Senate seat.",
      "resolution_date": "2026-11-04T00:00:00Z"
    },
    {
      "condition_id": "0x8a9b0c1d2e3f4a5b6c7d8e9f0a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a9b",
      "question": "Will Candidate A win the Senate election?",
      "rules_text": "Resolves YES if Candidate A is elected to the Senate.",
      "resolution_date": "2026-11-04T00:00:00Z"
    }
  ]
}

Output — what the bot emits

ObservationReport — duplicate cluster detected

{
  "report_id": "0x1122334455667788990011223344556611223344556677889900112233445566",
  "bot_id": "disc.duplicate_market_detector",
  "cluster_id": "cluster-0001",
  "cluster_type": "identical",
  "market_ids": [
    "0x7f8a9b0c1d2e3f4a5b6c7d8e9f0a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a",
    "0x8a9b0c1d2e3f4a5b6c7d8e9f0a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a9b"
  ],
  "similarity_score": 0.93,
  "warnings": [],
  "detected_at_ms": 1746789000000
}

Reproduce locally

curl 'https://gamma-api.polymarket.com/markets?ids=0x7f8a...,0x8a9b...'

12. Decision Logic

APPROVE

Not applicable — DuplicateMarketDetector emits ObservationReports, not approvals.

RESHAPE_REQUIRED

Not applicable — read-only detection bot.

REJECT

Market pairs below the hard similarity floor are ignored; no report emitted.

WARNING_ONLY

Pairs in the warning band (0.75–0.85) are flagged with POTENTIAL_OVERLAP annotation.

13. Standard Decision Output

This bot returns an ObservationReport object. See ObservationReport schema.

{
  "report_id": "0x1122334455667788990011223344556611223344556677889900112233445566",
  "bot_id": "disc.duplicate_market_detector",
  "cluster_id": "cluster-0001",
  "cluster_type": "identical",
  "market_ids": [
    "0x7f8a9b0c1d2e3f4a5b6c7d8e9f0a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a",
    "0x8a9b0c1d2e3f4a5b6c7d8e9f0a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a9b"
  ],
  "similarity_score": 0.93,
  "warnings": [],
  "detected_at_ms": 1746789000000
}
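
A downstream consumer might parse and sanity-check this payload before acting on it. The sketch below mirrors the field names in the example above; the `ObservationReport` dataclass, `parse_report`, and the individual checks are assumptions for illustration, not the published schema.

```python
import json
from dataclasses import dataclass
from typing import List

# Cluster types named in the Implementation Flow.
ALLOWED_CLUSTER_TYPES = {"identical", "overlap", "negrisk_bundle"}


@dataclass(frozen=True)
class ObservationReport:
    report_id: str
    bot_id: str
    cluster_id: str
    cluster_type: str
    market_ids: List[str]
    similarity_score: float
    warnings: List[str]
    detected_at_ms: int


def parse_report(raw: str) -> ObservationReport:
    """Decode a report and apply a few assumed consistency checks."""
    report = ObservationReport(**json.loads(raw))
    if report.cluster_type not in ALLOWED_CLUSTER_TYPES:
        raise ValueError(f"unknown cluster_type: {report.cluster_type}")
    if len(report.market_ids) < 2:
        raise ValueError("a cluster must contain at least 2 market_ids")
    if not 0.0 <= report.similarity_score <= 1.0:
        raise ValueError("similarity_score must be within [0, 1]")
    return report


SAMPLE = """
{
  "report_id": "0x1122334455667788990011223344556611223344556677889900112233445566",
  "bot_id": "disc.duplicate_market_detector",
  "cluster_id": "cluster-0001",
  "cluster_type": "identical",
  "market_ids": [
    "0x7f8a9b0c1d2e3f4a5b6c7d8e9f0a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a",
    "0x8a9b0c1d2e3f4a5b6c7d8e9f0a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a9b"
  ],
  "similarity_score": 0.93,
  "warnings": [],
  "detected_at_ms": 1746789000000
}
"""

report = parse_report(SAMPLE)
print(report.cluster_id, report.cluster_type, len(report.market_ids))
```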

14. Reason Codes

Code | Severity | Meaning | Action | User-facing message
DUPLICATE_DETECTED | INFO | Two or more markets have been clustered as semantically identical. | Emit ObservationReport with cluster details; downstream bots use this to suppress correlated positions. | Two markets are asking essentially the same question; holding both could create unintended correlated exposure.
POTENTIAL_OVERLAP | WARN | Markets are similar but not identical; similarity between warning and default threshold. | Emit ObservationReport with POTENTIAL_OVERLAP annotation. | These markets are closely related. Strategies will account for correlation when sizing positions.
LARGE_CLUSTER_WARN | WARN | Cluster exceeds max_cluster_size, suggesting a data quality issue. | Emit with LARGE_CLUSTER_WARN flag; escalate for manual review if require_manual_review=true. | (not specified)
STALE_MARKET_DATA | HARD_REJECT | Gamma API or embedding model unavailable; detection cycle halted. | Halt cycle; retry on next interval. | (not specified)
KILL_SWITCH_ACTIVE | HARD_REJECT | KillSwitch is active; all emissions suppressed. | Return immediately. | (not specified)

15. Metrics & Logs

Metrics emitted

Metric | Type | Unit | Labels | Meaning
polytraders_disc_duplicatemarketdetector_clusters_found_total | counter | count | cluster_type | Total duplicate clusters detected per cycle, by type (identical/overlap/negrisk_bundle).
polytraders_disc_duplicatemarketdetector_reports_emitted_total | counter | count | (none) | ObservationReports emitted for duplicate clusters.
polytraders_disc_duplicatemarketdetector_similarity_score | histogram | ratio | (none) | Distribution of max similarity scores for detected clusters.

Alerts

Alert | Condition | Severity | Runbook
DuplicateMarketDetectorLargeCluster | polytraders_disc_duplicatemarketdetector_clusters_found_total{cluster_type='large'} > 0 | P2 | #runbook-duplicatemarketdetector-large-cluster
DuplicateMarketDetectorNoCycles | rate(polytraders_disc_duplicatemarketdetector_reports_emitted_total[30m]) == 0 | P3 | #runbook-duplicatemarketdetector-no-cycles

Dashboards

  • Grafana — Discovery / DuplicateMarketDetector cluster overview

Log levels

Level | What gets logged
DEBUG | Per-cluster market_ids, similarity_score, and cluster_type.
INFO | Cycle summary: markets_evaluated, clusters_found.
WARN | Large cluster detected; embedding model slow.
ERROR | Gamma API unavailable; embedding model unavailable.

16. Developer Reporting

{
  "bot_id": "disc.duplicate_market_detector",
  "cycle": 7,
  "markets_evaluated": 47,
  "pairs_compared": 1081,
  "clusters_found": 3,
  "large_clusters": 0,
  "killswitch_active": false,
  "detected_at": "2026-05-09T11:30:00Z"
}
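
The `pairs_compared` field should follow from `markets_evaluated`, since a full pairwise comparison of M markets evaluates M(M-1)/2 pairs. A small check such as the hypothetical `expected_pairs` helper below can catch a report whose counters disagree; it assumes every candidate pair is compared, which would not hold if an approximate nearest-neighbour index were used.

```python
from math import comb


def expected_pairs(markets_evaluated: int) -> int:
    """Number of unordered market pairs in a full pairwise comparison."""
    return comb(markets_evaluated, 2)


report = {"markets_evaluated": 47, "pairs_compared": 1081}
assert expected_pairs(report["markets_evaluated"]) == report["pairs_compared"]
print(expected_pairs(47))  # 47 * 46 / 2 = 1081
```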

17. Plain-English Reporting

Situation | User-facing explanation
Two markets flagged as identical | These two markets ask essentially the same question with the same resolution criteria. Holding positions in both would create unintended correlated exposure.
Markets flagged as overlapping | These markets are closely related but not identical. Strategies will account for the correlation when sizing positions across them.

18. Failure-Mode Block

main_failure_mode: Embedding model produces incorrect similarity scores for domain-specific terminology, causing genuine duplicates to be missed or distinct markets to be falsely clustered.
false_positive_risk: Two markets with similar surface phrasing but distinct resolution criteria (e.g. same candidate, different elections) may be incorrectly clustered as duplicates.
false_negative_risk: Markets with semantically identical meaning but very different wording may fall below the similarity threshold and escape detection.
safe_fallback: If embedding model or Gamma API is unavailable, halt detection cycle and emit STALE_MARKET_DATA rather than serving stale clusters.
required_dependencies: Gamma API market list with title and rules text; local sentence-embedding model; MarketScanner candidate list; KillSwitch active flag

19. Failure-Injection Recipes

Scenario | How to inject | Expected behaviour | Recovery
EMBEDDING_MODEL_DOWN | Kill local embedding model process | (not specified) | Automatic when embedding model restarts.
LARGE_CLUSTER_DETECTED | Inject 60 markets with near-identical titles | (not specified) | Manual review; adjust similarity_threshold if needed.
KILL_SWITCH_ON | Set killswitch.active=true | (not specified) | Emissions resume after KillSwitch reset.

20. State & Persistence

Cold-start recovery

On cold start, embeddings are recomputed from scratch on first cycle.

21. Concurrency & Idempotency

Aspect | Specification
Execution model | single-threaded async loop with batched embedding inference
Max in-flight | 1
Idempotency key | detection_cycle_id
Per-call timeout (ms) | 15000
Backpressure strategy | drop newest
Locking / mutual exclusion | none
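
One way to realise these properties in a single-threaded asyncio loop is sketched below. It is illustrative only: `CycleRunner` is a hypothetical name, the 15000 ms timeout is applied to the whole cycle here (the spec lists it as a per-call timeout, so a real implementation would likely apply it per external call), and remembering completed `detection_cycle_id` values in memory is an assumed way to get idempotency.

```python
import asyncio


class CycleRunner:
    """Runs at most one detection cycle at a time; extra triggers are dropped."""

    def __init__(self, cycle, timeout_s: float = 15.0):
        self._cycle = cycle          # async callable taking a detection_cycle_id
        self._timeout_s = timeout_s
        self._in_flight = False      # max in-flight = 1, no lock needed on one thread
        self._completed = set()      # idempotency: completed detection_cycle_ids
        self.dropped = 0

    async def trigger(self, detection_cycle_id: str) -> str:
        if detection_cycle_id in self._completed:
            return "duplicate"
        if self._in_flight:
            self.dropped += 1        # backpressure: drop the newest request
            return "dropped"
        self._in_flight = True
        try:
            await asyncio.wait_for(self._cycle(detection_cycle_id), self._timeout_s)
            self._completed.add(detection_cycle_id)
            return "completed"
        except asyncio.TimeoutError:
            return "timed_out"
        finally:
            self._in_flight = False


async def demo():
    async def fake_cycle(cycle_id):
        await asyncio.sleep(0.05)    # stands in for fetch + embed + cluster

    runner = CycleRunner(fake_cycle, timeout_s=1.0)
    first = asyncio.create_task(runner.trigger("cycle-1"))
    await asyncio.sleep(0)           # let the first cycle start
    second = await runner.trigger("cycle-2")   # arrives while cycle-1 is running
    first_result = await first
    repeat = await runner.trigger("cycle-1")   # same key after completion
    return first_result, second, repeat


print(asyncio.run(demo()))
```

No lock is needed because the flag is checked and set without an intervening `await`, which matches the "none" entry for locking in the table.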

22. Dependencies

Depends on (must run first)

Bot | Why | Contract
disc.marketscanner | Scopes detection to tradable markets only. | Expects active candidate list with condition_ids.
risk.kill_switch | KillSwitch suppresses all emissions. | If active, no ObservationReports emitted.

Emits to (downstream consumers)

Bot | Why | Contract
disc.opportunityqueue | OpportunityQueue uses duplicate cluster reports to suppress correlated position double-ups. | ObservationReport includes cluster_id, market_ids, similarity_score.

Sibling bots (same OrderIntent)

External services

Service | Endpoint | SLA assumed | On failure
Gamma API | https://gamma-api.polymarket.com | 99.9% / 500ms p99 | Halt cycle; retry next interval.
Local sentence-embedding model | localhost:8080/embed | 99.9% / 100ms p99 | Halt cycle; emit STALE_MARKET_DATA; retry next interval.

23. Security Surfaces

Abuse vectors considered

  • Gamma API returning crafted market text designed to force a false duplicate cluster
  • Embedding model poisoning via crafted input text

Mitigations

  • Similarity threshold prevents low-confidence clusters from propagating
  • Large-cluster warning and require_manual_review flag limit blast radius of false clusters

24. Polymarket V2 Compatibility

Aspect | Value
CLOB version | v2
Collateral asset | pUSD
EIP-712 Exchange domain version | 2
Aware of builderCode field | no
Aware of negative-risk markets | yes
Multi-chain ready | no
SDK used | py-clob-client-v2
Settlement contract | CTFExchangeV2
Notes | Uses Gamma API outcome-token lists and enableNegRisk flag to detect correlated neg-risk bundles as a specialised cluster type alongside standard duplicate detection.

API surfaces declared

gamma, data, internal

Networks supported

polygon

25. Versioning & Migration

Field | Value
spec | 2.0.0
implementation | 0.1.0
schema | 2
released | None
planned_release | Q4-2026

Migration history

Date | From | To | Reason | Action taken
2026-04-28 | n/a | v2-spec | Spec drafted post-CLOB-V2 cutover; bot not yet implemented | Designed against V2 schema (pUSD, builder codes, V2 EIP-712 domain)

26. Acceptance Tests

Unit Tests

Test | Setup | Expected result
Two identical-text markets cluster at similarity >= 0.85 | Two markets with identical question text | cluster_type='identical'; ObservationReport emitted with both market_ids
Dissimilar markets below hard floor not clustered | similarity_score=0.55, hard=0.6 | No cluster emitted
Cluster exceeding max_cluster_size emits LARGE_CLUSTER_WARN | cluster.size=55, max_cluster_size hard=50 | LARGE_CLUSTER_WARN emitted; cluster flagged for review

Integration Tests

Test | Expected result
Duplicate cluster detected and forwarded to OpportunityQueue for position suppression | OpportunityQueue uses cluster ObservationReport to suppress double-up on duplicate markets
Embedding model unavailability halts cycle with STALE_MARKET_DATA | No ObservationReports emitted; next cycle resumes when model is available

Property Tests

Property | Required behaviour
Every cluster contains at least 2 market_ids | Always true; singleton clusters are never emitted
No emission when KillSwitch is active | Always true

27. Operational Runbook

DuplicateMarketDetector incidents are typically embedding model failures or large false clusters. The bot is read-only, so incidents do not affect active positions.

On-call actions

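Alert | First step | Diagnosis | Mitigation | Escalate to
DuplicateMarketDetectorLargeCluster | (not specified) | (not specified) | (not specified) | (not specified)
DuplicateMarketDetectorNoCycles | (not specified) | (not specified) | (not specified) | (not specified)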

Manual overrides

Healthcheck

GET /internal/health/duplicatemarketdetector → green if the last detection cycle completed within 2× the cycle interval and the embedding model is reachable; red if no cycle has completed within 2× the interval or the embedding model has been unreachable for more than 5 minutes.
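
The healthcheck rule can be expressed as a small pure function, which makes it easy to test in isolation. In the sketch below the function name, its parameters, and the intermediate `degraded` state are assumptions: the rule above only defines green and red, and leaves open what to report when the model has been unreachable for less than 5 minutes.

```python
def health_status(seconds_since_last_cycle: float,
                  cycle_interval_s: float,
                  model_unreachable_s: float) -> str:
    """Evaluate the healthcheck rule; model_unreachable_s is 0 when the model is up."""
    stale = seconds_since_last_cycle > 2 * cycle_interval_s
    if stale or model_unreachable_s > 300:
        return "red"
    if model_unreachable_s == 0:
        return "green"
    # Assumption: a short model outage with fresh cycles is neither green nor red.
    return "degraded"


print(health_status(60, 60, 0))     # recent cycle, model reachable
print(health_status(121, 60, 0))    # no cycle within 2x interval
print(health_status(60, 60, 120))   # model briefly unreachable
```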

28. Promotion Gates

A bot does not advance to the next readiness state until every gate below is green. Gates are observable from production data — no subjective sign-off.

Promote to Shadow

Gate | How measured | Threshold
Identical-text market pairs cluster with similarity >= 0.95 in test suite | Unit test suite | 100% pass

Promote to Limited live

Gate | How measured | Threshold
False-positive rate < 5% over 48h shadow run (spot-checked sample of 100 clusters) | Manual review | < 5 false positives per 100

Promote to General live

Gate | How measured | Threshold
Zero LARGE_CLUSTER_WARN events during normal operation over 7 days | Alert history | 0 firings

29. Developer Checklist

Ready-to-ship score: 27/27 sections complete · 100%

Requirement | Status
Purpose defined | ✓ done
Required inputs listed | ✓ done
Parameters defined | ✓ done
Defaults defined | ✓ done
Warning thresholds defined | ✓ done
Hard thresholds defined | ✓ done
Safe fallback defined | ✓ done
Structured output defined | ✓ done
Developer log defined | ✓ done
Plain-English explanation | ✓ done
Unit tests defined | ✓ done
Integration tests defined | ✓ done
Property tests defined | ✓ done
Failure-mode block complete | ✓ done
Reference implementation pseudocode | ✓ done
Wire examples (input + output) | ✓ done
Reason codes listed | ✓ done
Metrics & logs defined | ✓ done
State & persistence defined | ✓ done
Concurrency & idempotency defined | ✓ done
Dependencies declared | ✓ done
Security surfaces declared | ✓ done
Polymarket V2 compatibility declared | ✓ done
Version & migration history declared | ✓ done
Operational runbook defined | ✓ done
Promotion gates defined | ✓ done
Failure-injection recipes defined | ✓ done