1. Bot Identity
| Field | Value |
|---|
| Layer | Discovery |
| Bot class | Signal Service |
| Authority | Read-only, Recommend |
| Status | PLANNED |
| Readiness | Spec started |
| Runs before | Strategy OrderIntent generation |
| Runs after | MarketScanner and MarketQualityRanker |
| Applies to | All active Polymarket markets with similar question text or resolution criteria |
| Default mode | shadow_only |
| User-visible | Advanced details only |
| Developer owner | Polytraders core, Intelligence pod |
2. Purpose
Detect semantically identical or dangerously overlapping Polymarket markets to prevent accidental correlated exposure and to surface cross-market arbitrage opportunities. Emits ObservationReports tagging each duplicate cluster with a similarity score.
3. Why This Bot Matters
Duplicate markets not detected
A strategy may take independent positions on two semantically identical markets, creating unintended double exposure that bypasses position-size limits.
Near-duplicate neg-risk bundles missed
Neg-risk outcome tokens across overlapping events can create correlated risk that is not visible from individual market inspection alone.
No worked examples on this bot yet. Worked examples are optional but strongly recommended — they turn an abstract failure mode into something a developer can verify in a fixture.
6. Parameter Guide
| Parameter | Default | Warning | Hard | What it controls |
|---|
| similarity_threshold | 0.85 | 0.75 | 0.6 | Minimum cosine similarity score between market embeddings for a pair to be flagged as a duplicate. |
| max_cluster_size | 10 | 20 | 50 | Maximum number of markets allowed in a single duplicate cluster before emitting a LARGE_CLUSTER_WARN. |
7. Detailed Parameter Instructions
similarity_threshold
What it means
Minimum cosine similarity score between market embeddings for a pair to be flagged as a duplicate.
Default
{ "similarity_threshold": 0.85 }
Why this default matters
0.85 catches near-identical phrasings while avoiding false positives on related-but-distinct markets.
Threshold logic
| Condition | Action |
|---|
| >= 0.85 | Flag as duplicate; emit ObservationReport |
| >= 0.75 and < 0.85 | Flag as potential overlap with WARN annotation |
| < 0.6 | Ignore: too dissimilar |
Developer check
if (score < params.similarity_threshold.hard) skip_pair();
User-facing English
Markets are only flagged as overlapping when their questions and resolution criteria are highly similar.
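The threshold bands for this parameter can be sketched as a small classifier. This is a minimal sketch rather than the bot's implementation: the function name `classify_similarity` and the band labels are hypothetical. The threshold table does not say what happens to scores from 0.6 up to (but not including) 0.75, so the sketch returns `unspecified` for that range rather than guessing.

```python
# Thresholds taken from the Parameter Guide (default / warning / hard).
DEFAULT, WARNING, HARD = 0.85, 0.75, 0.6

def classify_similarity(score: float) -> str:
    """Map a cosine similarity score to a band label (labels are hypothetical)."""
    if score >= DEFAULT:
        return "duplicate"          # flag as duplicate; emit ObservationReport
    if score >= WARNING:
        return "potential_overlap"  # flag with WARN annotation
    if score < HARD:
        return "ignore"             # too dissimilar
    return "unspecified"            # HARD <= score < WARNING: not defined by the table

for s in (0.93, 0.80, 0.70, 0.55):
    print(s, classify_similarity(s))
```

Keeping the comparisons ordered from the highest threshold down means each score lands in exactly one band, including the boundary values 0.85 and 0.75.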
max_cluster_size
What it means
Maximum number of markets allowed in a single duplicate cluster before emitting a LARGE_CLUSTER_WARN.
Default
{ "max_cluster_size": 10 }
Why this default matters
Clusters larger than 10 often indicate a data quality issue rather than genuine duplicates.
Threshold logic
| Condition | Action |
|---|
| <= 10 | Normal cluster |
| > 10 and <= 20 | Large cluster: WARN |
| > 50 | LARGE_CLUSTER: hard flag; escalate for manual review |
Developer check
if (cluster.size > params.max_cluster_size.hard) emit(LARGE_CLUSTER_WARN);
User-facing English
When many similar markets are found, the system flags the group for review to ensure quality.
8. Default Configuration
{
  "bot_id": "disc.duplicate_market_detector",
  "version": "0.1.0",
  "mode": "shadow_only",
  "defaults": {
    "similarity_threshold": 0.85,
    "require_manual_review": false,
    "publish_to": [
      "disc.opportunityqueue"
    ],
    "max_cluster_size": 10
  }
}
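A configuration like the one above can be sanity-checked at load time. The following is a minimal sketch under stated assumptions: `load_config` is a hypothetical helper, and the specific validation rules (similarity within [0, 1], cluster size of at least 2, a non-empty `publish_to`) are illustrative choices, not rules stated by this spec.

```python
import json

CONFIG_JSON = """
{
  "bot_id": "disc.duplicate_market_detector",
  "version": "0.1.0",
  "mode": "shadow_only",
  "defaults": {
    "similarity_threshold": 0.85,
    "require_manual_review": false,
    "publish_to": ["disc.opportunityqueue"],
    "max_cluster_size": 10
  }
}
"""

def load_config(raw: str) -> dict:
    """Parse the config and reject obviously invalid values (assumed rules)."""
    cfg = json.loads(raw)
    defaults = cfg["defaults"]
    if not 0.0 <= defaults["similarity_threshold"] <= 1.0:
        raise ValueError("similarity_threshold must be within [0, 1]")
    if defaults["max_cluster_size"] < 2:
        raise ValueError("max_cluster_size must be at least 2")
    if not defaults["publish_to"]:
        raise ValueError("publish_to must name at least one consumer")
    return cfg

cfg = load_config(CONFIG_JSON)
print(cfg["mode"], cfg["defaults"]["similarity_threshold"])
```

Failing fast on a bad value at startup is usually cheaper than discovering it through a burst of false clusters.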
9. Implementation Flow
- On each detection cycle, check KillSwitch first; if active, halt emissions and return.
- Fetch the active candidate markets from Gamma API.
- Compute sentence embeddings for each market's (title + rules_text) using local embedding model.
- Build a pairwise cosine similarity matrix across all candidate markets.
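- Cluster pairs using Union-Find, ignoring any pair below the hard floor of similarity_threshold; clusters whose best score is below the default are treated as overlaps rather than duplicates.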
- For clusters with size > max_cluster_size, emit LARGE_CLUSTER_WARN and flag for manual review if require_manual_review=true.
- Emit one ObservationReport per duplicate cluster with market_ids, similarity_scores, and cluster_type (identical/overlap/negrisk_bundle).
- Log cycle summary: total_pairs_evaluated, clusters_found, large_clusters.
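The clustering step in the flow above can be sketched with a small Union-Find structure. This is an illustrative sketch, not the bot's code: the `UnionFind` class, the `cluster_markets` helper, the market ids, and the precomputed `scores` mapping are all hypothetical, and the similarity values are made up for the example.

```python
from itertools import combinations

class UnionFind:
    """Disjoint-set structure used to merge markets into clusters."""

    def __init__(self, items):
        self.parent = {x: x for x in items}

    def find(self, x):
        # Path halving keeps the trees shallow.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a

    def clusters(self, min_size=2):
        groups = {}
        for x in self.parent:
            groups.setdefault(self.find(x), []).append(x)
        return [sorted(g) for g in groups.values() if len(g) >= min_size]

def cluster_markets(scores, market_ids, threshold):
    """scores maps frozenset({id_a, id_b}) to a similarity score."""
    uf = UnionFind(market_ids)
    for a, b in combinations(market_ids, 2):
        if scores.get(frozenset((a, b)), 0.0) >= threshold:
            uf.union(a, b)
    return uf.clusters(min_size=2)

ids = ["m1", "m2", "m3", "m4"]
scores = {frozenset(("m1", "m2")): 0.93, frozenset(("m2", "m3")): 0.88}
print(cluster_markets(scores, ids, threshold=0.85))  # [['m1', 'm2', 'm3']]
```

Note that Union-Find clusters transitively: `m1` and `m3` end up together through `m2` even though that pair has no score above the threshold. This is one way chained near-duplicates can grow into large clusters, which is what max_cluster_size guards against.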
10. Reference Implementation
Pseudocode is language-agnostic. FETCH = read input. EMIT = produce output. IF/THEN/ELSE = decision. Translate directly to TypeScript, Python, Go, or Rust.
FUNCTION detectionCycle():
    // Never emit while the KillSwitch is active
    ks = FETCH internal.killswitch.status
    IF ks.active: RETURN
    candidates = FETCH disc.marketscanner.latest_candidates()
    markets = FETCH gamma.GET('/markets?ids=' + join(candidates.ids, ','))
    IF markets IS NULL:
        LOG ERROR 'Gamma API unavailable; halting detection cycle'
        RETURN
    // Compute embeddings, keyed by condition_id
    embeddings = {}
    FOR market IN markets:
        text = market.question + ' ' + (market.rules_text OR '')
        embeddings[market.condition_id] = embed_model.encode(text)
    // Pairwise similarity + clustering; pairs below the hard floor are ignored
    ids = KEYS(embeddings)
    uf = UnionFind(ids)
    FOR i, j IN pairs(ids):
        score = cosine(embeddings[i], embeddings[j])
        IF score >= params.similarity_threshold.hard:
            uf.union(i, j)
    // Each cluster is a list of condition_ids with at least two members
    FOR cluster IN uf.clusters(min_size=2):
        max_score = MAX(cosine(embeddings[a], embeddings[b]) FOR a, b IN pairs(cluster))
        cluster_type = 'identical' IF max_score >= params.similarity_threshold.default ELSE 'overlap'
        warnings = []
        IF len(cluster) > params.max_cluster_size.default:
            warnings.append('LARGE_CLUSTER_WARN')
        IF max_score < params.similarity_threshold.default:
            warnings.append('POTENTIAL_OVERLAP')
        EMIT ObservationReport(cluster_id, cluster_type, cluster, max_score, warnings)
    LOG detection cycle summary
SDK calls used
- gamma.GET('/markets?ids=<condition_id_list>')
- embed_model.encode(text)
- cosine(embedding_a, embedding_b)
Complexity: O(M²) where M = number of candidate markets; accelerated with FAISS ANN for large M
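To make the quadratic cost concrete, here is a minimal sketch of the exhaustive pairwise comparison using only the standard library. The function names `cosine` and `pairwise_scores` and the toy two-dimensional vectors are hypothetical; real embeddings would come from the embedding model, and the FAISS acceleration mentioned above is not shown.

```python
import math
from itertools import combinations

def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)

def pairwise_scores(embeddings):
    """Score every unordered pair of ids: M * (M - 1) / 2 comparisons."""
    return {
        (a, b): cosine(embeddings[a], embeddings[b])
        for a, b in combinations(sorted(embeddings), 2)
    }

embeddings = {"m1": [1.0, 0.0], "m2": [3.0, 0.0], "m3": [0.0, 2.0]}
scores = pairwise_scores(embeddings)
print(scores[("m1", "m2")], scores[("m1", "m3")])  # 1.0 0.0
```

Vector length does not matter to cosine similarity, which is why `m1` and `m2` score 1.0 despite having different magnitudes.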
11. Wire Examples
Input — what arrives on the wire
Two near-duplicate markets from Gamma API — gamma_api
{
  "markets": [
    {
      "condition_id": "0x7f8a9b0c1d2e3f4a5b6c7d8e9f0a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a",
      "question": "Will Candidate A win the Senate race?",
      "rules_text": "Resolves YES if Candidate A wins the Senate seat.",
      "resolution_date": "2026-11-04T00:00:00Z"
    },
    {
      "condition_id": "0x8a9b0c1d2e3f4a5b6c7d8e9f0a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a9b",
      "question": "Will Candidate A win the Senate election?",
      "rules_text": "Resolves YES if Candidate A is elected to the Senate.",
      "resolution_date": "2026-11-04T00:00:00Z"
    }
  ]
}
Output — what the bot emits
ObservationReport — duplicate cluster detected
{
  "report_id": "0x1122334455667788990011223344556611223344556677889900112233445566",
  "bot_id": "disc.duplicate_market_detector",
  "cluster_id": "cluster-0001",
  "cluster_type": "identical",
  "market_ids": [
    "0x7f8a9b0c1d2e3f4a5b6c7d8e9f0a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a",
    "0x8a9b0c1d2e3f4a5b6c7d8e9f0a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a9b"
  ],
  "similarity_score": 0.93,
  "warnings": [],
  "detected_at_ms": 1746789000000
}
Reproduce locally
curl 'https://gamma-api.polymarket.com/markets?ids=0x7f8a...,0x8a9b...'
12. Decision Logic
APPROVE
Not applicable — DuplicateMarketDetector emits ObservationReports, not approvals.
RESHAPE_REQUIRED
Not applicable — read-only detection bot.
REJECT
Market pairs below the hard similarity floor are ignored; no report emitted.
WARNING_ONLY
Pairs in the warning band (0.75–0.85) are flagged with POTENTIAL_OVERLAP annotation.
13. Standard Decision Output
This bot returns an ObservationReport object. See ObservationReport schema.
{
  "report_id": "0x1122334455667788990011223344556611223344556677889900112233445566",
  "bot_id": "disc.duplicate_market_detector",
  "cluster_id": "cluster-0001",
  "cluster_type": "identical",
  "market_ids": [
    "0x7f8a9b0c1d2e3f4a5b6c7d8e9f0a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a",
    "0x8a9b0c1d2e3f4a5b6c7d8e9f0a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a9b"
  ],
  "similarity_score": 0.93,
  "warnings": [],
  "detected_at_ms": 1746789000000
}
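One way to keep emitted reports consistent with this shape is a typed container that validates on construction. This is a hedged sketch: the dataclass below mirrors the field names in the example, but the class itself, the validation rules, and the placeholder ids are assumptions and not the actual ObservationReport schema.

```python
from dataclasses import dataclass, field, asdict

# Cluster types named in the Implementation Flow.
ALLOWED_TYPES = {"identical", "overlap", "negrisk_bundle"}

@dataclass
class ObservationReport:
    report_id: str
    bot_id: str
    cluster_id: str
    cluster_type: str
    market_ids: list
    similarity_score: float
    detected_at_ms: int
    warnings: list = field(default_factory=list)

    def __post_init__(self):
        # Singleton clusters are never emitted (see the property tests).
        if len(self.market_ids) < 2:
            raise ValueError("a duplicate cluster needs at least two market_ids")
        if self.cluster_type not in ALLOWED_TYPES:
            raise ValueError(f"unknown cluster_type: {self.cluster_type}")

report = ObservationReport(
    report_id="report-0001",              # hypothetical placeholder id
    bot_id="disc.duplicate_market_detector",
    cluster_id="cluster-0001",
    cluster_type="identical",
    market_ids=["market-a", "market-b"],  # hypothetical placeholder ids
    similarity_score=0.93,
    detected_at_ms=1746789000000,
)
print(asdict(report)["cluster_type"])
```

Validating in `__post_init__` means an invalid report cannot be constructed at all, so the "at least 2 market_ids" property holds for anything that reaches the emit step.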
14. Reason Codes
| Code | Severity | Meaning | Action | User-facing message |
|---|
| DUPLICATE_DETECTED | INFO | Two or more markets have been clustered as semantically identical. | Emit ObservationReport with cluster details; downstream bots use this to suppress correlated positions. | Two markets are asking essentially the same question; holding both could create unintended correlated exposure. |
| POTENTIAL_OVERLAP | WARN | Markets are similar but not identical; similarity falls between the warning and default thresholds. | Emit ObservationReport with POTENTIAL_OVERLAP annotation. | These markets are closely related. Strategies will account for correlation when sizing positions. |
| LARGE_CLUSTER_WARN | WARN | Cluster exceeds max_cluster_size, suggesting a data quality issue. | Emit with LARGE_CLUSTER_WARN flag; escalate for manual review if require_manual_review=true. | |
| STALE_MARKET_DATA | HARD_REJECT | Gamma API or embedding model unavailable; detection cycle halted. | Halt cycle; retry on next interval. | |
| KILL_SWITCH_ACTIVE | HARD_REJECT | KillSwitch is active; all emissions suppressed. | Return immediately. | |
15. Metrics & Logs
Metrics emitted
| Metric | Type | Unit | Labels | Meaning |
|---|
| polytraders_disc_duplicatemarketdetector_clusters_found_total | counter | count | cluster_type | Total duplicate clusters detected per cycle, by type (identical/overlap/negrisk_bundle). |
| polytraders_disc_duplicatemarketdetector_reports_emitted_total | counter | count | | ObservationReports emitted for duplicate clusters. |
| polytraders_disc_duplicatemarketdetector_similarity_score | histogram | ratio | | Distribution of max similarity scores for detected clusters. |
Alerts
| Alert | Condition | Severity | Runbook |
|---|
| DuplicateMarketDetectorLargeCluster | polytraders_disc_duplicatemarketdetector_clusters_found_total{cluster_type='large'} > 0 | P2 | #runbook-duplicatemarketdetector-large-cluster |
| DuplicateMarketDetectorNoCycles | rate(polytraders_disc_duplicatemarketdetector_reports_emitted_total[30m]) == 0 | P3 | #runbook-duplicatemarketdetector-no-cycles |
Dashboards
- Grafana — Discovery / DuplicateMarketDetector cluster overview
Log levels
| Level | What gets logged |
|---|
| DEBUG | Per-cluster market_ids, similarity_score, and cluster_type. |
| INFO | Cycle summary: markets_evaluated, clusters_found. |
| WARN | Large cluster detected; embedding model slow. |
| ERROR | Gamma API unavailable; embedding model unavailable. |
16. Developer Reporting
{
  "bot_id": "disc.duplicate_market_detector",
  "cycle": 7,
  "markets_evaluated": 47,
  "pairs_compared": 1081,
  "clusters_found": 3,
  "large_clusters": 0,
  "killswitch_active": false,
  "detected_at": "2026-05-09T11:30:00Z"
}
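The counts in this report can be cross-checked: an exhaustive pairwise comparison of M markets evaluates M(M-1)/2 pairs, so 47 markets give 1081 pairs. The sketch below assumes exhaustive comparison (an approximate-nearest-neighbour index would compare fewer pairs), and `expected_pairs` is a hypothetical helper name.

```python
import json

REPORT_JSON = """
{
  "bot_id": "disc.duplicate_market_detector",
  "cycle": 7,
  "markets_evaluated": 47,
  "pairs_compared": 1081,
  "clusters_found": 3,
  "large_clusters": 0,
  "killswitch_active": false,
  "detected_at": "2026-05-09T11:30:00Z"
}
"""

def expected_pairs(markets: int) -> int:
    """Number of unordered pairs among `markets` items."""
    return markets * (markets - 1) // 2

report = json.loads(REPORT_JSON)
consistent = expected_pairs(report["markets_evaluated"]) == report["pairs_compared"]
print(consistent)  # True
```

A check like this is a cheap fixture assertion: if `pairs_compared` drifts from the expected count, either some markets failed to embed or the comparison loop skipped pairs.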
17. Plain-English Reporting
| Situation | User-facing explanation |
|---|
| Two markets flagged as identical | These two markets ask essentially the same question with the same resolution criteria. Holding positions in both would create unintended correlated exposure. |
| Markets flagged as overlapping | These markets are closely related but not identical. Strategies will account for the correlation when sizing positions across them. |
18. Failure-Mode Block
| Field | Value |
|---|
| main_failure_mode | Embedding model produces incorrect similarity scores for domain-specific terminology, causing genuine duplicates to be missed or distinct markets to be falsely clustered. |
| false_positive_risk | Two markets with similar surface phrasing but distinct resolution criteria (e.g. same candidate, different elections) may be incorrectly clustered as duplicates. |
| false_negative_risk | Markets with semantically identical meaning but very different wording may fall below the similarity threshold and escape detection. |
| safe_fallback | If embedding model or Gamma API is unavailable, halt detection cycle and emit STALE_MARKET_DATA rather than serving stale clusters. |
| required_dependencies | Gamma API market list with title and rules text, local sentence-embedding model, MarketScanner candidate list, KillSwitch active flag |
19. Failure-Injection Recipes
| Scenario | How to inject | Expected behaviour | Recovery |
|---|
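| EMBEDDING_MODEL_DOWN | Kill local embedding model process | Cycle halts; STALE_MARKET_DATA emitted; no ObservationReports emitted. | Automatic when embedding model restarts. |
| LARGE_CLUSTER_DETECTED | Inject 60 markets with near-identical titles | LARGE_CLUSTER_WARN emitted; cluster flagged for manual review if require_manual_review=true. | Manual review; adjust similarity_threshold if needed. |
| KILL_SWITCH_ON | Set killswitch.active=true | Cycle returns immediately; no ObservationReports emitted. | Emissions resume after KillSwitch reset. |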
20. State & Persistence
Cold-start recovery
On cold start, embeddings are recomputed from scratch on first cycle.
21. Concurrency & Idempotency
| Aspect | Specification |
|---|
| Execution model | single-threaded async loop with batched embedding inference |
| Max in-flight | 1 |
| Idempotency key | detection_cycle_id |
| Per-call timeout (ms) | 15000 |
| Backpressure strategy | drop newest |
| Locking / mutual exclusion | none |
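The combination of one in-flight cycle, a per-call timeout, and a drop-newest backpressure strategy can be sketched with `asyncio`. This is an illustrative sketch under assumptions: `CycleRunner`, its return labels, and the reading of the idempotency key as "skip a detection_cycle_id that was already accepted" are hypothetical and not taken from the bot's code.

```python
import asyncio

class CycleRunner:
    """Runs at most one detection cycle at a time; drops triggers that arrive while busy."""

    def __init__(self, cycle_fn, timeout_s=15.0):  # 15000 ms per-call timeout
        self._cycle_fn = cycle_fn
        self._timeout_s = timeout_s
        self._busy = False
        self._seen = set()  # detection_cycle_ids already accepted
        self.dropped = 0

    async def trigger(self, detection_cycle_id):
        if detection_cycle_id in self._seen:
            return "duplicate"   # idempotency: never rerun an accepted id
        if self._busy:
            self.dropped += 1
            return "dropped"     # backpressure: drop the newest trigger
        self._busy = True
        self._seen.add(detection_cycle_id)
        try:
            await asyncio.wait_for(self._cycle_fn(detection_cycle_id), self._timeout_s)
            return "completed"
        except asyncio.TimeoutError:
            return "timed_out"
        finally:
            self._busy = False

async def demo():
    async def fake_cycle(cycle_id):
        await asyncio.sleep(0.05)  # stands in for embedding + clustering work

    runner = CycleRunner(fake_cycle)
    first = asyncio.create_task(runner.trigger("cycle-1"))
    await asyncio.sleep(0)                        # let the first cycle start
    second = await runner.trigger("cycle-2")      # arrives while busy
    first_result = await first
    repeat = await runner.trigger("cycle-1")      # same id again
    return first_result, second, repeat

print(asyncio.run(demo()))  # ('completed', 'dropped', 'duplicate')
```

Because everything runs on a single event loop, a plain boolean is enough to enforce the in-flight limit, which fits the "none" entry for locking above.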
22. Dependencies
Depends on (must run first)
Emits to (downstream consumers)
| Bot | Why | Contract |
|---|
| disc.opportunityqueue | OpportunityQueue uses duplicate cluster reports to suppress correlated position double-ups. | ObservationReport includes cluster_id, market_ids, similarity_score. |
Sibling bots (same OrderIntent)
External services
| Service | Endpoint | SLA assumed | On failure |
|---|
| Gamma API | https://gamma-api.polymarket.com | 99.9% / 500ms p99 | Halt cycle; retry next interval. |
| Local sentence-embedding model | localhost:8080/embed | 99.9% / 100ms p99 | Halt cycle; emit STALE_MARKET_DATA; retry next interval. |
23. Security Surfaces
Abuse vectors considered
- Gamma API returning crafted market text designed to force a false duplicate cluster
- Embedding model poisoning via crafted input text
Mitigations
- Similarity threshold prevents low-confidence clusters from propagating
- Large-cluster warning and require_manual_review flag limit blast radius of false clusters
24. Polymarket V2 Compatibility
| Aspect | Value |
|---|
| CLOB version | v2 |
| Collateral asset | pUSD |
| EIP-712 Exchange domain version | 2 |
| Aware of builderCode field | no |
| Aware of negative-risk markets | yes |
| Multi-chain ready | no |
| SDK used | py-clob-client-v2 |
| Settlement contract | CTFExchangeV2 |
| Notes | Uses Gamma API outcome-token lists and enableNegRisk flag to detect correlated neg-risk bundles as a specialised cluster type alongside standard duplicate detection. |
API surfaces declared
gamma, data, internal
Networks supported
polygon
25. Versioning & Migration
| Field | Value |
|---|
| spec | 2.0.0 |
| implementation | 0.1.0 |
| schema | 2 |
| released | None |
| planned_release | Q4-2026 |
Migration history
| Date | From | To | Reason | Action taken |
|---|
| 2026-04-28 | n/a | v2-spec | Spec drafted post-CLOB-V2 cutover; bot not yet implemented | Designed against V2 schema (pUSD, builder codes, V2 EIP-712 domain) |
26. Acceptance Tests
Unit Tests
| Test | Setup | Expected result |
|---|
| Two identical-text markets cluster at similarity >= 0.85 | Two markets with identical question text | cluster_type='identical'; ObservationReport emitted with both market_ids |
| Dissimilar markets below hard floor not clustered | similarity_score=0.55, hard=0.6 | No cluster emitted |
| Cluster exceeding max_cluster_size emits LARGE_CLUSTER_WARN | cluster.size=55, max_cluster_size hard=50 | LARGE_CLUSTER_WARN emitted; cluster flagged for review |
Integration Tests
| Test | Expected result |
|---|
| Duplicate cluster detected and forwarded to OpportunityQueue for position suppression | OpportunityQueue uses cluster ObservationReport to suppress double-up on duplicate markets |
| Embedding model unavailability halts cycle with STALE_MARKET_DATA | No ObservationReports emitted; next cycle resumes when model is available |
Property Tests
| Property | Required behaviour |
|---|
| Every cluster contains at least 2 market_ids | Always true — singleton clusters are never emitted |
| No emission when KillSwitch is active | Always true |
27. Operational Runbook
DuplicateMarketDetector incidents are typically embedding model failures or large false clusters. Bot is read-only; incidents do not affect active positions.
On-call actions
| Alert | First step | Diagnosis | Mitigation | Escalate to |
|---|
| DuplicateMarketDetectorLargeCluster | | | | |
| DuplicateMarketDetectorNoCycles | | | | |
Manual overrides
Healthcheck
GET /internal/health/duplicatemarketdetector → green if the last detection cycle completed within 2× the cycle interval and the embedding model is reachable; red if no cycle has completed within 2× the interval or the embedding model has been unreachable for more than 5 minutes.
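The healthcheck rule can be expressed as a pure function, which makes it easy to test without a running service. This is a sketch under assumptions: `health_status` and its parameters are hypothetical names, and the rule does not cover the case where the model has been unreachable for 5 minutes or less while cycles are still fresh, so the sketch returns a hypothetical `degraded` status there.

```python
def health_status(now_s, last_cycle_s, cycle_interval_s, model_unreachable_s):
    """Evaluate the healthcheck rule.

    model_unreachable_s is how long the embedding model has been unreachable
    (0 means it is currently reachable).
    """
    cycle_fresh = (now_s - last_cycle_s) <= 2 * cycle_interval_s
    if not cycle_fresh or model_unreachable_s > 300:
        return "red"
    if model_unreachable_s == 0:
        return "green"
    return "degraded"  # hypothetical: not defined by the healthcheck rule

print(health_status(now_s=1000, last_cycle_s=950, cycle_interval_s=60, model_unreachable_s=0))
print(health_status(now_s=1000, last_cycle_s=800, cycle_interval_s=60, model_unreachable_s=0))
```

The first call reports green (last cycle 50 seconds ago, within the 120 second window); the second reports red (last cycle 200 seconds ago).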
29. Developer Checklist
Ready-to-ship score: 27/27 sections complete · 100%
| Requirement | Status |
|---|
| Purpose defined | ✓ done |
| Required inputs listed | ✓ done |
| Parameters defined | ✓ done |
| Defaults defined | ✓ done |
| Warning thresholds defined | ✓ done |
| Hard thresholds defined | ✓ done |
| Safe fallback defined | ✓ done |
| Structured output defined | ✓ done |
| Developer log defined | ✓ done |
| Plain-English explanation | ✓ done |
| Unit tests defined | ✓ done |
| Integration tests defined | ✓ done |
| Property tests defined | ✓ done |
| Failure-mode block complete | ✓ done |
| Reference implementation pseudocode | ✓ done |
| Wire examples (input + output) | ✓ done |
| Reason codes listed | ✓ done |
| Metrics & logs defined | ✓ done |
| State & persistence defined | ✓ done |
| Concurrency & idempotency defined | ✓ done |
| Dependencies declared | ✓ done |
| Security surfaces declared | ✓ done |
| Polymarket V2 compatibility declared | ✓ done |
| Version & migration history declared | ✓ done |
| Operational runbook defined | ✓ done |
| Promotion gates defined | ✓ done |
| Failure-injection recipes defined | ✓ done |