Health & Heartbeat
What it does
HealthHeartbeat monitors the liveness of all 97 production bots by polling each bot's internal health endpoint at a configurable interval. If a bot misses missed_heartbeats_to_alert consecutive polls, HealthHeartbeat emits a page-severity alert and optionally triggers an auto-restart. It emits an OperationsReport after every sweep cycle summarising bot health across all layers. Internal-only — no external API surface.
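The miss-counting rule described above can be sketched as a small state tracker. This is a minimal illustration, not the service's actual implementation: the `MissTracker` class, its `record` method, and the `risk-vote` slug are hypothetical, and it assumes the alert fires on the poll that reaches `missed_heartbeats_to_alert` consecutive misses rather than on every miss after it.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class MissTracker:
    """Tracks consecutive missed polls per bot slug (hypothetical helper)."""

    missed_heartbeats_to_alert: int
    miss_count: Dict[str, int] = field(default_factory=dict)

    def record(self, slug: str, healthy: bool) -> Optional[str]:
        """Record one poll result and return a reason code on a state change."""
        previous = self.miss_count.get(slug, 0)
        if healthy:
            # Any healthy response resets the consecutive-miss counter.
            self.miss_count[slug] = 0
            # Assumption: recovery is reported only if the bot had been alerted on.
            if previous >= self.missed_heartbeats_to_alert:
                return "HEALTH_HEARTBEAT_BOT_RECOVERED"
            return None
        current = previous + 1
        self.miss_count[slug] = current
        # Assumption: alert once, on the poll that reaches the threshold.
        if current == self.missed_heartbeats_to_alert:
            return "HEALTH_HEARTBEAT_BOT_DOWN"
        return None
```

With a threshold of 3, four misses followed by one healthy poll produce a single down alert and a single recovery:

```python
tracker = MissTracker(missed_heartbeats_to_alert=3)
codes = [tracker.record("risk-vote", ok) for ok in (False, False, False, False, True)]
# codes == [None, None, "HEALTH_HEARTBEAT_BOT_DOWN", None, "HEALTH_HEARTBEAT_BOT_RECOVERED"]
```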
Pipeline placement
Applies to: All 97 production bots across all layers
Why it matters
| If this fails | Consequence |
|---|---|
| A bot crashes silently without HealthHeartbeat running | The dead bot's layer is unguarded. Risk votes, kill-switch checks, or execution guards may stop firing, allowing uncontrolled order flow. |
| Auto-restart fires for a bot in a crash-loop | Repeated restarts mask a systemic failure and exhaust restart budgets. Without a circuit breaker, the governance layer itself degrades. |
| Alert not fired on missed heartbeats | On-call is not paged. The dead bot may go unnoticed for hours, accumulating unmonitored risk exposure. |
| HealthHeartbeat itself is not monitored | The watchdog is unwatched. A dead HealthHeartbeat means all 97 bots run without liveness supervision. |
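The crash-loop row above implies some form of restart budget acting as a circuit breaker. One possible shape is a sliding-window counter, sketched below. The `RestartBudget` class and its `max_restarts` and `window_s` parameters are assumptions made for illustration; the source does not state how the budget window is configured. The returned strings reuse the reason codes this bot emits.

```python
from collections import deque
from typing import Deque, Dict


class RestartBudget:
    """Sliding-window restart limiter per bot slug (hypothetical helper)."""

    def __init__(self, max_restarts: int, window_s: float) -> None:
        self.max_restarts = max_restarts
        self.window_s = window_s
        self._history: Dict[str, Deque[float]] = {}

    def try_restart(self, slug: str, now: float) -> str:
        """Return the reason code for a restart attempt at time `now` (seconds)."""
        history = self._history.setdefault(slug, deque())
        # Drop restarts that have aged out of the budget window.
        while history and now - history[0] >= self.window_s:
            history.popleft()
        if len(history) >= self.max_restarts:
            # Budget spent: stop restarting and escalate instead.
            return "HEALTH_HEARTBEAT_RESTART_BUDGET_EXHAUSTED"
        history.append(now)
        return "HEALTH_HEARTBEAT_AUTO_RESTART"
```

Passing `now` explicitly keeps the sketch deterministic; a real caller would more likely supply `time.monotonic()`.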
Inputs
Polymarket inputs
| Input | Source | Required | Use |
|---|---|---|---|
| None — all inputs are internal | internal | optional | HealthHeartbeat does not consume any Polymarket API surface directly. |
Internal inputs
| Input | Source | Required | Use |
|---|---|---|---|
| Bot health endpoints — GET /internal/health/<slug> | All 97 production bots | required | Primary liveness signal. A 200 response within timeout_ms is a live heartbeat. |
| Bot registry — list of all bot slugs, layers, and restart configs | Config store | required | Defines the set of bots to monitor and their per-bot restart and alerting rules. |
| Restart executor — internal command bus topic for restart triggers | Process manager | optional | When auto_restart=true, HealthHeartbeat publishes a restart command to the process manager after missed_heartbeats_to_alert consecutive misses. |
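The liveness rule in the first row (a 200 response within `timeout_ms`) could be checked along the following lines. This is a sketch using only the Python standard library: the `probe` function, the `base_url` parameter, and the injectable `opener` argument are assumptions for illustration, and the real service may use a different HTTP client or transport.

```python
import time
import urllib.request
from typing import Any, Callable


def probe(
    slug: str,
    base_url: str,
    timeout_ms: int,
    opener: Callable[..., Any] = urllib.request.urlopen,
) -> bool:
    """Return True if GET /internal/health/<slug> answers 200 within timeout_ms."""
    url = f"{base_url}/internal/health/{slug}"
    started = time.monotonic()
    try:
        # urlopen takes its timeout in seconds.
        with opener(url, timeout=timeout_ms / 1000) as response:
            status = response.status
    except OSError:
        # URLError, HTTPError, and socket timeouts all derive from OSError;
        # each is treated as a missed heartbeat.
        return False
    elapsed_ms = (time.monotonic() - started) * 1000
    return status == 200 and elapsed_ms <= timeout_ms
```

The `opener` parameter exists so the check can be exercised without a network; in normal use it would be left at its default.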
Authority
What this bot is permitted to do
State
| Field | Value |
|---|---|
| Readiness | General live |
| Status | live |
| Class | Governance Service |
| Default mode | general_live |
| Developer owner | Polytraders core — Governance pod |
| Capital impact | Indirect |
Reason codes emitted
| Code | Severity | Meaning | Action |
|---|---|---|---|
| HEALTH_HEARTBEAT_SWEEP_COMPLETE | INFO | Full sweep of all registered bots completed; OperationsReport emitted. | No action — routine heartbeat. |
| HEALTH_HEARTBEAT_BOT_DOWN | WARN | A bot has exceeded the missed_heartbeats_to_alert threshold of consecutive missed polls. | Fire page-severity alert; trigger auto-restart if enabled. |
| HEALTH_HEARTBEAT_BOT_RECOVERED | INFO | A previously unhealthy bot returned a healthy response; miss_count reset to 0. | Emit recovery notification; no further action. |
| HEALTH_HEARTBEAT_AUTO_RESTART | WARN | HealthHeartbeat triggered an automatic restart for a bot that missed the heartbeat threshold. | Log restart; increment restart budget counter. |
| HEALTH_HEARTBEAT_RESTART_BUDGET_EXHAUSTED | WARN | A bot has been restarted the maximum number of times within the restart budget window without recovering. | Stop auto-restarting; escalate page to on-call. |
| HEALTH_HEARTBEAT_ENDPOINT_TIMEOUT | WARN | A bot's health endpoint did not respond within the configured timeout. | Treat as missed heartbeat; increment miss_count. |
| KILL_SWITCH_ACTIVE | WARN | KillSwitch is active; this is surfaced in the sweep report for context. | Continue monitoring all bots; do not suppress health checks. |
| HEALTH_HEARTBEAT_REGISTRY_STALE | WARN | The bot registry has not been refreshed from the config store within 5 minutes. | Retry registry fetch; alert if stale for > 10 minutes. |
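The two thresholds in the HEALTH_HEARTBEAT_REGISTRY_STALE row can be expressed as a small check. This sketch is illustrative only: the function name and parameters are hypothetical, and it assumes both the 5-minute and the 10-minute limits are measured from the last successful registry refresh, which the table does not state explicitly.

```python
from typing import Optional, Tuple


def registry_staleness(
    last_refresh_s: float,
    now_s: float,
    stale_after_s: float = 300.0,   # 5 minutes, from the table above
    alert_after_s: float = 600.0,   # 10 minutes, from the table above
) -> Tuple[Optional[str], bool]:
    """Return (reason_code, should_alert) for the registry's current age."""
    age_s = now_s - last_refresh_s
    if age_s <= stale_after_s:
        # Registry is fresh: nothing to emit.
        return (None, False)
    # Stale: emit the reason code, and alert only past the longer limit.
    return ("HEALTH_HEARTBEAT_REGISTRY_STALE", age_s > alert_after_s)
```

A registry refreshed 7 minutes ago would yield the reason code without an alert; one refreshed 11 minutes ago would yield both.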
Related bots in Governance & Ops
Used by
Reverse index — strategies that currently reference gov.health-heartbeat. If you change this bot's authority or reason codes, these strategies must re-pass shadow.
| Strategy | State | Activity |
|---|---|---|
| Fed Rates — surprise drift | frozen | last triggered 22m ago |
| AI Frontier — release-day taker | frozen | last triggered 29m ago |
| NBA props — line-shop | demo-wired | last triggered 36m ago |
Note: demo-wired ≠ production-live.
Why this matters
Governance & Ops bots do NOT propose, approve, or block trades; they only observe and report. Understanding this authority boundary prevents misuse and makes promotion-gate reviews faster and more reliable.