Playbooks / Event Orchestration
Automate incident response with configurable playbooks. Trigger them automatically on check status changes or inbound events, execute actions sequentially or in parallel, pause for manual human steps, and call outbound webhooks for auto-remediation.
Overview
Playbooks wire together existing building blocks — escalation policies, war rooms, status pages, notifications, and Jira tickets — into repeatable automation flows. Each playbook has a trigger (with optional conditions), steps (16 action types available), and a target incident (check or inbound).
- Hybrid model — auto-trigger + automated actions + manual human steps + manual invocation
- 16 action types — from setting priority to firing outbound webhooks
- Parallel groups — run independent steps concurrently
- Per-step conditions — skip steps based on incident attributes
- Suppress mode — playbook takes full control of the notification flow
- Live execution timeline — SSE-powered progress view
Creating a Playbook
Navigate to Operate > Playbooks in the sidebar and click New Playbook. The editor has three sections:
1. Basics
| Field | Description |
|---|---|
| Name | Short, descriptive name (e.g. "Auto-page P1 database incidents") |
| Description | Optional longer explanation of what the playbook does |
| Active | Toggle off to disable without deleting |
2. Trigger
Choose one of four trigger types:
| Trigger | Fires when |
|---|---|
| check_status_change | A check transitions between UP / DOWN / DEGRADED / LATE |
| inbound_incident_created | A new inbound incident is ingested via the Events API |
| inbound_status_change | An inbound incident is acknowledged or resolved |
| manual | Only runs when a user triggers it via the "Run Playbook" button |
Trigger Conditions (AND / OR)
Restrict when the playbook fires. Available fields depend on the trigger type. Operators: equals, not_equals, in, contains, regex, between, exists.
{
  "logic": "AND",
  "conditions": [
    { "field": "severity", "operator": "equals", "value": "critical" },
    { "field": "service_tier", "operator": "in", "value": ["P1", "P2"] },
    { "field": "time_of_day", "operator": "between", "value": ["22:00", "06:00"] }
  ]
}
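To make the AND / OR semantics concrete, here is a minimal Python sketch of how a condition object like the one above could be evaluated against an incident. It is not the product's implementation: the function names (`matches`, `evaluate`), the flat incident dictionary, and the wrap-around handling of a `between` range such as 22:00 to 06:00 are all assumptions made for illustration.

```python
import re


def _in_range(value, low, high):
    """Inclusive range check. If low > high the range is assumed to wrap
    around (e.g. "22:00" to "06:00" covers the night)."""
    if low <= high:
        return low <= value <= high
    return value >= low or value <= high


def matches(condition: dict, incident: dict) -> bool:
    """Evaluate one condition against a flat dict of incident attributes."""
    op = condition["operator"]
    expected = condition.get("value")
    actual = incident.get(condition["field"])

    if op == "exists":
        return actual is not None
    if op == "equals":
        return actual == expected
    if op == "not_equals":
        return actual != expected
    if actual is None:
        return False  # remaining operators need a value to inspect
    if op == "in":
        return actual in expected
    if op == "contains":
        return expected in actual
    if op == "regex":
        return re.search(expected, str(actual)) is not None
    if op == "between":
        low, high = expected
        return _in_range(actual, low, high)
    raise ValueError(f"unknown operator: {op}")


def evaluate(rule: dict, incident: dict) -> bool:
    """Combine all conditions with the rule's logic ("AND" or "OR")."""
    results = (matches(c, incident) for c in rule["conditions"])
    return all(results) if rule["logic"] == "AND" else any(results)
```

With the example rule, an incident of `{"severity": "critical", "service_tier": "P1", "time_of_day": "23:30"}` would match under these assumptions, while the same incident at `"12:00"` would not. Comparing zero-padded `HH:MM` strings works here only because they sort lexicographically in time order.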
Suppress default notifications
When enabled, the playbook takes full control of the incident response. Default notification channels and escalation policies are skipped for matching incidents — the playbook's steps handle everything.
Warning
When suppression is enabled, make sure the playbook includes at least one notification step (run_escalation, add_responders, or send_custom_notification); otherwise alerts will be silently dropped.
3. Steps
Drag & drop steps to reorder. Each step has a name, action type, configuration, and optional conditions, parallel group, and timeout. Steps run top-to-bottom.
Action Types
Notification actions
| Action | What it does |
|---|---|
| add_responders | Notify step-0 targets of an escalation policy via email |
| run_escalation | Fire a full escalation policy step (user/schedule/channel targets) |
| send_custom_notification | Send a rendered message to a specific notification channel |
| notify_subscribers | Email status page subscribers about a state change |
Incident management actions
| Action | What it does |
|---|---|
| set_priority | Set incident priority (P1–P5) |
| set_severity | Set incident severity (critical/error/warning/info) |
| auto_ack | Acknowledge the incident programmatically |
| auto_resolve | Mark the incident as resolved |
| title_enrichment | Append context to the incident title (e.g. recent deploy revision or feature flag) |
| add_links | Attach runbook / dashboard URLs to the incident timeline |
External integrations
| Action | What it does |
|---|---|
| setup_war_room | Create a war room (Jitsi, Google Meet, Slack Huddle, Discord, MS Teams) |
| create_jira_ticket | Open a Jira issue via your configured Jira channel |
| create_slack_channel | Create a dedicated Slack channel for the incident (via Bot API) |
| update_status_page | Publish a message on a public status page |
| outbound_webhook | Call an external HTTP endpoint (POST/GET/PUT/…) with a templated body and headers, e.g. for auto-rollback, feature-flag disable, or diagnostics collection |
Manual step
Use manual_step to pause execution until a human ticks a checkbox in the execution detail view. Perfect for "verify database is responding" or "confirm rollback succeeded" checks before continuing automation.
Outbound Webhooks
The outbound_webhook action turns playbooks into an auto-remediation engine. Typical uses:
- Auto-rollback — POST to your deploy API with the previous revision from a Change Event
- Disable feature flag — POST to LaunchDarkly / Flagsmith to kill a broken flag
- Scale up — POST to your cloud provider to add capacity
- Collect diagnostics — trigger a remote script that gathers logs and attaches them to the incident
- Mirror to PagerDuty — for teams migrating between platforms
Configuration
| Field | Description |
|---|---|
| URL | Endpoint to call |
| Method | POST, GET, PUT, PATCH, DELETE |
| Headers | JSON object of headers (supports secret references) |
| Body template | String with {{variable.path}} placeholders (see Templates) |
| Retry count | 0–5 attempts with exponential backoff (5s → 15s → 45s) |
| Timeout | Per-request timeout in seconds (max 120) |
The response (status code + body truncated to 1KB) is logged in the execution detail so you can audit what happened. On failure after all retries, the step is marked failed but subsequent steps still run.
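The retry behaviour described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the product's code: it treats "retry count" as the number of retries after the first attempt, assumes the delay keeps tripling beyond the three values shown (5s, 15s, 45s), treats any non-2xx status as a failure, and uses a hypothetical `send` callable that returns `(status_code, body)`.

```python
import time

MAX_LOGGED_BODY = 1024  # the docs say the logged body is truncated to 1KB


def backoff_delays(retry_count: int, first: int = 5, factor: int = 3) -> list:
    """Delays in seconds before each retry: 5, 15, 45, ... (assumed to keep tripling)."""
    return [first * factor ** i for i in range(retry_count)]


def call_with_retries(send, retry_count: int, sleep=time.sleep) -> dict:
    """Call send() until it returns a 2xx status or the retries run out.

    send is a hypothetical callable returning (status_code, body).
    The returned dict mimics what an execution log entry might hold.
    """
    delays = backoff_delays(retry_count)
    attempts = 0
    status, body = None, ""
    for attempt in range(retry_count + 1):
        attempts += 1
        try:
            status, body = send()
            if 200 <= status < 300:
                break
        except OSError as exc:  # network-level failure (URLError is a subclass)
            status, body = None, str(exc)
        if attempt < retry_count:
            sleep(delays[attempt])
    ok = status is not None and 200 <= status < 300
    # Truncates by characters for simplicity; a real log might truncate bytes.
    return {"ok": ok, "attempts": attempts, "status": status,
            "body": body[:MAX_LOGGED_BODY]}
```

Because a failed step does not stop the playbook, a caller of this sketch would record `ok: False` for the step and carry on with the next one rather than raising.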
Template Variables
Use {{variable.path}} syntax in any configurable text field (body templates, messages, Jira summaries, Slack channel names, …). Missing values resolve to an empty string — no errors.
| Variable | Available for |
|---|---|
| {{check.name}}, {{check.target}}, {{check.status}} | Check status triggers |
| {{incident.title}}, {{incident.severity}}, {{incident.priority}} | All triggers |
| {{service.name}}, {{service.tier}} | Checks bound to a Service |
| {{playbook.name}}, {{execution.id}} | Always available |
| {{org.name}}, {{org.slug}} | Always available |
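The substitution rule (dotted paths, missing values become an empty string) can be illustrated with a short Python sketch. The `render` function and the nested-dictionary context are assumptions for illustration; the actual template engine may differ, for example in how it treats whitespace inside the braces.

```python
import re

# Matches {{ some.dotted.path }} with optional whitespace inside the braces.
_PLACEHOLDER = re.compile(r"\{\{\s*([\w.]+)\s*\}\}")


def render(template: str, context: dict) -> str:
    """Replace each {{variable.path}} with its value from a nested dict.

    Any path that cannot be resolved renders as an empty string
    instead of raising an error.
    """
    def lookup(match):
        value = context
        for key in match.group(1).split("."):
            if not isinstance(value, dict) or key not in value:
                return ""
            value = value[key]
        return "" if value is None else str(value)

    return _PLACEHOLDER.sub(lookup, template)
```

For example, rendering `"[{{incident.priority}}] {{check.name}} is {{check.status}}"` with a context holding `incident.priority = "P1"`, `check.name = "api"` and `check.status = "DOWN"` would give `[P1] api is DOWN`, and a `{{service.tier}}` placeholder would simply vanish if the check is not bound to a Service.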
Parallel Groups
By default, steps run sequentially. To execute several steps in parallel, assign them the same parallel group (Group A / B / C / D). Steps in the same group start at the same time using a thread pool (max 5 concurrent workers per group).
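As a rough picture of this scheduling model, the Python sketch below runs ungrouped steps one at a time and runs each parallel group on a thread pool capped at five workers. The step dictionaries, the `parallel_group` key, and the `run_step` callable are hypothetical, and the sketch assumes that steps sharing a group sit next to each other in the step list.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import groupby

MAX_WORKERS = 5  # per-group concurrency limit stated in the docs


def run_steps(steps, run_step):
    """Run steps top-to-bottom; adjacent steps sharing a parallel group run concurrently.

    steps: list of dicts with a "name" and an optional "parallel_group".
    run_step: hypothetical callable that executes one step and returns its output.
    """
    results = {}
    for group, members in groupby(steps, key=lambda s: s.get("parallel_group")):
        members = list(members)
        if group is None:
            # Ungrouped steps keep the default sequential behaviour.
            for step in members:
                results[step["name"]] = run_step(step)
        else:
            # The pool is joined on exit, so the next step starts only
            # after every member of the group has finished.
            with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
                for step, output in zip(members, pool.map(run_step, members)):
                    results[step["name"]] = output
    return results
```

A group with more than five members would still complete under this sketch; the extra steps would wait for a free worker.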
Tip
Good candidates for a parallel group are create_jira_ticket + create_slack_channel + setup_war_room: none depend on each other, and running them sequentially wastes seconds when every second counts.
Manual Steps & Resume
When the orchestrator encounters a step marked is_manual, execution pauses with status waiting_for_human. The step appears in the execution timeline with a "Mark Complete" button.
When a user clicks the button, the playbook resumes from the next step — a new task continues where the previous one left off, preserving the full step history.
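The pause-and-resume flow can be modelled as a small state machine. The Python sketch below is only an illustration: the `Execution` class, the `running` and `completed` status names, and the `mark_complete` helper are assumptions, while `waiting_for_human` and `is_manual` come from the description above.

```python
from dataclasses import dataclass, field


@dataclass
class Execution:
    steps: list
    next_index: int = 0
    status: str = "running"          # assumed status name
    history: list = field(default_factory=list)


def advance(execution, run_step):
    """Run steps until the end or until a manual step is reached."""
    while execution.next_index < len(execution.steps):
        step = execution.steps[execution.next_index]
        if step.get("is_manual"):
            execution.status = "waiting_for_human"
            return execution
        execution.history.append((step["name"], run_step(step)))
        execution.next_index += 1
    execution.status = "completed"   # assumed status name
    return execution


def mark_complete(execution, user_id, run_step):
    """Record the human confirmation, then resume from the next step."""
    if execution.status != "waiting_for_human":
        raise ValueError("execution is not waiting on a manual step")
    step = execution.steps[execution.next_index]
    execution.history.append((step["name"], f"completed by {user_id}"))
    execution.next_index += 1
    execution.status = "running"
    return advance(execution, run_step)
```

Keeping `history` on the execution object is what lets the resumed run preserve the full step history instead of starting a fresh record.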
Running a Playbook
Automatic trigger
When an incident fires (check status changes or inbound event received), matching active playbooks are queued automatically. An SSE event playbook.started is published so the UI updates in real time.
Manual trigger
On a check detail page, scroll to the Playbooks section and click Run Playbook. Pick a playbook and it will be executed against the current incident. Manual triggers are recorded in the execution history with your user ID.
Execution History
Every run is recorded as a PlaybookExecution with full step-by-step results (status, timing, output, errors). Access via Playbooks > … menu > History or click any execution from a check detail page.
The execution detail page shows a live timeline updated via Server-Sent Events — no manual refresh needed.
Example: Auto-rollback on failed deploy
Put it all together. The scenario:
- CI/CD pushes a ChangeEvent of type deployment
- 2 minutes later, an HTTP check starts returning 500s
- A playbook matches (trigger: check_status_change, condition: has_recent_change_event == true), and:
  - Sets priority to P1
  - In parallel: creates a war room, opens a Jira ticket, sends a custom Slack notification
  - Calls an outbound webhook to roll back the deploy with the previous revision from the change event
  - Pauses on a manual step: "Verify rollback succeeded in production"
  - After the responder confirms, auto_resolve closes the incident
API
Playbooks are fully manageable via the REST API. See the API Reference for details.
- GET /api/organizations/{org_id}/playbooks
- POST /api/organizations/{org_id}/playbooks
- PUT /api/organizations/{org_id}/playbooks/{id}
- POST /api/organizations/{org_id}/playbooks/{id}/execute
- GET /api/organizations/{org_id}/playbook-executions
- POST /api/organizations/{org_id}/playbook-executions/{id}/steps/{step}/complete
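As a starting point for scripting against these endpoints, the Python sketch below builds requests with the standard library. Only the paths come from the list above. The host name, the bearer-token `Authorization` header, and the optional JSON payload are assumptions; check the API Reference for the real authentication scheme and request bodies.

```python
import json
import urllib.request

BASE_URL = "https://monitoring.example.com"  # hypothetical host


def build_request(method, path, token, payload=None):
    """Build (but do not send) a request for one of the playbook endpoints."""
    data = json.dumps(payload).encode() if payload is not None else None
    headers = {
        "Authorization": f"Bearer {token}",  # assumed auth scheme
        "Accept": "application/json",
    }
    if data is not None:
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(BASE_URL + path, data=data,
                                  headers=headers, method=method)


def list_playbooks(org_id, token):
    return build_request("GET", f"/api/organizations/{org_id}/playbooks", token)


def execute_playbook(org_id, playbook_id, token, payload=None):
    # The request body for /execute is not described here, so it is left optional.
    path = f"/api/organizations/{org_id}/playbooks/{playbook_id}/execute"
    return build_request("POST", path, token, payload)


def complete_manual_step(org_id, execution_id, step, token):
    path = (f"/api/organizations/{org_id}/playbook-executions/"
            f"{execution_id}/steps/{step}/complete")
    return build_request("POST", path, token)
```

Each helper returns a `urllib.request.Request`; passing it to `urllib.request.urlopen` would send it.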