
Configuration

Application configuration is handled using pydantic-settings.

The bot reads environment variables from the system environment and from a .env file (dotenv), in that order.
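The precedence between the two sources can be sketched as follows. This is a minimal illustration, assuming the application keeps the pydantic-settings default in which a value from the system environment overrides the same key in the .env file. The helper names `parse_dotenv` and `resolve_env` are hypothetical and not part of the application.

```python
def parse_dotenv(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def resolve_env(dotenv_text: str, system_env: dict[str, str]) -> dict[str, str]:
    """Merge both sources; system environment values win over .env values.

    Assumption: this mirrors the pydantic-settings default priority.
    """
    merged = parse_dotenv(dotenv_text)
    merged.update(system_env)
    return merged
```

In practice this means a token exported in the shell or injected by an orchestrator would take effect even if the .env file defines a different value for the same name.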

All other settings are read from a config.yaml file. Almost every setting has a sensible default, so you can run the application without creating one — but you will want one as soon as you need to enable integrations or change any defaults.

A fully annotated config.yaml.sample is included in the repository root and covers every available option.

See the integrations documentation for details on enabling and configuring integrations.

Configurable Parameters

API

The application exposes a lightweight API that serves embeddable incident status widgets and a health check endpoint (GET /health). Widget routes are available under /api/v1/widgets.

To enable it, set the following in config.yaml:

api:
  enabled: true

Digest Channel

The digest channel is where updates are sent regarding all incidents managed by the bot. The channel is #incidents by default.

# default
digest_channel: incidents

Pinned Images

Pinning images to incident channels is enabled by default. To disable it, set the following in config.yaml:

enable_pinned_images: false

Note

The API method used for pinning (files.sharedPublicURL) is restricted on free Slack plans.

Links

Custom quick-access buttons can be added to incident messages:

links:
  - title: Runbooks
    url: https://runbooks.example.com
  - title: Monitoring
    url: https://grafana.example.com

Options

options:
  # Messages posted to every new incident channel on creation.
  additional_welcome_messages:
    - message: "Welcome! Please review the runbook before taking action."
      pin: true  # default: false

  # Prefix for incident channel names — e.g. "inc" → #inc-2024-01-15-outage
  channel_name_prefix: inc  # default

  # Date format token used in channel names (dayjs / moment style).
  channel_name_date_format: YYYY-MM-DD  # default

  # Prepend the date so channels sort chronologically.
  channel_name_use_date_prefix: false  # default

  # A static meeting URL attached to every incident.
  # Leave unset when using the Zoom integration.
  meeting_link: https://meet.example.com/oncall

  # Pin the meeting link inside the incident channel on creation.
  pin_meeting_link_to_channel: false  # default

  # Number of recent incidents shown on the Slack app home page.
  show_most_recent_incidents_app_home_limit: 5  # default

  # Items per page in paginated Slack list views.
  slack_items_pagination_per_page: 5  # default

  # Application timezone. Used for scheduling and all displayed timestamps.
  timezone: UTC  # default

  # Whether to thread status updates. When false, updates are posted
  # directly to the digest channel instead of being threaded.
  updates_in_threads: false  # default

  # User-agents to suppress from API access logs (e.g. to silence health probes).
  skip_logs_for_user_agent:
    - kube-probe
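One way to picture skip_logs_for_user_agent is as a logging filter that drops access-log records from matching clients. The sketch below is an assumption about the mechanism, not the application's actual code: the class name `SkipUserAgentFilter`, the `user_agent` record attribute, and the case-insensitive substring match are all hypothetical.

```python
import logging


class SkipUserAgentFilter(logging.Filter):
    """Drop log records whose user agent matches a configured entry.

    Assumption: matching is a case-insensitive substring test, so the
    entry "kube-probe" also matches "kube-probe/1.29".
    """

    def __init__(self, skip_agents: list[str]) -> None:
        super().__init__()
        self.skip_agents = [agent.lower() for agent in skip_agents]

    def filter(self, record: logging.LogRecord) -> bool:
        # `user_agent` is a hypothetical attribute attached by the access logger.
        user_agent = (getattr(record, "user_agent", "") or "").lower()
        # Returning False tells the logging framework to discard the record.
        return not any(agent in user_agent for agent in self.skip_agents)
```

A filter like this could be attached with `logger.addFilter(SkipUserAgentFilter(["kube-probe"]))` so that Kubernetes health probes stop cluttering the access log while all other requests are still recorded.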

Platform

The platform field controls which chat platform the bot connects to. Valid values are slack (default) and matrix.

platform: slack  # or matrix

Slack

When platform: slack, the following environment variables are required:

  • SLACK_APP_TOKEN — App-level token for websocket communication.
  • SLACK_BOT_TOKEN — Bot-scoped OAuth token.
  • SLACK_USER_TOKEN — User-scoped OAuth token.

Matrix

When platform: matrix, the following environment variables are required:

  • MATRIX_HOMESERVER — URL of your Matrix homeserver (e.g. https://matrix.example.com).
  • MATRIX_USER_ID — Full Matrix user ID for the bot (e.g. @incidentbot:example.com).
  • MATRIX_ACCESS_TOKEN — Access token for the bot account.
  • MATRIX_DIGEST_ROOM_ID — Room ID to use as the incident digest room.

Optional Matrix variables:

  • MATRIX_DEVICE_ID — Device ID for the bot session (default: INCIDENTBOT).
  • MATRIX_WIDGET_BASE_URL — Base URL to use for embedded incident widgets.
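A pre-flight check for the required variables listed above might look like the following. It is only a sketch: the function name `missing_env_vars` is hypothetical, and treating an empty string as missing is an assumption.

```python
# Required variables per platform, as listed in this section.
REQUIRED_ENV = {
    "slack": ["SLACK_APP_TOKEN", "SLACK_BOT_TOKEN", "SLACK_USER_TOKEN"],
    "matrix": [
        "MATRIX_HOMESERVER",
        "MATRIX_USER_ID",
        "MATRIX_ACCESS_TOKEN",
        "MATRIX_DIGEST_ROOM_ID",
    ],
}


def missing_env_vars(platform: str, env: dict[str, str]) -> list[str]:
    """Return the required variables that are unset or empty for a platform."""
    if platform not in REQUIRED_ENV:
        raise ValueError(f"unknown platform: {platform!r}")
    # Assumption: an empty value counts as missing.
    return [name for name in REQUIRED_ENV[platform] if not env.get(name)]
```

Calling `missing_env_vars("slack", dict(os.environ))` before starting the bot would give a list of names to report in a single error message rather than failing on the first missing token.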

Roles

Define the roles participants can claim during an incident. Exactly one role must have is_lead: true — this is the Incident Commander role.

Note

Omit this section entirely to use the four built-in default roles.

roles:
  incident_commander:
    description: "Decision maker. Delegates tasks and drives resolution."
    is_lead: true
  scribe:
    description: "Maintains the incident timeline and tracks follow-up items."
  subject_matter_expert:
    description: "Domain expert for a specific service or component."
  communications_liaison:
    description: "Handles customer-facing communication during the incident."
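The "exactly one lead" rule can be checked before deploying a custom roles section. The sketch below assumes the roles mapping has already been loaded into a Python dict; `find_lead_role` is a hypothetical helper, not a function exposed by the application.

```python
def find_lead_role(roles: dict) -> str:
    """Return the name of the single role marked is_lead: true.

    Raises ValueError when zero or more than one role is marked as lead.
    """
    leads = [
        name
        for name, spec in roles.items()
        if (spec or {}).get("is_lead") is True
    ]
    if len(leads) != 1:
        raise ValueError(
            f"expected exactly one role with is_lead: true, found {len(leads)}"
        )
    return leads[0]
```

Applied to the example above, this would return `incident_commander`; removing that flag or adding it to a second role would raise an error.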

Severities

Note

Omit this section entirely to use the four built-in default severities.

severities:
  sev1: "Critical - major impact on all users. All-hands response required."
  sev2: "High - significant degradation for a large portion of users."
  sev3: "Moderate - minor degradation; worth coordinating but not critical."
  sev4: "Investigating - potential issue, no confirmed impact yet. Default."

Statuses

Note

Omit this section entirely to use the four built-in default statuses. If you define custom statuses, exactly one must have initial: true and one must have final: true.

statuses:
  investigating:
    initial: true
  identified: {}
  monitoring: {}
  resolved:
    final: true
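The constraint on custom statuses can be checked the same way. This sketch reads the rule as "exactly one initial and exactly one final status", which is an interpretation of the note above; `validate_statuses` is a hypothetical helper operating on the statuses mapping loaded as a dict.

```python
def validate_statuses(statuses: dict) -> tuple[str, str]:
    """Return (initial_status, final_status) or raise ValueError.

    Assumption: exactly one status carries initial: true and exactly one
    carries final: true.
    """
    initial = [n for n, spec in statuses.items() if (spec or {}).get("initial")]
    final = [n for n, spec in statuses.items() if (spec or {}).get("final")]
    if len(initial) != 1:
        raise ValueError(f"expected one initial status, found {len(initial)}")
    if len(final) != 1:
        raise ValueError(f"expected one final status, found {len(final)}")
    return initial[0], final[0]
```

For the example configuration this would yield `("investigating", "resolved")`.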

Slash Command

The bot's default slash command is /incidentbot. If you change this, update your Slack app manifest to match.

root_slash_command: /incidentbot  # default

Jobs

The bot runs a set of global scheduled maintenance jobs, all of which are enabled by default. Use this section to adjust intervals or disable individual jobs.

jobs:
  scrape_for_aging_incidents:
    enabled: true
    interval_days: 2      # how often to check (default: 2 days)
    max_age_days: 7       # incidents open longer than this trigger a digest alert
    ignore_statuses:      # statuses excluded from the aging check
      - monitoring

  update_slack_cache:
    enabled: true
    interval_minutes: 15  # how often to refresh Slack channel/user lists

  update_pagerduty_oc_data:
    enabled: true
    interval_minutes: 30  # how often to refresh PagerDuty on-call data
                          # only active when the PagerDuty integration is enabled
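To make the interaction between max_age_days and ignore_statuses concrete, the selection performed by scrape_for_aging_incidents could be sketched as below. This is an illustration only: the incident dict shape (`id`, `status`, `created_at`) and the function name are hypothetical, and the input is assumed to contain open incidents only.

```python
from datetime import datetime, timedelta, timezone


def find_aging_incidents(
    incidents: list[dict],
    now: datetime,
    max_age_days: int = 7,
    ignore_statuses: tuple[str, ...] = ("monitoring",),
) -> list[dict]:
    """Return open incidents older than max_age_days, skipping ignored statuses."""
    cutoff = now - timedelta(days=max_age_days)
    return [
        incident
        for incident in incidents
        if incident["status"] not in ignore_statuses
        and incident["created_at"] < cutoff  # strictly older than the cutoff
    ]
```

With the defaults shown above, an incident opened two weeks ago and still in investigating would be reported, while one of the same age in monitoring would not.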

Reminders

Per-incident reminders are posted inside each incident channel on a repeating timer for as long as the incident is open. The system ships with two defaults — you can modify, remove, or extend them.

reminders:
  - id: comms_reminder
    message: "Time to send a status update?"
    interval_minutes: 30
    actions:
      - type: send_update
        label: "Send Update"
      - type: snooze
        intervals: [30, 60, 90]
      - type: dismiss
        label: "Stop Reminders"

  - id: role_watcher
    message: "No roles have been assigned yet - please review and claim as needed."
    interval_minutes: 10
    once: true
    conditions:
      no_roles_claimed: true
    include_role_buttons: true
    actions:
      - type: dismiss
        label: "Dismiss"

Reminder fields

Field | Type | Default | Description
--- | --- | --- | ---
id | string | required | Unique identifier; used in Slack action IDs.
message | string | required | Text shown in the reminder post.
interval_minutes | int | required | How often the reminder fires.
once | bool | false | Fire at most once, then remove itself.
enabled | bool | true | Set to false to disable without removing the entry.
include_role_buttons | bool | false | Append a "Join as <role>" button for each configured role.
conditions | object | none | All conditions must be true for the reminder to fire.
actions | list | [] | Buttons shown on the reminder message.

Conditions

All conditions in a conditions block must be satisfied for the reminder to fire.

Condition | Type | Description
--- | --- | ---
severity_is | list of strings | Only fire if the incident severity is in this list.
status_is | list of strings | Only fire if the incident status is in this list.
status_is_final | bool | Only fire when the incident is in a terminal status.
no_roles_claimed | bool | Only fire when no participant has claimed a role yet.

Action types

Type | Fields | Description
--- | --- | ---
send_update | label | Opens the status update modal.
snooze | intervals (list of minutes) | Reschedules the reminder after a chosen delay.
dismiss | label | Permanently stops this reminder for the incident.

Automations

Automations fire once in response to incident lifecycle events, optionally filtered by conditions.

automations:
  # Invite the on-call team and page PagerDuty for all sev1 incidents
  - id: sev1_response
    trigger: on_open
    conditions:
      severity_is: [sev1]
    actions:
      - type: invite_group
        name: sre-oncall
      - type: page_pagerduty
        escalation_policy: PXXXXXX
        priority: high

  # Remind responders to update the status page for sev1/sev2
  - id: status_page_nudge
    trigger: on_open
    conditions:
      severity_is: [sev1, sev2]
    actions:
      - type: post_message
        message: "Remember to update the status page if customers may be affected."

  # Post a postmortem reminder when an incident resolves
  - id: postmortem_reminder
    trigger: on_final_status
    actions:
      - type: post_message
        message: "Incident resolved. Please open a postmortem within 48 hours."

Automation fields

Field | Type | Default | Description
--- | --- | --- | ---
id | string | required | Unique identifier.
trigger | string | required | Lifecycle event that fires this automation (see below).
enabled | bool | true | Set to false to disable without removing the entry.
conditions | object | none | Same structure as reminder conditions.
actions | list | [] | Actions to execute when the automation fires.

Triggers

Trigger | Fires when
--- | ---
on_open | An incident is first created.
on_severity_change | The incident severity is updated.
on_status_change | The incident status is updated.
on_final_status | The incident reaches a terminal status.

Automation action types

Type | Fields | Description
--- | --- | ---
invite_group | name | Invite a Slack user group into the incident channel.
page_pagerduty | escalation_policy, priority | Trigger a PagerDuty escalation. Requires the PagerDuty integration. priority: low (default) or high.
post_message | message | Post a plain-text message to the incident channel.