Cyberincidents (like cybercrime, IT failures or outages, and data breaches) are the number one threat to large and small companies alike. Now imagine your company is hit by a data breach. One moment you're minding your business as usual, the next you notice unusually high outbound traffic, changes in important files, suspicious user activity, or you're straight-up locked out of your accounts. How do you react?

If simply imagining this scenario makes you anxious, you're probably not prepared for it. But don't panic.

In this post, we'll explain what incidents are, why it's crucial to have an incident response playbook in place, and how to create one and automate it with our workflow template.

Table of contents

What is incident response?
Why do you need an incident response playbook?
What are the phases of an incident response plan?
Why should you automate the incident response?
Build an automated incident response workflow
      Workflow prerequisites
      Workflow 1 - Inform the team about the incident
      Workflow 2 - Ensure that the incident is acknowledged internally
      Workflow 3 - Mark the incident as resolved
      Activate the workflows in production
Start automating!

What is incident response?

Let's start with a definition of the key terms you'll encounter in this article: incident, incident response, and incident response plan.

An incident is an event that threatens the operations, services, or functionality of an organization. For example, a server outage or a cyberattack disrupts the everyday operations of an organization.

An incident response is the reaction to an incident. It covers how the organization manages the situation: from identifying the issue, through analyzing and fixing the problem, to preventing future recurrences.

An incident response plan (or playbook) is an organized, pre-established sequence of measures that the IT team takes when faced with an incident.

Why do you need an incident response playbook?

It is a truth universally acknowledged that people don't make good decisions under stress. The stakes are even higher at large scale, when the data of millions of users or critical business operations are impacted.

And yet, more than 77% of organizations do not have an incident response plan. A Kaspersky report revealed that 51% of companies detected an incident only after it had an impact, and 74% needed weeks or even months to remediate it.

An incident response playbook allows you to jump from (over-)thinking to acting in moments of crisis. The fewer decisions you need to make on the spot, the faster you can respond to the incident, minimize damage, and protect your company's reputation.

What are the phases of an incident response plan?

An efficient incident response typically consists of six phases:

  1. Preparation: Get the IT team together, assign each member a clear role for what to do in case of an incident, and make sure all necessary tools and services are available.
  2. Identification: This is the "Houston, we have a problem" moment. Once an incident is identified, establish what happened, where it originated, which parts of the business it affects, and to what extent.
  3. Containment: Stop the incident from spreading to other processes, systems, or operations, and create backups.
  4. Eradication: Fix the root cause and patch the affected systems thoroughly enough that no trace of the incident remains.
  5. Recovery: Once the incident is resolved, bring all services back up and restore connections and services to production.
  6. Learnings: Get the IT team together to discuss what they learned from managing the threat, what went well, and, perhaps most importantly, what could be improved. Add the incident and the measures taken to your internal documentation, so they can serve as a reference for future issues.

These six phases should also be reflected in your incident response playbook.

Why should you automate the incident response?

Having an incident response playbook is a first step towards managing incidents more efficiently. However, it still involves manual actions (for example, notifying all team members in due time or creating tickets). The solution to this problem is automated incident response.

Though you can't automate the whole process (someone has to roll up their sleeves and fix the actual issue), you can automate the low-level tasks that add up to cause delays and errors. Automation can assist, supplement, or completely replace human intervention. As a result, the IT team has fewer trivial tasks to worry about and more time for the critical ones.

By automating the repetitive steps of the playbook, you get more efficient communication between team members and faster response times, which in turn improve the main incident response metrics: mean time to detect (MTTD), mean time to acknowledge (MTTA), and mean time to resolve or respond (MTTR).
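
To make these metrics concrete, here is a minimal Python sketch that averages them over two made-up incidents. The timestamps are hypothetical, and the sketch assumes one common set of definitions (MTTD from start to detection, MTTA from detection to acknowledgement, MTTR from detection to resolution); teams measure these intervals in slightly different ways.

```python
from datetime import datetime
from statistics import mean

# Hypothetical incident timelines (all values made up for illustration).
incidents = [
    {
        "started": datetime(2022, 3, 1, 9, 0),
        "detected": datetime(2022, 3, 1, 9, 20),
        "acknowledged": datetime(2022, 3, 1, 9, 25),
        "resolved": datetime(2022, 3, 1, 11, 0),
    },
    {
        "started": datetime(2022, 3, 7, 14, 0),
        "detected": datetime(2022, 3, 7, 14, 10),
        "acknowledged": datetime(2022, 3, 7, 14, 30),
        "resolved": datetime(2022, 3, 7, 15, 10),
    },
]


def mean_minutes(records, start_key, end_key):
    """Average interval between two timestamps across all records, in minutes."""
    return mean((r[end_key] - r[start_key]).total_seconds() / 60 for r in records)


mttd = mean_minutes(incidents, "started", "detected")       # time to detect
mtta = mean_minutes(incidents, "detected", "acknowledged")  # time to acknowledge
mttr = mean_minutes(incidents, "detected", "resolved")      # time to resolve

print(f"MTTD: {mttd:.1f} min, MTTA: {mtta:.1f} min, MTTR: {mttr:.1f} min")
# prints "MTTD: 15.0 min, MTTA: 12.5 min, MTTR: 80.0 min"
```

Automated notifications mostly shrink the gap between detection and acknowledgement, because nobody has to wait for a manual page or ticket before the right people are involved.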

Build an automated incident response workflow

Let's put the theory into practice. In this part, we'll build an automated workflow that implements the following incident response protocol:

  1. Triage the issue in the project management platform
  2. Create a special channel in the team communication platform
  3. Add the on-call team members to the new channel
  4. Acknowledge the issue
  5. Fix the issue
  6. Resolve the issue ticket

In our workflow, we'll use the following services:

  • Jira for project management and issue tickets (alternatively, you can use Trello).
  • Mattermost for team communication (alternatively, you can use Slack).
  • PagerDuty for managing incidents.

This workflow consists of three parts, each tackling different playbook steps:

  • Workflow 1 covers triaging the issue, creating a special communication channel, and tagging the on-call team (steps 1-3).
  • Workflow 2 covers acknowledging the issue (step 4).
  • Workflow 3 covers fixing the issue and resolving the ticket automatically (steps 5-6).

All put together, the final workflows look like this:

Workflows for automated incident response

Workflow prerequisites

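To follow along with this tutorial and implement the workflows yourself, you'll need the following:

  • An n8n instance
  • A PagerDuty account and credentials for the PagerDuty node
  • A Jira Cloud account and credentials for the Jira node
  • A Mattermost account and credentials for the Mattermost node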

Workflow 1 - Inform the team about the incident

The first workflow automates the ChatOps part of the response as soon as a new incident is detected.

Workflow 1
  • Webhook node triggers the workflow when an incident is created in PagerDuty.
  • Mattermost1 node creates a new channel for the specific incident.
  • Mattermost2 node adds the responsible users to the channel.
  • Jira1 node creates an issue about the incident in Jira.
  • Mattermost3 node posts a message in the Incidents channel with links to the new channel, the PagerDuty incident, and the Jira issue.
  • Mattermost4 node posts a message in the new channel with the options (buttons) Acknowledge and Resolve.

Here's how to configure the parameters of each node:

1. Webhook node

  • Authentication: None
  • HTTP Method: POST
  • Path: 9888d896-dd23-4e97-9d16-c12055b64133
    By default, this field contains a randomly generated webhook URL path, to avoid conflicts with other webhook nodes. You can also manually specify a URL path if necessary. Learn more in the documentation of the Webhook node.
  • Respond: Immediately
  • Response Code: 200
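
Before wiring up PagerDuty, you may want to check that the Webhook node receives data at all. The Python sketch below builds a POST request carrying a made-up event that contains only the PagerDuty fields this tutorial reads; the URL is a placeholder for the test URL shown in your own Webhook node. The Webhook node puts the request body under body, which is why the expressions in the following nodes start with $json["body"].

```python
import json
import urllib.request

# Placeholder: replace with the test URL shown in your Webhook node.
WEBHOOK_URL = "https://n8n.example.com/webhook-test/9888d896-dd23-4e97-9d16-c12055b64133"

# Made-up event containing only the fields the workflow expressions read.
sample_event = {
    "event": {
        "event_type": "incident.triggered",
        "data": {
            "id": "PABC123",
            "title": "Checkout API returns 500 errors",
            "incident_key": "checkout-500-errors",
            "html_url": "https://example.pagerduty.com/incidents/PABC123",
        },
    }
}


def build_test_request(url: str, event: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for the Webhook node."""
    return urllib.request.Request(
        url,
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_test_request(WEBHOOK_URL, sample_event)
# To actually send it while the workflow is listening:
# with urllib.request.urlopen(request) as response:
#     print(response.status)  # the node is configured to respond with 200
```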

2. Mattermost1 node: Create an auxiliary channel

  • Resource: Channel
  • Operation: Create
  • Team ID: The ID of your team
    Read the documentation of the Mattermost node to learn how to get your Team ID or fix issues related to it.
  • Display Name: {{$json["body"]["event"]["data"]["title"]}}
  • Name: {{$json["body"]["event"]["data"]["incident_key"]}}
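  • Type: Public
    Rename this node to Create Channel: the nodes in steps 5 and 6 reference it by that name in their expressions.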
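
For reference, the node's parameters correspond roughly to a call to the create-channel endpoint of Mattermost's API v4. The Python sketch below only builds that request from a Webhook item; the item and team ID are made-up placeholders, and n8n's own implementation may differ in its details.

```python
import json

# Made-up n8n item, as the Webhook node would output it (body = PagerDuty event).
item = {
    "body": {
        "event": {
            "data": {
                "title": "Checkout API returns 500 errors",
                "incident_key": "checkout-500-errors",
            }
        }
    }
}


def create_channel_request(item: dict, team_id: str) -> tuple[str, str, dict]:
    """Return (method, path, JSON body) for Mattermost's create-channel endpoint."""
    data = item["body"]["event"]["data"]
    body = {
        "team_id": team_id,
        "name": data["incident_key"],   # the node's Name field (URL handle)
        "display_name": data["title"],  # the node's Display Name field
        "type": "O",                    # "O" = public channel, "P" = private
    }
    return "POST", "/api/v4/channels", body


method, path, body = create_channel_request(item, team_id="your-team-id")
print(method, path, json.dumps(body))
```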

3. Mattermost2 node: Add on-call team to auxiliary channel

  • Resource: Channel
  • Operation: Add User
  • Channel ID: {{$json["id"]}}
  • User ID: The ID of the responsible user
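
If more than one person is on call, each of them needs to be added to the channel. The Python sketch below builds one request per user against Mattermost's API v4 add-member endpoint; the channel ID (which the workflow takes from the previous node via $json["id"]) and the user IDs are hypothetical.

```python
def add_member_requests(channel_id: str, user_ids: list[str]) -> list[tuple[str, str, dict]]:
    """Build one Mattermost add-member request per on-call user (not sent)."""
    return [
        ("POST", f"/api/v4/channels/{channel_id}/members", {"user_id": user_id})
        for user_id in user_ids
    ]


# Hypothetical IDs for illustration only.
requests_to_send = add_member_requests("chan123", ["user_alice", "user_bob"])
for method, path, body in requests_to_send:
    print(method, path, body)
```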

4. Jira1 node: Triage the issue in Jira

  • Jira Version: Cloud
  • Resource: Issue
  • Operation: Create
  • Project: The ID of your project
  • Issue Type: The ID of your issue type
  • Summary: {{$node["Webhook"].json["body"]["event"]["data"]["title"]}}
  • Additional Fields > Assignee: The ID of the assigned member
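
To see how these fields map onto Jira's data model, here is a Python sketch of the JSON body accepted by Jira Cloud's create-issue endpoint (POST /rest/api/2/issue). The project, issue type, and account IDs are hypothetical. Jira answers with the new issue's key (for example, a hypothetical OPS-42), which is the value the following nodes read through $json["key"].

```python
import json


def create_issue_body(summary: str, project_id: str, issue_type_id: str,
                      assignee_account_id: str) -> dict:
    """JSON body for Jira Cloud's create-issue endpoint."""
    return {
        "fields": {
            "project": {"id": project_id},
            "issuetype": {"id": issue_type_id},
            "summary": summary,
            # Jira Cloud identifies users by account ID rather than username.
            "assignee": {"accountId": assignee_account_id},
        }
    }


# Hypothetical IDs; the summary mirrors the PagerDuty incident title.
body = create_issue_body(
    "Checkout API returns 500 errors", "10000", "10001", "5b10ac8d82e05b22cc7d4ef5"
)
print(json.dumps(body, indent=2))
```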

5. Mattermost3 node: Post details in the Incidents channel

  • Resource: Message
  • Operation: Post
  • Channel ID: The ID of your Incidents channel
  • Message: 🚨 New incident: Auxiliary Channel -> https://mattermost.internal.n8n.io/test/channels/{{$node["Create Channel"].json["name"]}} PagerDuty Incident -> {{$node["Webhook"].json["body"]["event"]["data"]["html_url"]}} Jira Issue -> https://n8n.atlassian.net/browse/{{$json["key"]}}
    Replace https://mattermost.internal.n8n.io/test and https://n8n.atlassian.net with the URLs of your own Mattermost team and Jira site.
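
The message simply stitches three links together. The Python sketch below assembles the same text and wraps it in a body for Mattermost's create-post endpoint (POST /api/v4/posts). The base URLs, channel ID, and Jira key are placeholders, and putting one link per line is our own formatting choice.

```python
# Placeholders: replace with your own Mattermost team URL and Jira site.
MATTERMOST_BASE = "https://mattermost.example.com/your-team"
JIRA_BASE = "https://your-site.atlassian.net"


def incident_announcement(channel_name: str, pagerduty_url: str, jira_key: str) -> str:
    """Assemble the links the Mattermost3 node posts in the Incidents channel."""
    return (
        "🚨 New incident:\n"
        f"Auxiliary Channel -> {MATTERMOST_BASE}/channels/{channel_name}\n"
        f"PagerDuty Incident -> {pagerduty_url}\n"
        f"Jira Issue -> {JIRA_BASE}/browse/{jira_key}"
    )


# Body for Mattermost's create-post endpoint; all IDs are made up.
post_body = {
    "channel_id": "incidents-channel-id",
    "message": incident_announcement(
        "checkout-500-errors",
        "https://example.pagerduty.com/incidents/PABC123",
        "OPS-42",
    ),
}
```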

6. Mattermost4 node: Post details and action buttons in the auxiliary channel

  • Resource: Message
  • Operation: Post
  • Channel ID: {{$node["Create Channel"].json["id"]}}
  • Message: ⚠️ {{$node["Webhook"].json["body"]["event"]["data"]["title"]}} PagerDuty incident: {{$node["Webhook"].json["body"]["event"]["data"]["html_url"]}} Jira issue: https://n8n.atlassian.net/browse/{{$json["key"]}}
  • Attachments > Actions:
    • Type: Button
    • Name: Acknowledge
    • Integration:
      • URL: https://[URL of your integration]/webhook/ack
      • Context:
        • Property Name: pagerduty_incident
          Property Value: {{ $node["Webhook"].json["body"]["event"]["data"]["id"] }}
    • Type: Button
    • Name: Resolve
    • Integration:
      • URL: https://[URL of your integration]/webhook/resolve
      • Context:
        • Property Name: jira_key
          Property Value: {{$json["key"]}}
        • Property Name: pagerduty_incident
          Property Value: {{ $node["Webhook"].json["body"]["event"]["data"]["id"] }}
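
Under the hood, the buttons are Mattermost interactive message actions attached to the post. The Python sketch below shows roughly the JSON such a post carries, as we understand Mattermost's interactive message format; the n8n URL, IDs, and attachment text are placeholders. The important part is context: Mattermost keeps it with each button and sends it back to the integration URL when someone clicks, which is how Workflows 2 and 3 know which incident and issue to update.

```python
import json

# Placeholder for your n8n instance's production webhook base URL.
N8N_WEBHOOK_BASE = "https://n8n.example.com/webhook"


def action_buttons_props(pagerduty_incident: str, jira_key: str) -> dict:
    """Build a message attachment carrying the Acknowledge and Resolve buttons."""
    return {
        "attachments": [
            {
                "text": "Incident actions",
                "actions": [
                    {
                        "id": "acknowledge",
                        "type": "button",
                        "name": "Acknowledge",
                        "integration": {
                            "url": f"{N8N_WEBHOOK_BASE}/ack",
                            # Sent back to the URL when the button is clicked.
                            "context": {"pagerduty_incident": pagerduty_incident},
                        },
                    },
                    {
                        "id": "resolve",
                        "type": "button",
                        "name": "Resolve",
                        "integration": {
                            "url": f"{N8N_WEBHOOK_BASE}/resolve",
                            "context": {
                                "jira_key": jira_key,
                                "pagerduty_incident": pagerduty_incident,
                            },
                        },
                    },
                ],
            }
        ]
    }


# When a post is created via POST /api/v4/posts, attachments live under "props".
post_body = {
    "channel_id": "auxiliary-channel-id",  # hypothetical ID
    "message": "Checkout API returns 500 errors",
    "props": action_buttons_props("PABC123", "OPS-42"),
}
print(json.dumps(post_body, indent=2))
```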

Workflow 2 - Ensure that the incident is acknowledged internally

The second workflow automates the acknowledgement of the incident by the on-call team member.

Workflow 2
  • Webhook (Ack) node triggers the workflow when the Acknowledge button is clicked in the Mattermost channel.
  • PagerDuty1 node sets the incident status to Acknowledged.
  • Mattermost5 node posts a message in the channel that the incident has been acknowledged.

Here's how to configure the parameters of each node:

1. Webhook (Ack) node: Get data from the Acknowledge button

  • Authentication: None
  • HTTP Method: POST
  • Path: /ack
  • Respond: Immediately
  • Response Code: 200
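
When someone clicks Acknowledge, Mattermost posts a JSON request to the button's URL that includes, among other fields, the user, the channel, and the context stored with the button. The Python sketch below uses a trimmed, made-up version of the Webhook (Ack) node's output to show where the values used by the next two nodes come from.

```python
# Made-up n8n item: the Webhook (Ack) node output after a click on Acknowledge.
# Mattermost's request contains more fields; only the ones used here are shown.
ack_item = {
    "body": {
        "user_id": "user_alice",
        "channel_id": "auxiliary-channel-id",
        "post_id": "post123",
        "context": {"pagerduty_incident": "PABC123"},
    }
}


def parse_button_click(item: dict) -> dict:
    """Extract the values the PagerDuty and Mattermost nodes need."""
    body = item["body"]
    return {
        "incident_id": body["context"]["pagerduty_incident"],  # PagerDuty Incident ID
        "channel_id": body["channel_id"],                      # where to confirm
        "clicked_by": body["user_id"],
    }


print(parse_button_click(ack_item))
```

The Resolve button sends the same shape to the /resolve path, with jira_key added to the context.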

2. PagerDuty1 node: Acknowledge the incident on PagerDuty

  • Resource: Incident
  • Operation: Update
  • Incident ID: {{$json["body"]["context"]["pagerduty_incident"]}}
  • Email: Your email address
  • Update Fields > Status: Acknowledged
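
The Email field exists because PagerDuty attributes the change to a user: as we understand its REST API v2, an incident update needs a From header with a valid user's email address. The Python sketch below builds (without sending) such an update request; the API key, email address, and incident ID are placeholders.

```python
import json
import urllib.request

PAGERDUTY_API = "https://api.pagerduty.com"


def update_incident_request(incident_id: str, status: str, api_key: str,
                            from_email: str) -> urllib.request.Request:
    """Build (but do not send) a PagerDuty request that changes an incident's status."""
    if status not in ("acknowledged", "resolved"):
        raise ValueError(f"unsupported status: {status}")
    body = {"incident": {"type": "incident_reference", "status": status}}
    return urllib.request.Request(
        f"{PAGERDUTY_API}/incidents/{incident_id}",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Token token={api_key}",
            "Accept": "application/vnd.pagerduty+json;version=2",
            "Content-Type": "application/json",
            # The user the change is attributed to (the node's Email field).
            "From": from_email,
        },
        method="PUT",
    )


req = update_incident_request("PABC123", "acknowledged", "your-api-key", "oncall@example.com")
```

Passing "resolved" instead gives the equivalent of the PagerDuty2 node in Workflow 3.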

3. Mattermost5 node: Confirm the acknowledgment

  • Resource: Message
  • Operation: Post
  • Channel ID: {{$node["Ack"].json["body"]["channel_id"]}}
  • Message: 💪🏼 Incident status has been changed to Acknowledged on PagerDuty.

Workflow 3 - Mark the incident as resolved

The third workflow automates the resolution of the issue.

Workflow 3
  • Webhook (Resolve) node triggers the workflow when the Resolve button is clicked in the Mattermost channel.
  • PagerDuty2 node sets the incident status to Resolved.
  • Jira2 node moves the Jira issue to a resolved status.
  • Mattermost6 node posts a message in the auxiliary channel that the issue has been closed in PagerDuty and Jira.
  • Mattermost7 node posts the same message in the Incidents channel.

Here's how to configure the parameters of each node:

1. Webhook (Resolve) node: Get details from the Resolve button

  • Authentication: None
  • HTTP Method: POST
  • Path: /resolve
  • Respond: Immediately
  • Response Code: 200

2. PagerDuty2 node: Resolve the incident on PagerDuty

  • Resource: Incident
  • Operation: Update
  • Incident ID: {{$json["body"]["context"]["pagerduty_incident"]}}
  • Email: Your email address
  • Update Fields > Status: Resolved

3. Jira2 node: Resolve the incident on Jira

  • Jira Version: Cloud
  • Resource: Issue
  • Operation: Update
  • Issue Key: {{$node["Resolve"].json["body"]["context"]["jira_key"]}}
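  • Update Fields > Status ID: 31
    The ID of the transition that moves the issue to your resolved status. It depends on your project's Jira workflow, so 31 may not be the right value in your setup.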
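
In Jira's REST API, an issue's status is changed by performing a workflow transition, and transition IDs differ between workflows. The Python sketch below builds (without sending) a transition request with basic authentication from an email address and API token; the site URL, credentials, and issue key are placeholders. A GET request to the same /transitions endpoint lists the transition IDs available for an issue, which is one way to find the right value for your project.

```python
import base64
import json
import urllib.request

# Placeholder for your Jira Cloud site.
JIRA_BASE = "https://your-site.atlassian.net"


def transition_request(issue_key: str, transition_id: str, email: str,
                       api_token: str) -> urllib.request.Request:
    """Build (but do not send) a request that moves an issue through a transition."""
    credentials = base64.b64encode(f"{email}:{api_token}".encode()).decode()
    return urllib.request.Request(
        f"{JIRA_BASE}/rest/api/2/issue/{issue_key}/transitions",
        data=json.dumps({"transition": {"id": transition_id}}).encode("utf-8"),
        headers={
            "Authorization": f"Basic {credentials}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = transition_request("OPS-42", "31", "you@example.com", "your-api-token")
```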

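4. Mattermost6 and Mattermost7 nodes: Announce the resolution in the auxiliary and Incidents channels

Both nodes use the same configuration, except for the channel they post to.

  • Resource: Message
  • Operation: Post
  • Channel ID: {{$node["Resolve"].json["body"]["channel_id"]}} for the auxiliary channel (Mattermost6), or the ID of your Incidents channel (Mattermost7)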
  • Message: 💪 This issue got closed in PagerDuty and Jira.

Activate the workflows in production

We're done with building the workflows! Here's how to see them all in action, from start to end:

  1. In the n8n Editor UI, click the Execute Workflow button in each of the three workflows.
  2. Go to your PagerDuty account and create a test incident.
  3. Back in the n8n Editor UI, you'll see data passing through the Webhook nodes and the nodes of the workflows being executed.

With this configuration, the workflows only run when you execute them manually. To make them run automatically every time an incident is created in PagerDuty, you need to use the Production webhook URLs and activate the workflows. Here's how to do this:

  1. Copy the Production webhook URL from each Webhook node.
  2. Update the webhook URL in PagerDuty and the button URLs in the Mattermost4 node of Workflow 1 (step 6).
  3. Save the workflows.
  4. Activate the workflows.

That's it: you now have three no-code workflows that automate the routine steps of your incident response, so the on-call team can focus on solving the problem.

Start automating!

In this post, you've learned why it's crucial to have an incident response playbook and how to take it a step further by creating an automated incident response workflow. We hope that this information will help your team mitigate risks and manage incidents more efficiently and confidently.

The best part is, you can start automating for free with n8n.

If you have other ideas for automated incident response, or run into trouble setting up the workflows, feel free to write in our community forum.