How to send AWS CloudWatch Alerts to a Slack channel using Terraform

In this article, I am going to show how to use Terraform to configure a CloudWatch alert and send its notifications to a Slack channel.

Slack Webhook in AWS Secrets Manager

Before I start configuring the alert, I need the URL of a webhook that posts messages to Slack. If you are not familiar with Slack webhooks, you can create one by following the instructions in the Slack documentation: api.slack.com/messaging/webhooks
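Once I have the webhook, I can check that it works with a quick test message. The URL below is only a placeholder for the real webhook address:

curl -X POST -H 'Content-type: application/json' \
    --data '{"text": "Test message from the webhook"}' \
    https://hooks.slack.com/services/T00000000/B00000000/XXXXXXXXXXXXXXXXXXXXXXXX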

I don’t want to store the webhook URL in the repository that contains the Terraform configuration. After all, anyone who has the URL can post arbitrary messages to the channel, so I should keep it secret.

Let’s assume that I have stored the webhook URL in AWS Secrets Manager, and I can retrieve it using AWS CLI. Note that I use jq to extract the SecretString from the JSON response.

aws secretsmanager get-secret-value --secret-id cloudwatch-slack-webhook | jq .SecretString
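If the secret does not exist yet, it can be created with a single AWS CLI call (again, the URL is only a placeholder):

aws secretsmanager create-secret \
    --name cloudwatch-slack-webhook \
    --secret-string 'https://hooks.slack.com/services/T00000000/B00000000/XXXXXXXXXXXXXXXXXXXXXXXX'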

I have to define a new variable in the variables.tf file:

variable "cloudwatch-slack-webhook" {}

Now, before calling the terraform plan command, I have to retrieve the URL from Secrets Manager and store it in the tfvars file. That is going to become tedious very soon, so I will define a Makefile target to do it for me:

TERRAFORM_VARS_FILE=terraform.tfvars

plan:
    echo "cloudwatch-slack-webhook = `aws secretsmanager get-secret-value --secret-id cloudwatch-slack-webhook | jq .SecretString`" > ${TERRAFORM_VARS_FILE}
    terraform plan -var-file=${TERRAFORM_VARS_FILE} -out=./terraform_plan

I should also use the Makefile to apply the plan because otherwise, I would need to remember to pass the plan file as a parameter:

apply:
    terraform apply ./terraform_plan
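With both targets in place, the whole workflow boils down to two commands:

make plan
make apply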

Notify-slack module

In the next step, I have to create a new .tf file in the Terraform configuration directory and define a new module:

module "notify_slack" {
  source  = "terraform-aws-modules/notify-slack/aws"
  version = "~> 3.0"

  sns_topic_name = "send-to-slack"

  slack_webhook_url = var.cloudwatch-slack-webhook
  slack_channel     = "the-team-slack-channel"
  slack_username    = "CloudWatch"
}

The terraform-aws-notify-slack module creates an SNS topic that receives a notification every time the status of an alert changes. It also creates a Lambda function to read the notifications from SNS and propagate the messages to the Slack channel.
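The topic ARN is exposed as the this_slack_topic_arn module output, which I will use later when I attach the alarm actions. If I want to see the ARN after terraform apply, I can re-export it as an output of my own configuration (the output name below is my choice, not something required by the module):

output "slack_topic_arn" {
  value = module.notify_slack.this_slack_topic_arn
}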

CloudWatch Alert

Now, I have to define the CloudWatch Alert. In this example, let’s imagine that I want to get a notification when an SQS consumer is not processing messages fast enough (or does not process them at all).

I am going to use the SQS ApproximateAgeOfOldestMessage metric to raise an alert when the oldest message is waiting in the queue for more than one minute.

First, let’s specify the part of the configuration that defines the alert name and description:

resource "aws_cloudwatch_metric_alarm" "sqs_too_old_messages" {
	 alarm_name = "sqs_too_old_messages"
	 alarm_description = "SQS messages older than one minute"
}

Now, I can designate the AWS resource that triggers this alert:

namespace = "AWS/SQS"
dimensions {
    QueueName = "the-name-of-the-queue"
}

To configure the metric, I need to define the metric type, the threshold, the size of the window on which I want to calculate the metric, and the number of time windows that must exceed the limit to raise the alarm.

In my example, I want to receive a notification whenever a message older than one minute is observed within a single 5-minute window.

metric_name = "ApproximateAgeOfOldestMessage"
statistic = "Maximum"
evaluation_periods = "1"
period = "300"
comparison_operator = "GreaterThanThreshold"
threshold = "60"
datapoints_to_alarm = "1"

Finally, I have to point the alarm actions at the SNS topic created by the previously defined Slack module, so the notifications reach the channel:

alarm_actions = ["${module.notify_slack.this_slack_topic_arn}"]
ok_actions = ["${module.notify_slack.this_slack_topic_arn}"]
insufficient_data_actions = ["${module.notify_slack.this_slack_topic_arn}"]
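Putting the snippets together, the whole alarm resource looks more or less like this (the queue name is still a placeholder):

resource "aws_cloudwatch_metric_alarm" "sqs_too_old_messages" {
  alarm_name        = "sqs_too_old_messages"
  alarm_description = "SQS messages older than one minute"

  namespace = "AWS/SQS"
  dimensions = {
    QueueName = "the-name-of-the-queue"
  }

  metric_name         = "ApproximateAgeOfOldestMessage"
  statistic           = "Maximum"
  evaluation_periods  = 1
  period              = 300
  comparison_operator = "GreaterThanThreshold"
  threshold           = 60
  datapoints_to_alarm = 1

  alarm_actions             = [module.notify_slack.this_slack_topic_arn]
  ok_actions                = [module.notify_slack.this_slack_topic_arn]
  insufficient_data_actions = [module.notify_slack.this_slack_topic_arn]
}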

When I apply the Terraform plan and create the alert in AWS, I will immediately receive a notification telling me that there is not enough data to calculate the metric. It is the expected behavior of a newly defined CloudWatch Alert. After the monitored SQS queue updates its metrics, the status of the alert should change to either “OK” or “Alarm.”
