Send event to AWS Lambda when a file is added to an S3 bucket

This article is a part of my "100 data engineering tutorials in 100 days" challenge. (49/100)

In this blog post, I will show you how to configure an S3 bucket notification, the required permissions, and the Lambda trigger so that a Lambda function receives an event when a file is added to an S3 bucket. The events travel from S3 to the function through an SQS queue. Handling the event in the Lambda function is out of the scope of this article. I will use Terraform to configure the notifications and permissions.

The first thing we have to do is to configure the bucket notifications. Note that it is impossible to define two notifications with overlapping filter prefixes when their filter suffix is the same!

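resource "aws_s3_bucket_notification" "bucket-events" {
  bucket = "bucket_name"

  # Send an event to the SQS queue whenever an object is created
  queue {
    events = ["s3:ObjectCreated:*"]
    queue_arn = aws_sqs_queue.queue_name.arn
    # Only objects whose key starts with this prefix and ends with this suffix
    filter_prefix = "file_key_prefix"
    filter_suffix = "file_key_suffix"
  }
}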
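The rule about overlapping filters is easier to see with two rules side by side. The following is a minimal sketch, not the configuration used in this article: it assumes a hypothetical second queue (`other_queue_name`) and made-up prefixes and suffixes (`incoming/`, `archive/`, `.csv`). It also declares the queues, which the article references but does not show. Because a bucket has only one notification configuration, this block would replace the resource above rather than sit next to it.

```hcl
# Hypothetical queue definitions; the article only references
# aws_sqs_queue.queue_name and does not show how it is declared.
resource "aws_sqs_queue" "queue_name" {
  name = "queue_name"
}

resource "aws_sqs_queue" "other_queue_name" {
  name = "other_queue_name"
}

resource "aws_s3_bucket_notification" "bucket-events" {
  bucket = "bucket_name"

  # Rule 1: .csv files under incoming/
  queue {
    events        = ["s3:ObjectCreated:*"]
    queue_arn     = aws_sqs_queue.queue_name.arn
    filter_prefix = "incoming/"
    filter_suffix = ".csv"
  }

  # Rule 2: same suffix, but a prefix that does not overlap with incoming/
  queue {
    events        = ["s3:ObjectCreated:*"]
    queue_arn     = aws_sqs_queue.other_queue_name.arn
    filter_prefix = "archive/"
    filter_suffix = ".csv"
  }
}
```

Changing the second prefix to something like `incoming/2020/` would make the two rules overlap for the same suffix and event type, and S3 would reject the configuration.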

After that, we have to give the bucket_name bucket permission to send events to the queue, and the Lambda function needs permission to read the events:

resource "aws_sqs_queue_policy" "bucket-events-policy" {
  queue_url = aws_sqs_queue.queue_name.id
  policy = <<EOF
{
  "Version": "2012-10-17",
  "Id": "${aws_sqs_queue.queue_name.arn}",
  "Statement": [
    {
      "Sid": "First",
      "Effect": "Allow",
      "Principal": "*",
      "Action": "SQS:SendMessage",
      "Resource": "${aws_sqs_queue.queue_name.arn}",
      "Condition": {
        "ArnEquals": {
          "aws:SourceArn": "arn:aws:s3:::bucket_name"
        }
      }
    },
    {
      "Sid": "Second",
      "Effect": "Allow",
      "Principal": "*",
      "Action": "SQS:ReceiveMessage",
      "Resource": "${aws_sqs_queue.queue_name.arn}",
      "Condition": {
        "ArnEquals": {
          "aws:SourceArn": "arn of the lambda function"
        }
      }
    }
  ]
}
EOF
}
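When Lambda polls an SQS queue, it uses the function's execution role, so that role typically needs read access to the queue as well. Here is a minimal sketch of such a role policy, assuming the role is managed in Terraform under the hypothetical name `aws_iam_role.lambda_execution_role`. The Serverless Framework normally generates equivalent permissions on its own when an `sqs` event is declared, in which case this block is probably unnecessary.

```hcl
# Assumption: aws_iam_role.lambda_execution_role is the execution role of the
# Lambda function. It is not defined in this article; substitute your own role.
resource "aws_iam_role_policy" "read-bucket-events" {
  name = "read-bucket-events"
  role = aws_iam_role.lambda_execution_role.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        # Actions the Lambda poller uses to receive and acknowledge messages
        Action = [
          "sqs:ReceiveMessage",
          "sqs:DeleteMessage",
          "sqs:GetQueueAttributes",
        ]
        Resource = aws_sqs_queue.queue_name.arn
      }
    ]
  })
}
```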

Finally, we have to add the SQS ARN as the event source in the Serverless configuration of the Lambda function:

# Put this in the function part in the Serverless configuration
events:
  - sqs: 'SQS ARN'
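
If the function is deployed with Terraform instead of the Serverless Framework, the same trigger can be expressed as an event source mapping. This is only a sketch under that assumption: `aws_lambda_function.handler` is a hypothetical resource name, and `batch_size = 10` is an illustrative value. Use either this resource or the Serverless `sqs` event, not both.

```hcl
# Assumption: the Lambda function is managed by Terraform as
# aws_lambda_function.handler (not part of this article).
resource "aws_lambda_event_source_mapping" "bucket-events-trigger" {
  event_source_arn = aws_sqs_queue.queue_name.arn
  function_name    = aws_lambda_function.handler.arn

  # Maximum number of SQS messages passed to one invocation (illustrative)
  batch_size = 10
}
```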

Bartosz Mikulski
