S3 Notifications as Webhooks for Automation

Hey there, my friends!

I've been working on automating various workflows lately, and let me tell you - I stumbled upon something that completely changed how I think about event-driven automation. Do you remember when I wrote about setting up automated deployments? Well, this time I want to share something even more powerful: using S3 notifications as webhooks for automation.

So here's the story. A few weeks ago, I was dealing with a client who needed to process uploaded files automatically. The traditional approach? Polling S3 buckets every few minutes. Not elegant, not efficient, and definitely not what I'd call a modern solution. That's when I realized - why not use S3 notifications as webhooks? And honestly, it works great.

Why S3 Notifications Beat Traditional Polling

Let's be real here - polling is so 2015. You're essentially asking S3 "Are there any new files?" every few minutes, like Donkey in Shrek asking "Are we there yet?" during a long car ride. Not only does this waste resources, but it also introduces delays in your automation pipeline.

S3 notifications, on the other hand, work like a proper webhook system. When something happens in your bucket - a file gets uploaded, deleted, or modified - S3 immediately sends a notification to your configured endpoint. No polling, no delays, no wasted resources. Quite neat, isn't it?

The Architecture That Actually Works

Now let's jump into how this all fits together. The architecture is surprisingly simple:

  • S3 Bucket - Your file storage (obviously)
  • S3 Event Notifications - Triggers when files change
  • Lambda Function - Processes the events and sends HTTP webhooks
  • External Webhook Endpoint - Your N8N instance or any automation platform
  • Your Automation Logic - Whatever you want to happen

The beauty of this approach? S3 does all the heavy lifting of monitoring file changes, Lambda transforms those events into proper HTTP webhooks, and you just react to events as they happen. Admittedly, my own logic was quite convoluted at first, but once I wrapped my head around it, everything clicked.

Setting Up S3 Notifications with Terraform

Let's go through the setup step by step. I'll show you exactly how I set this up using Terraform because, to be honest, clicking through the AWS console is not my favorite way to spend an afternoon.

Step 0: Create KMS Resources

We'd like to keep our data secure - this isn't "Telegram-based" automation. We're going semi-enterprise!

# KMS Key for S3 encryption
resource "aws_kms_key" "s3_encryption_key" {
  description             = "KMS key for S3 bucket encryption"
  deletion_window_in_days = var.kms_key_deletion_window
  enable_key_rotation     = true

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Sid    = "Enable IAM User Permissions"
        Effect = "Allow"
        Principal = {
          AWS = "arn:aws:iam::${data.aws_caller_identity.current.account_id}:root"
        }
        Action   = "kms:*"
        Resource = "*"
      },
      {
        Sid    = "Allow S3 Service"
        Effect = "Allow"
        Principal = {
          Service = "s3.amazonaws.com"
        }
        Action = [
          "kms:Decrypt",
          "kms:GenerateDataKey"
        ]
        Resource = "*"
      }
    ]
  })

  tags = {
    Name = "S3-Upload-Bucket-Key"
  }
}

# KMS Key Alias
resource "aws_kms_alias" "s3_encryption_key_alias" {
  name          = "alias/s3-upload-bucket-key"
  target_key_id = aws_kms_key.s3_encryption_key.key_id
}
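
By the way, the snippets in this post reference a few Terraform inputs I don't show anywhere else: var.kms_key_deletion_window, var.bucket_name, var.webhook_endpoint and the caller identity data source. If you want to follow along, a minimal variables.tf could look roughly like this - the defaults are just placeholders for illustration:

data "aws_caller_identity" "current" {}

variable "bucket_name" {
  description = "Name of the S3 bucket that receives uploads"
  type        = string
  default     = "my-automation-webhook-bucket"
}

variable "kms_key_deletion_window" {
  description = "Waiting period (in days) before the KMS key is deleted"
  type        = number
  default     = 7
}

variable "webhook_endpoint" {
  description = "HTTP endpoint that receives the webhooks (e.g. an n8n webhook URL)"
  type        = string
}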

Step 1: Create the S3 Bucket

First, let's create our S3 bucket with proper configuration:

# S3 Bucket
resource "aws_s3_bucket" "upload_bucket" {
  bucket = var.bucket_name

  tags = {
    Name        = var.bucket_name
    Environment = "production"
    Purpose     = "file-uploads"
  }
}

# S3 Bucket versioning
resource "aws_s3_bucket_versioning" "upload_bucket_versioning" {
  bucket = aws_s3_bucket.upload_bucket.id
  versioning_configuration {
    status = "Enabled"
  }
}

# S3 Bucket server-side encryption with customer managed KMS key
resource "aws_s3_bucket_server_side_encryption_configuration" "upload_bucket_encryption" {
  bucket = aws_s3_bucket.upload_bucket.id

  rule {
    apply_server_side_encryption_by_default {
      kms_master_key_id = aws_kms_key.s3_encryption_key.arn
      sse_algorithm     = "aws:kms"
    }
    bucket_key_enabled = true
  }
}

Step 2: Secure the Bucket (Again)

Since we've been talking about security throughout this S3 series, let's apply that knowledge in practice.

# S3 Bucket public access block
resource "aws_s3_bucket_public_access_block" "upload_bucket_pab" {
  bucket = aws_s3_bucket.upload_bucket.id

  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

# Separate bucket for S3 access logs
resource "aws_s3_bucket" "access_logs_bucket" {
  bucket = "${var.bucket_name}-access-logs"

  tags = {
    Name        = "${var.bucket_name}-access-logs"
    Environment = "production"
    Purpose     = "access-logs"
  }
}

resource "aws_s3_bucket_server_side_encryption_configuration" "access_logs_encryption" {
  bucket = aws_s3_bucket.access_logs_bucket.id

  rule {
    apply_server_side_encryption_by_default {
      kms_master_key_id = aws_kms_key.s3_encryption_key.arn
      sse_algorithm     = "aws:kms"
    }
    bucket_key_enabled = true
  }
}

# S3 Bucket logging
resource "aws_s3_bucket_logging" "bucket_access_log" {
  bucket = aws_s3_bucket.upload_bucket.id

  target_bucket = aws_s3_bucket.access_logs_bucket.id
  target_prefix = "log/"
}

Step 3: Create the Lambda Function

Now let's package and create the Lambda function that will turn S3 events into webhook calls. The Python code itself is rendered from a template (we'll look at it in a moment), and the log group uses a second KMS key, aws_kms_key.cloudwatch_logs_key, set up just like the S3 key but with the CloudWatch Logs service principal allowed to use it:

 1resource "aws_lambda_function" "s3_webhook_processor" {
 2  filename      = "s3_webhook_processor.zip"
 3  function_name = "s3-webhook-processor"
 4  role         = aws_iam_role.lambda_execution_role.arn
 5  handler      = "index.handler"
 6  runtime      = "python3.9"
 7  timeout      = 30
 8
 9  environment {
10    variables = {
11      ENVIRONMENT = "production"
12      LOG_LEVEL   = "INFO"
13    }
14  }
15
16  depends_on = [
17    aws_iam_role_policy_attachment.lambda_logs,
18    aws_cloudwatch_log_group.lambda_logs,
19  ]
20}
21
22resource "aws_lambda_permission" "allow_s3_invoke" {
23  statement_id  = "AllowExecutionFromS3Bucket"
24  action        = "lambda:InvokeFunction"
25  function_name = aws_lambda_function.s3_webhook_processor.function_name
26  principal     = "s3.amazonaws.com"
27  source_arn    = aws_s3_bucket.automation_bucket.arn
28}

Step 4: Set Up the S3 Event Notification

Here's where the magic happens. Let's configure S3 to send notifications to our Lambda function:

# Allow S3 to invoke the Lambda function
resource "aws_lambda_permission" "allow_s3_invoke" {
  statement_id  = "AllowExecutionFromS3Bucket"
  action        = "lambda:InvokeFunction"
  function_name = aws_lambda_function.s3_webhook_processor.function_name
  principal     = "s3.amazonaws.com"
  source_arn    = aws_s3_bucket.upload_bucket.arn
}

# S3 event notification that triggers the Lambda on every new object
resource "aws_s3_bucket_notification" "webhook_notification" {
  bucket = aws_s3_bucket.upload_bucket.id

  lambda_function {
    lambda_function_arn = aws_lambda_function.s3_webhook_processor.arn
    events              = ["s3:ObjectCreated:*"]
  }

  depends_on = [aws_lambda_permission.allow_s3_invoke]
}

Step 5: IAM Roles and Policies

And of course, we need proper IAM permissions. That's just how AWS's security architecture works: we need roles and policies for bucket access, Lambda execution, writing logs to CloudWatch, and so on. Here is just one example - a policy for the clients that upload files to the bucket:

# IAM Policy for S3 access
resource "aws_iam_policy" "s3_upload_policy" {
  name        = "S3UploadOnlyPolicy"
  description = "Policy that allows list and upload operations on specific S3 bucket"

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Action = [
          "s3:ListBucket",
          "s3:GetBucketLocation"
        ]
        Resource = aws_s3_bucket.upload_bucket.arn
      },
      {
        Effect = "Allow"
        Action = [
          "s3:PutObject",
          "s3:GetObject"
        ]
        Resource = [
          "${aws_s3_bucket.upload_bucket.arn}/*"
        ]
      },
      {
        Effect = "Allow"
        Action = [
          "kms:Decrypt",
          "kms:GenerateDataKey"
        ]
        Resource = aws_kms_key.s3_encryption_key.arn
      }
    ]
  })
}
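
One thing the snippets above gloss over: they reference aws_iam_role.lambda_execution_role, which I haven't shown. Here's a minimal sketch of what that role could look like, assuming the function only needs CloudWatch logging, X-Ray traces, and read access to the upload bucket - trim or extend it for your own setup:

# Execution role assumed by the webhook Lambda
resource "aws_iam_role" "lambda_execution_role" {
  name = "s3-webhook-lambda-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect    = "Allow"
        Principal = { Service = "lambda.amazonaws.com" }
        Action    = "sts:AssumeRole"
      }
    ]
  })
}

# CloudWatch Logs and X-Ray permissions via AWS managed policies
resource "aws_iam_role_policy_attachment" "lambda_logs" {
  role       = aws_iam_role.lambda_execution_role.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole"
}

resource "aws_iam_role_policy_attachment" "lambda_xray" {
  role       = aws_iam_role.lambda_execution_role.name
  policy_arn = "arn:aws:iam::aws:policy/AWSXRayDaemonWriteAccess"
}

# Allow the function to read object metadata (head_object) and decrypt via KMS
resource "aws_iam_role_policy" "lambda_s3_read" {
  name = "s3-webhook-lambda-s3-read"
  role = aws_iam_role.lambda_execution_role.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect   = "Allow"
        Action   = ["s3:GetObject"]
        Resource = "${aws_s3_bucket.upload_bucket.arn}/*"
      },
      {
        Effect   = "Allow"
        Action   = ["kms:Decrypt"]
        Resource = aws_kms_key.s3_encryption_key.arn
      }
    ]
  })
}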

The Lambda Function Code

Now let's talk about the actual webhook processing logic. Here's the Python code that handles S3 events - this is the lambda_function.py.tpl that Terraform renders, which is why you'll spot a template placeholder for the webhook endpoint in it:

import json
import os
import urllib3
from urllib.parse import unquote_plus
from datetime import datetime

import boto3


def lambda_handler(event, context):
    """
    Lambda function to send webhook notifications when files are uploaded to S3
    """

    # Initialize HTTP and S3 clients once per invocation
    http = urllib3.PoolManager()
    s3_client = boto3.client('s3')

    # Get the webhook endpoint from an environment variable; the fallback below
    # is a Terraform template placeholder, filled in by templatefile() at build time
    webhook_endpoint = os.environ.get('WEBHOOK_ENDPOINT',
                                      '${webhook_endpoint}')
    try:
        for i, record in enumerate(event['Records']):
            # Extract S3 event information
            event_name = record['eventName']
            bucket_name = record['s3']['bucket']['name']
            # Object keys arrive URL-encoded (spaces become '+'), so decode them first
            object_key = unquote_plus(record['s3']['object']['key'])
            object_size = record['s3']['object'].get('size', 0)
            event_time = record['eventTime']

            print(f"Processing record {i + 1}/{len(event['Records'])}: {object_key}")

            # Get additional object metadata
            try:
                response = s3_client.head_object(
                    Bucket=bucket_name,
                    Key=object_key)
                content_type = response.get('ContentType', 'unknown')
                last_modified = response.get('LastModified', '').isoformat() if response.get('LastModified') else ''
            except Exception as e:
                print(f"Error getting object metadata: {str(e)}")
                content_type = 'unknown'
                last_modified = ''

            # Prepare webhook payload
            webhook_payload = {
                "event": "file_uploaded",
                "event_name": event_name,
                "bucket": bucket_name,
                "key": object_key,
                "size": object_size,
                "content_type": content_type,
                "timestamp": event_time,
                "last_modified": last_modified,
                "s3_url": f"s3://{bucket_name}/{object_key}",
                "metadata": {
                    "source": "s3-lambda-webhook",
                    "processed_at": datetime.utcnow().isoformat(),
                    "aws_region": os.environ.get('AWS_REGION', 'unknown')
                }
            }

            # Send webhook
            print(f"Sending webhook for {bucket_name}/{object_key}")

            response = http.request(
                'POST',
                webhook_endpoint,
                body=json.dumps(webhook_payload),
                headers={
                    'Content-Type': 'application/json',
                    'User-Agent': 'AWS-S3-Lambda-Webhook/1.0'
                },
                timeout=30
            )

            if response.status == 200:
                print(f"Webhook sent successfully for {object_key}. Response: {response.status}")
            else:
                print(f"Webhook failed for {object_key}. Status: {response.status}, Response: {response.data}")

                # Log the error but don't fail the entire function
                error_payload = {
                    "error": "webhook_failed",
                    "status_code": response.status,
                    "object_key": object_key,
                    "bucket": bucket_name,
                    "response_body": response.data.decode('utf-8') if response.data else 'No response body'
                }
                print(f"Error details: {json.dumps(error_payload)}")

    except Exception as e:
        print(f"Error processing S3 event: {str(e)}")
        print(f"Event data: {json.dumps(event)}")

        # Return error but don't raise exception to avoid retries
        return {
            'statusCode': 500,
            'body': json.dumps({
                'error': str(e),
                'event': event
            })
        }

    return {
        'statusCode': 200,
        'body': json.dumps({
            'message': 'Webhook notifications sent successfully',
            'processed_records': len(event['Records'])
        })
    }

Real-World Use Cases That Actually Matter

So where does this approach shine? Let me share some scenarios where I've implemented this pattern:

Document Processing Pipeline: When users upload documents to S3, the webhook immediately triggers OCR processing, data extraction, and database updates. No more waiting for cron jobs!

Media File Automation: Video uploads trigger encoding workflows, thumbnail generation, and content distribution. The moment a file lands in S3, the entire pipeline kicks into gear.

Data Integration: CSV files dropped into S3 automatically trigger data validation, transformation, and loading into data warehouses. ETL processes start instantly instead of waiting for scheduled runs.

Backup and Archiving: When files are uploaded to certain prefixes, they're automatically replicated to different regions or archived to Glacier. Set it up once, forget about it forever.
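
To make that last one concrete, here's roughly what a prefix-based Glacier rule could look like in Terraform. The archive/ prefix and the 30-day threshold are made-up values for illustration; tune them to your own retention needs:

# Move everything under archive/ to Glacier after 30 days
resource "aws_s3_bucket_lifecycle_configuration" "archive_rule" {
  bucket = aws_s3_bucket.upload_bucket.id

  rule {
    id     = "archive-to-glacier"
    status = "Enabled"

    filter {
      prefix = "archive/"
    }

    transition {
      days          = 30
      storage_class = "GLACIER"
    }
  }
}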

Testing Your Webhook Setup

Now you might be thinking: "This all sounds great, but how do I test it?" Good question! Here's how I validate that everything works:

1. CloudWatch Logs Monitoring

First, make sure proper logging is in place to track webhook events. This is the same log group we created alongside the Lambda, just with a shorter retention that I find handy while testing:

1resource "aws_cloudwatch_log_group" "lambda_logs" {
2  name              = "/aws/lambda/s3-webhook-sender"
3  retention_in_days = 1
4  kms_key_id        = aws_kms_key.cloudwatch_logs_key.arn
5
6  tags = {
7    Name = "Lambda-S3-Webhook-Logs"
8  }
9}

2. Simple Test Script

Create a simple test script to upload files and verify notifications:

import boto3
import time

def test_s3_webhook():
    s3_client = boto3.client('s3')
    bucket_name = 'my-automation-webhook-bucket'

    # Upload test file
    test_content = "This is a test file for webhook validation"
    s3_client.put_object(
        Bucket=bucket_name,
        Key='uploads/test.txt',
        Body=test_content,
        ContentType='text/plain'
    )

    print("Test file uploaded. Check CloudWatch logs for webhook processing.")

    # Wait a bit, then clean up
    time.sleep(5)
    s3_client.delete_object(Bucket=bucket_name, Key='uploads/test.txt')
    print("Test file deleted. Note: this only triggers a webhook if you also subscribe to s3:ObjectRemoved:* events.")

if __name__ == "__main__":
    test_s3_webhook()

Or you can simply drag and drop a file in the AWS console. For a quick test that's the fastest and easiest method, since you can skip the auth part in your code.

Performance and Cost Considerations

Let's be honest - this approach has some serious advantages over traditional polling:

Cost Efficiency: No more constant API calls to check for changes. You only pay for actual processing when events occur. In my experience, this can reduce costs by 70-80% compared to frequent polling.

Real-Time Processing: Events are processed within seconds of file changes, not minutes or hours later. This is crucial for time-sensitive automation workflows.

Scalability: S3 handles millions of events without breaking a sweat. Your webhook processing automatically scales with Lambda's concurrent execution limits.

Reliability: Built-in retry mechanisms and dead letter queues ensure events don't get lost. AWS handles the heavy lifting of reliable event delivery.
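
On that last point: S3 invokes the Lambda asynchronously, so you can attach a dead letter queue to catch events that still fail after Lambda's built-in retries. A minimal sketch - the queue name is arbitrary, and the execution role additionally needs sqs:SendMessage on it:

# Queue that collects events the Lambda could not process
resource "aws_sqs_queue" "webhook_dlq" {
  name                      = "s3-webhook-dlq"
  message_retention_seconds = 1209600 # keep failed events for 14 days
}

# Add this block to the aws_lambda_function resource from Step 3:
#
#   dead_letter_config {
#     target_arn = aws_sqs_queue.webhook_dlq.arn
#   }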

Common Pitfalls and How to Avoid Them

However, let me share some challenges I encountered and how to solve them:

1. Circular Event Loops

Be careful not to create infinite loops where your Lambda function modifies S3 objects that trigger more events. Use different prefixes for input and output files - for example, restrict the notification from Step 4 to an input/ prefix:

1resource "aws_s3_bucket_notification" "webhook_notification" {
2  bucket = aws_s3_bucket.automation_bucket.id
3
4  lambda_function {
5    lambda_function_arn = aws_lambda_function.s3_webhook_processor.arn
6    events              = ["s3:ObjectCreated:*"]
7    filter_prefix       = "input/"  # Only process files in input folder
8  }
9}

2. Large File Processing

For large files, the webhook itself stays lightweight (we only send metadata), but whatever processes the file downstream can still run long. Consider S3 Transfer Acceleration for faster uploads, and process files in chunks or hand the heavy lifting to a longer-running service to avoid Lambda timeout issues.

3. Event Ordering

S3 events aren't guaranteed to arrive in order. If sequence matters, implement your own ordering logic - the sequencer field in each event record can be used to order events for a given object key - or funnel events through a FIFO queue further down the pipeline.

Monitoring and Troubleshooting

That is why proper monitoring is crucial. Here's what I always set up:

 1resource "aws_cloudwatch_metric_alarm" "lambda_errors" {
 2  alarm_name          = "s3-webhook-lambda-errors"
 3  comparison_operator = "GreaterThanThreshold"
 4  evaluation_periods  = "2"
 5  metric_name         = "Errors"
 6  namespace           = "AWS/Lambda"
 7  period              = "300"
 8  statistic           = "Sum"
 9  threshold           = "5"
10  alarm_description   = "This metric monitors lambda errors"
11  alarm_actions       = [aws_sns_topic.alerts.arn]
12
13  dimensions = {
14    FunctionName = aws_lambda_function.s3_webhook_processor.function_name
15  }
16}
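
The alarm above publishes to aws_sns_topic.alerts, which I haven't defined in this post. A bare-bones version could look like this - the e-mail address is obviously a placeholder:

resource "aws_sns_topic" "alerts" {
  name = "s3-webhook-alerts"
}

resource "aws_sns_topic_subscription" "alerts_email" {
  topic_arn = aws_sns_topic.alerts.arn
  protocol  = "email"
  endpoint  = "ops@example.com" # placeholder - use your own address
}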

What's Next?

For comparison, the native n8n S3 approach looks like this:

  1. Run a time-based trigger
  2. Get all the files from the bucket
  3. Filter them based on upload timestamp
  4. Process them in a loop

That is why, once you start using S3 notifications as webhooks, you'll wonder how you lived without them. The pattern is so simple that I've started applying it to various automation scenarios.

With this setup, I'm able to process files in real time, trigger complex workflows instantly, and build truly event-driven architectures. The entire infrastructure deploys in about 60 seconds with Terraform, and you get a robust, scalable webhook system that handles thousands of events per second. You can also create a dedicated CLI user with nothing but s3:PutObject-style permissions, so it can only upload objects to your bucket.
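
If you want that dedicated upload-only CLI user, one rough sketch is to attach the policy from Step 5 (or a trimmed-down PutObject-only variant of it) to its own IAM user:

resource "aws_iam_user" "uploader" {
  name = "s3-upload-cli-user"
}

resource "aws_iam_user_policy_attachment" "uploader_policy" {
  user       = aws_iam_user.uploader.name
  policy_arn = aws_iam_policy.s3_upload_policy.arn
}

# Access keys for the CLI; note these end up in Terraform state,
# so consider generating them manually instead
resource "aws_iam_access_key" "uploader_key" {
  user = aws_iam_user.uploader.name
}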

And that is great, to be honest - you're not just polling anymore; you're reacting to events as they happen. Your automation becomes more responsive, more efficient, and more reliable.

The next time you find yourself polling S3 for file changes, remember this approach. Your future self will thank you for building a proper event-driven system instead of yet another polling mechanism.

If someone is interested in production-ready code, ping me on Mastodon - I need to clean it up first :)

A Quick Note About AWS CDK

Before we wrap up, I have to mention something that's been bugging me lately. AWS recently announced that the CDK CLI will start collecting anonymous telemetry data, and here's the kicker - it's going to be opt-out, not opt-in. You can read more about this in GitHub issue #34892. This is exactly why I work with Terraform for infrastructure as code today. No surprises, no unexpected telemetry, just reliable infrastructure automation that does what you expect it to do. That is one of the reasons why all the examples in this post use Terraform instead of CDK.