S3 Notifications as Webhooks for Automation

Hey there, my friends!
I've been working on automating various workflows lately, and let me tell you - I stumbled upon something that completely changed how I think about event-driven automation. Do you remember when I wrote about setting up automated deployments? Well, this time I want to share something even more powerful: using S3 notifications as webhooks for automation.
So here's the story. A few weeks ago, I was dealing with a client who needed to process uploaded files automatically. The traditional approach? Polling S3 buckets every few minutes. Not elegant, not efficient, and definitely not what I'd call a modern solution. That's when I realized - why not use S3 notifications as webhooks? Honestly, it turned out great.
Why S3 Notifications Beat Traditional Polling
Let's be real here - polling is so 2015. You're essentially asking S3 "Are there any new files?" every few minutes, like Donkey in Shrek asking "Are we there yet?" during a long car ride. Not only does this waste resources, but it also introduces delays in your automation pipeline.
S3 notifications, on the other hand, work like a proper webhook system. When something happens in your bucket - a file gets uploaded, deleted, or modified - S3 immediately sends a notification to your configured endpoint. No polling, no delays, no wasted resources. Quite neat, isn't it?
The Architecture That Actually Works
Now let's jump into how this all fits together. The architecture is surprisingly simple:
- S3 Bucket - Your file storage (obviously)
- S3 Event Notifications - Triggers when files change
- Lambda Function - Processes the events and sends HTTP webhooks
- External Webhook Endpoint - Your N8N instance or any automation platform
- Your Automation Logic - Whatever you want to happen
The beauty of this approach? S3 does all the heavy lifting of monitoring file changes, Lambda transforms those events into proper HTTP webhooks, and you just react to events as they happen. Admittedly, my setup was more complex than it needed to be at first, but once I wrapped my head around the event flow, everything clicked.
Setting Up S3 Notifications with Terraform
Let's dive into it one by one. I'll show you exactly how I set this up using Terraform because, to be honest, clicking through the AWS console is not my favorite way to spend an afternoon.
Step 0: Create the KMS Resources
We'd like to keep our data secure - this isn't some "Telegram-based" automation, we're going semi-enterprise!
# KMS Key for S3 encryption
resource "aws_kms_key" "s3_encryption_key" {
  description             = "KMS key for S3 bucket encryption"
  deletion_window_in_days = var.kms_key_deletion_window
  enable_key_rotation     = true

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Sid    = "Enable IAM User Permissions"
        Effect = "Allow"
        Principal = {
          AWS = "arn:aws:iam::${data.aws_caller_identity.current.account_id}:root"
        }
        Action   = "kms:*"
        Resource = "*"
      },
      {
        Sid    = "Allow S3 Service"
        Effect = "Allow"
        Principal = {
          Service = "s3.amazonaws.com"
        }
        Action = [
          "kms:Decrypt",
          "kms:GenerateDataKey"
        ]
        Resource = "*"
      }
    ]
  })

  tags = {
    Name = "S3-Upload-Bucket-Key"
  }
}

# KMS Key Alias
resource "aws_kms_alias" "s3_encryption_key_alias" {
  name          = "alias/s3-upload-bucket-key"
  target_key_id = aws_kms_key.s3_encryption_key.key_id
}
Step 1: Create the S3 Bucket
First, let's create our S3 bucket with proper configuration:
# S3 Bucket
resource "aws_s3_bucket" "upload_bucket" {
  bucket = var.bucket_name

  tags = {
    Name        = var.bucket_name
    Environment = "production"
    Purpose     = "file-uploads"
  }
}

# S3 Bucket versioning
resource "aws_s3_bucket_versioning" "upload_bucket_versioning" {
  bucket = aws_s3_bucket.upload_bucket.id
  versioning_configuration {
    status = "Enabled"
  }
}

# S3 Bucket server-side encryption with customer managed KMS key
resource "aws_s3_bucket_server_side_encryption_configuration" "upload_bucket_encryption" {
  bucket = aws_s3_bucket.upload_bucket.id

  rule {
    apply_server_side_encryption_by_default {
      kms_master_key_id = aws_kms_key.s3_encryption_key.arn
      sse_algorithm     = "aws:kms"
    }
    bucket_key_enabled = true
  }
}
Step 2: Secure the Bucket (Again)
Since we've been talking about security throughout this S3 series, let's apply that knowledge in practice.
# S3 Bucket public access block
resource "aws_s3_bucket_public_access_block" "upload_bucket_pab" {
  bucket = aws_s3_bucket.upload_bucket.id

  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

# Dedicated bucket for server access logs
resource "aws_s3_bucket" "access_logs_bucket" {
  bucket = "${var.bucket_name}-access-logs"

  tags = {
    Name        = "${var.bucket_name}-access-logs"
    Environment = "production"
    Purpose     = "access-logs"
  }
}

# Note: S3 server access logging only supports SSE-S3 (AES256) default
# encryption on the destination bucket, not SSE-KMS
resource "aws_s3_bucket_server_side_encryption_configuration" "access_logs_encryption" {
  bucket = aws_s3_bucket.access_logs_bucket.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "AES256"
    }
  }
}

# S3 Bucket logging
resource "aws_s3_bucket_logging" "bucket_access_log" {
  bucket = aws_s3_bucket.upload_bucket.id

  target_bucket = aws_s3_bucket.access_logs_bucket.id
  target_prefix = "log/"
}
Step 3: Create the Lambda Function
Now let's create the Lambda function that will process our webhook events. We'll start with a minimal definition plus the permission that lets S3 invoke it, and flesh the function out in the next step:
# Minimal first cut of the webhook function (fleshed out in the next step)
resource "aws_lambda_function" "s3_webhook_processor" {
  filename      = "s3_webhook_processor.zip"
  function_name = "s3-webhook-processor"
  role          = aws_iam_role.lambda_execution_role.arn # execution role not shown in full in this post
  handler       = "lambda_function.lambda_handler"
  runtime       = "python3.9"
  timeout       = 30

  environment {
    variables = {
      ENVIRONMENT = "production"
      LOG_LEVEL   = "INFO"
    }
  }

  depends_on = [
    aws_iam_role_policy_attachment.lambda_logs,
    aws_cloudwatch_log_group.lambda_logs,
  ]
}

# Allow S3 to invoke the function
resource "aws_lambda_permission" "allow_s3_invoke" {
  statement_id  = "AllowExecutionFromS3Bucket"
  action        = "lambda:InvokeFunction"
  function_name = aws_lambda_function.s3_webhook_processor.function_name
  principal     = "s3.amazonaws.com"
  source_arn    = aws_s3_bucket.upload_bucket.arn
}
Step 4: Set Up the S3 Event Notification
Here's where the magic happens. First we package the Lambda code and give it an encrypted log group, then flesh out the function definition from Step 3 (more memory, a longer timeout, X-Ray tracing), and finally point S3's event notifications at it:
# Lambda function code, rendered from a template so the webhook endpoint gets baked in
data "archive_file" "lambda_zip" {
  type        = "zip"
  output_path = "lambda_function.zip"

  source {
    content = templatefile("${path.module}/lambda_function.py.tpl", {
      webhook_endpoint = var.webhook_endpoint
    })
    filename = "lambda_function.py"
  }
}

# CloudWatch Log Group for Lambda (with encryption)
# cloudwatch_logs_key is a separate KMS key for log encryption, not shown in this post
resource "aws_cloudwatch_log_group" "lambda_logs" {
  name              = "/aws/lambda/s3-webhook-processor"
  retention_in_days = 14
  kms_key_id        = aws_kms_key.cloudwatch_logs_key.arn

  tags = {
    Name = "Lambda-S3-Webhook-Logs"
  }
}

# Full version of the webhook function - replaces the minimal definition from Step 3
resource "aws_lambda_function" "s3_webhook_processor" {
  filename      = data.archive_file.lambda_zip.output_path
  function_name = "s3-webhook-processor"
  role          = aws_iam_role.lambda_execution_role.arn
  handler       = "lambda_function.lambda_handler"
  runtime       = "python3.9"
  timeout       = 60  # Increased timeout
  memory_size   = 256 # More memory

  source_code_hash = data.archive_file.lambda_zip.output_base64sha256

  # Enable X-Ray tracing
  tracing_config {
    mode = "Active"
  }

  environment {
    variables = {
      WEBHOOK_ENDPOINT = var.webhook_endpoint
    }
  }

  depends_on = [
    aws_cloudwatch_log_group.lambda_logs
  ]

  tags = {
    Name = "S3WebhookProcessor"
  }
}
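The last piece of this step is the notification resource itself, which ties bucket events to the function. Here's a minimal version; we'll come back to it with a prefix filter in the pitfalls section below:
# Send ObjectCreated events from the upload bucket to the Lambda function
resource "aws_s3_bucket_notification" "webhook_notification" {
  bucket = aws_s3_bucket.upload_bucket.id

  lambda_function {
    lambda_function_arn = aws_lambda_function.s3_webhook_processor.arn
    events              = ["s3:ObjectCreated:*"]
  }

  # Make sure the invoke permission from Step 3 exists before wiring the notification
  depends_on = [aws_lambda_permission.allow_s3_invoke]
}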
Step 5: IAM Roles and Policies
And of course, we need proper IAM permissions. That's just how AWS's security model works: we need roles and policies for bucket access, Lambda execution, writing logs to CloudWatch, and so on. Here is just an example policy (the Lambda execution role and its logging permissions are omitted):
# IAM Policy for S3 access
resource "aws_iam_policy" "s3_upload_policy" {
  name        = "S3UploadOnlyPolicy"
  description = "Policy that allows list and upload operations on specific S3 bucket"

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Action = [
          "s3:ListBucket",
          "s3:GetBucketLocation"
        ]
        Resource = aws_s3_bucket.upload_bucket.arn
      },
      {
        Effect = "Allow"
        Action = [
          "s3:PutObject",
          "s3:GetObject"
        ]
        Resource = [
          "${aws_s3_bucket.upload_bucket.arn}/*"
        ]
      },
      {
        Effect = "Allow"
        Action = [
          "kms:Decrypt",
          "kms:GenerateDataKey"
        ]
        Resource = aws_kms_key.s3_encryption_key.arn
      }
    ]
  })
}
The Lambda Function Code
Now let's talk about the actual webhook processing logic. Here's the Python template (lambda_function.py.tpl from the archive above) that handles S3 events; the ${webhook_endpoint} placeholder is filled in by Terraform:
import json
import os
import urllib3
import boto3
from datetime import datetime
from urllib.parse import unquote_plus


def lambda_handler(event, context):
    """
    Lambda function to send webhook notifications when files are uploaded to S3
    """

    # Initialize HTTP client
    http = urllib3.PoolManager()

    # Get webhook endpoint from environment variable;
    # ${webhook_endpoint} is rendered by Terraform's templatefile()
    webhook_endpoint = os.environ.get('WEBHOOK_ENDPOINT',
                                      '${webhook_endpoint}')
    try:
        for i, record in enumerate(event['Records']):
            print(f"Processing record {i+1}/{len(event['Records'])}: {record['s3']['object']['key']}")

            # Extract S3 event information
            event_name = record['eventName']
            bucket_name = record['s3']['bucket']['name']
            # Object keys arrive URL-encoded in S3 events (e.g. spaces become '+')
            object_key = unquote_plus(record['s3']['object']['key'])
            object_size = record['s3']['object'].get('size', 0)
            event_time = record['eventTime']

            # Get additional object metadata
            s3_client = boto3.client('s3')
            try:
                response = s3_client.head_object(
                    Bucket=bucket_name,
                    Key=object_key)
                content_type = response.get('ContentType', 'unknown')
                last_modified = response.get('LastModified', '').isoformat() if response.get('LastModified') else ''
            except Exception as e:
                print(f"Error getting object metadata: {str(e)}")
                content_type = 'unknown'
                last_modified = ''

            # Prepare webhook payload
            webhook_payload = {
                "event": "file_uploaded",
                "event_name": event_name,
                "bucket": bucket_name,
                "key": object_key,
                "size": object_size,
                "content_type": content_type,
                "timestamp": event_time,
                "last_modified": last_modified,
                "s3_url": f"s3://{bucket_name}/{object_key}",
                "metadata": {
                    "source": "s3-lambda-webhook",
                    "processed_at": datetime.utcnow().isoformat(),
                    "aws_region": os.environ.get('AWS_REGION', 'unknown')
                }
            }

            # Send webhook
            print(f"Sending webhook for {bucket_name}/{object_key}")

            response = http.request(
                'POST',
                webhook_endpoint,
                body=json.dumps(webhook_payload),
                headers={
                    'Content-Type': 'application/json',
                    'User-Agent': 'AWS-S3-Lambda-Webhook/1.0'
                },
                timeout=30
            )

            if response.status == 200:
                print(f"Webhook sent successfully for {object_key}. Response: {response.status}")
            else:
                print(f"Webhook failed for {object_key}. Status: {response.status}, Response: {response.data}")

                # Log the error but don't fail the entire function
                error_payload = {
                    "error": "webhook_failed",
                    "status_code": response.status,
                    "object_key": object_key,
                    "bucket": bucket_name,
                    "response_body": response.data.decode('utf-8') if response.data else 'No response body'
                }
                print(f"Error details: {json.dumps(error_payload)}")

    except Exception as e:
        print(f"Error processing S3 event: {str(e)}")
        print(f"Event data: {json.dumps(event)}")

        # Return error but don't raise exception to avoid retries
        return {
            'statusCode': 500,
            'body': json.dumps({
                'error': str(e),
                'event': event
            })
        }

    return {
        'statusCode': 200,
        'body': json.dumps({
            'message': 'Webhook notifications sent successfully',
            'processed_records': len(event['Records'])
        })
    }
Real-World Use Cases That Actually Matter
So where does this approach shine? Let me share some scenarios where I've implemented this pattern:
Document Processing Pipeline: When users upload documents to S3, the webhook immediately triggers OCR processing, data extraction, and database updates. No more waiting for cron jobs!
Media File Automation: Video uploads trigger encoding workflows, thumbnail generation, and content distribution. The moment a file lands in S3, the entire pipeline kicks into gear.
Data Integration: CSV files dropped into S3 automatically trigger data validation, transformation, and loading into data warehouses. ETL processes start instantly instead of waiting for scheduled runs.
Backup and Archiving: When files are uploaded to certain prefixes, they're automatically replicated to different regions or archived to Glacier. Set it up once, forget about it forever.
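For the Glacier flavour of that last use case, a lifecycle rule on the upload bucket usually does the trick. A minimal sketch, assuming an archive/ prefix and a 30-day transition (both just example values):
# Move objects under archive/ to Glacier after 30 days (example prefix and timing)
resource "aws_s3_bucket_lifecycle_configuration" "archive_rule" {
  bucket = aws_s3_bucket.upload_bucket.id

  rule {
    id     = "archive-to-glacier"
    status = "Enabled"

    filter {
      prefix = "archive/"
    }

    transition {
      days          = 30
      storage_class = "GLACIER"
    }
  }
}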
Testing Your Webhook Setup
Now you might be thinking: "This all sounds great, but how do I test it?" Good question! Here's how I validate that everything works:
1. CloudWatch Logs Monitoring
First, set up proper logging to track webhook events (this is the same log group from Step 4, shown here with a short retention for test runs):
resource "aws_cloudwatch_log_group" "lambda_logs" {
  name              = "/aws/lambda/s3-webhook-processor"
  retention_in_days = 1 # short retention is fine while testing
  kms_key_id        = aws_kms_key.cloudwatch_logs_key.arn

  tags = {
    Name = "Lambda-S3-Webhook-Logs"
  }
}
2. Simple Test Script
Create a simple test script to upload files and verify notifications:
import boto3
import time

def test_s3_webhook():
    s3_client = boto3.client('s3')
    bucket_name = 'my-automation-webhook-bucket'

    # Upload test file
    test_content = "This is a test file for webhook validation"
    s3_client.put_object(
        Bucket=bucket_name,
        Key='uploads/test.json',
        Body=test_content,
        ContentType='application/json'
    )

    print("Test file uploaded. Check CloudWatch logs for webhook processing.")

    # Wait a bit, then delete
    time.sleep(5)
    s3_client.delete_object(Bucket=bucket_name, Key='uploads/test.json')
    print("Test file deleted. Note: deletions only trigger a webhook if you also subscribe to s3:ObjectRemoved:* events.")

if __name__ == "__main__":
    test_s3_webhook()
Or you can simply drag and drop a file in the AWS console. For quick testing that's the fastest and easiest method, since you can skip the auth part in your code.
Performance and Cost Considerations
Let's be honest - this approach has some serious advantages over traditional polling:
Cost Efficiency: No more constant API calls to check for changes. You only pay for actual processing when events occur. In my experience, this can reduce costs by 70-80% compared to frequent polling.
Real-Time Processing: Events are processed within seconds of file changes, not minutes or hours later. This is crucial for time-sensitive automation workflows.
Scalability: S3 handles millions of events without breaking a sweat. Your webhook processing automatically scales with Lambda's concurrent execution limits.
Reliability: Built-in retry mechanisms and dead letter queues ensure events don't get lost. AWS handles the heavy lifting of reliable event delivery.
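If you want the dead letter queue part spelled out in Terraform, a rough sketch could look like this (the queue name and retry count are just examples, and the Lambda execution role would also need sqs:SendMessage on the queue):
# Dead-letter queue for webhook deliveries that keep failing (example name)
resource "aws_sqs_queue" "webhook_dlq" {
  name = "s3-webhook-dlq"
}

# Limit async retries and send failed invocations to the queue
resource "aws_lambda_function_event_invoke_config" "webhook_retries" {
  function_name          = aws_lambda_function.s3_webhook_processor.function_name
  maximum_retry_attempts = 2

  destination_config {
    on_failure {
      destination = aws_sqs_queue.webhook_dlq.arn
    }
  }
}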
Common Pitfalls and How to Avoid Them
However, let me share some challenges I encountered and how to solve them:
1. Circular Event Loops
Be careful not to create infinite loops where your Lambda function modifies S3 objects that trigger more events. Use different prefixes for input and output files:
resource "aws_s3_bucket_notification" "webhook_notification" {
  bucket = aws_s3_bucket.upload_bucket.id

  lambda_function {
    lambda_function_arn = aws_lambda_function.s3_webhook_processor.arn
    events              = ["s3:ObjectCreated:*"]
    filter_prefix       = "input/" # Only process files in the input/ prefix
  }
}
2. Large File Processing
For large files, consider using S3 Transfer Acceleration or processing files in chunks to avoid Lambda timeout issues.
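Enabling Transfer Acceleration on the bucket is a one-liner in Terraform, if you decide to go that route (chunked processing is a topic of its own):
# Opt the upload bucket into S3 Transfer Acceleration
resource "aws_s3_bucket_accelerate_configuration" "upload_acceleration" {
  bucket = aws_s3_bucket.upload_bucket.id
  status = "Enabled"
}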
3. Event Ordering
S3 events aren't guaranteed to arrive in order. If sequence matters, use the per-key sequencer field that S3 includes in each event record, or implement your own ordering logic downstream.
Monitoring and Troubleshooting
That is why proper monitoring is crucial. Here's what I always set up:
resource "aws_cloudwatch_metric_alarm" "lambda_errors" {
  alarm_name          = "s3-webhook-lambda-errors"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = "2"
  metric_name         = "Errors"
  namespace           = "AWS/Lambda"
  period              = "300"
  statistic           = "Sum"
  threshold           = "5"
  alarm_description   = "This metric monitors lambda errors"
  alarm_actions       = [aws_sns_topic.alerts.arn] # assumes an SNS alerts topic defined elsewhere

  dimensions = {
    FunctionName = aws_lambda_function.s3_webhook_processor.function_name
  }
}
What's Next?
For comparison, the native n8n S3 pattern looks like this:
- Run a time-based trigger
- Fetch all files from the bucket
- Filter them based on upload timestamp
- Process them in a loop
That is why, once you start using S3 notifications as webhooks, you'll wonder how you lived without them. The pattern is so simple that I've started applying it to various automation scenarios.
With this setup, I'm able to process files in real time, trigger complex workflows instantly, and build truly event-driven architectures. The entire infrastructure deploys in about 60 seconds with Terraform, and you get a robust, scalable webhook system that handles thousands of events per second. You can also create a CLI user that only has s3:PutObject, so all it can do is upload objects to your bucket.
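In case you're wondering what that upload-only CLI user could look like, here's a rough sketch (names are examples; because the bucket uses SSE-KMS, the user also needs kms:GenerateDataKey on the key):
# CLI user that can only upload into the bucket (example names)
resource "aws_iam_user" "uploader" {
  name = "s3-upload-only"
}

resource "aws_iam_user_policy" "uploader_put_only" {
  name = "put-object-only"
  user = aws_iam_user.uploader.name

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect   = "Allow"
        Action   = ["s3:PutObject"]
        Resource = "${aws_s3_bucket.upload_bucket.arn}/*"
      },
      {
        Effect   = "Allow"
        Action   = ["kms:GenerateDataKey"]
        Resource = aws_kms_key.s3_encryption_key.arn
      }
    ]
  })
}

# Access key for CLI use - consider creating this outside of Terraform state
resource "aws_iam_access_key" "uploader_key" {
  user = aws_iam_user.uploader.name
}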
And that's the real beauty of it, to be honest - you're not just polling anymore, you're reacting to events as they happen. Your automation becomes more responsive, more efficient, and more reliable.
The next time you find yourself polling S3 for file changes, remember this approach. Your future self will thank you for building a proper event-driven system instead of yet another polling mechanism.
If someone is interested in production-ready code, ping me on Mastodon, I need to clean it up first :)
A Quick Note About AWS CDK
Before we wrap up, I have to mention something that's been bugging me lately. AWS recently announced that the CDK CLI will start collecting anonymous telemetry data, and here's the kicker - it's going to be opt-out, not opt-in. You can read more about this in GitHub issue #34892. This is exactly why I work with Terraform for infrastructure as code today. No surprises, no unexpected telemetry, just reliable infrastructure automation that does what you expect it to do. That is one of the reasons why all the examples in this post use Terraform instead of CDK.