One of the most vexing issues in my cloud journey has been understanding how CloudTrail and CloudWatch Events work together. For some reason it took me years (and a lot of testing) to wrap my head around how the connection really works, especially with multi-region trails and AWS Organization trails. Then, once I figured it all out, I assumed everyone already knew, but recent conversations have made it clear this confusion is pretty common. So here is my best attempt to simplify things.
First, the problem we are trying to solve: build CloudWatch Rules that match CloudTrail events, and use them to send notifications or trigger Lambda functions.
That’s it. I want to send a notification for something as simple as the API call that opens up a new security group rule (AuthorizeSecurityGroupIngress, in case you were wondering).
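For reference, a rule for that one API call might use an event pattern like the one below. This is a minimal sketch, assuming the standard shape of CloudTrail-sourced events (`source` of `aws.ec2`, `detail-type` of `AWS API Call via CloudTrail`). The `matches` helper is my own simplified stand-in for the service's matching logic (exact values only, no prefix or numeric operators), included just to show which fields the pattern keys on.

```python
import json

# Event pattern for a rule that fires on AuthorizeSecurityGroupIngress,
# as recorded by CloudTrail and delivered to CloudWatch Events.
SG_INGRESS_PATTERN = {
    "source": ["aws.ec2"],
    "detail-type": ["AWS API Call via CloudTrail"],
    "detail": {
        "eventSource": ["ec2.amazonaws.com"],
        "eventName": ["AuthorizeSecurityGroupIngress"],
    },
}


def matches(pattern: dict, event: dict) -> bool:
    """Simplified matcher: every pattern key must be present in the event,
    a list means "equals one of these values", and a dict recurses.
    The real service supports more (prefix, numeric, exists, arrays...)."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


# Trimmed-down sample event; real events carry many more fields.
sample_event = {
    "source": "aws.ec2",
    "detail-type": "AWS API Call via CloudTrail",
    "region": "us-west-2",
    "detail": {
        "eventSource": "ec2.amazonaws.com",
        "eventName": "AuthorizeSecurityGroupIngress",
        "awsRegion": "us-west-2",
    },
}

if __name__ == "__main__":
    print(json.dumps(SG_INGRESS_PATTERN, indent=2))  # JSON for the rule's event pattern
    print(matches(SG_INGRESS_PATTERN, sample_event))  # True
```

The printed JSON is what you would paste into the rule's custom event pattern.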
To make this work you need the following in place:
- CloudTrail enabled in the region where the API call is made.
- CloudTrail streaming to CloudWatch.
- A CloudWatch Rule in the region of the API call that looks for that specific API call, or for all CloudTrail API calls.
Now for the confusion:
- When you create either a multi-region CloudTrail or an Organization trail, behind the scenes AWS is actually setting up trails in every single region (and every account, in the case of an Org trail). They are all separate trails, but each is configured to send its results to a shared S3 bucket, and you can only manage each one in its home account and region.
- However, CloudWatch Events for API calls are only created in the region where the API call is made.
- So if you create a multi-region trail, the data is all collected centrally, but the events only appear locally. A CloudWatch Rule in the trail's home region will only trigger for API calls made in that home region. If you build an alarm for security group changes, it will only work in the home region, not in any other region, even though CloudTrail is turned on everywhere.
- The CloudWatch Logs log group/stream appears only in the trail's primary (home) region, but each event is created in the region where the API call was made.
- If you want to collect events for all API calls, you need to use an undocumented event pattern (which I have pasted below).
- Amazon’s documentation never spells any of this out clearly, at least not anywhere I have been able to find. In fact, I was once on a support call where I figured it out, and the AWS rep kept mumbling, “I don’t think that’s how it works,” as my events started streaming in. He was having a rough night.
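A toy model can make the "central data, local events" split concrete. This is plain Python with no AWS calls: the shared bucket and the per-region event buses are just lists, and `HOME_REGION` is an assumed home region for both the trail and the rule.

```python
from collections import defaultdict

# Toy model only, no AWS calls: a multi-region trail writes every record to one
# shared bucket, but the matching event appears only on the bus of the region
# where the API call was made.
HOME_REGION = "us-east-1"  # assumed home region of the trail (and of the rule)

central_bucket = []                  # stands in for the shared S3 bucket
regional_buses = defaultdict(list)   # region -> events visible to rules there


def record_api_call(event_name, region):
    record = {"eventName": event_name, "awsRegion": region}
    central_bucket.append(record)          # data: collected centrally
    regional_buses[region].append(record)  # event: emitted locally only


def rule_sees(rule_region, event_name):
    """Regions whose matching calls a rule in rule_region can react to."""
    return [e["awsRegion"] for e in regional_buses[rule_region]
            if e["eventName"] == event_name]


record_api_call("AuthorizeSecurityGroupIngress", "us-east-1")
record_api_call("AuthorizeSecurityGroupIngress", "eu-west-1")

print(len(central_bucket))                                      # 2
print(rule_sees(HOME_REGION, "AuthorizeSecurityGroupIngress"))  # ['us-east-1']
```

Both calls land in the bucket, but the home-region rule only ever sees the home-region call.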
This was really non-intuitive to me for some reason. I had assumed that if you centralized the trail, you could also centralize the CloudWatch Rule that triggers off API call events. Unfortunately that is totally incorrect: even when you centralize the trail, you still need to create the Rule in every region you care about. Even if you use an event bus to collect events from multiple accounts, you still need a CloudWatch Rule in every region of every account to send the events on to the bus, and then you need Rules in every region of the event bus account to trigger whatever notification or action you want.
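Since the same Rule has to exist in every region, a loop over regional clients is one way to stamp it out. A minimal sketch using boto3's `events` client (`put_rule`, `put_targets`); the rule name, the target ID, and the SNS topic ARNs are hypothetical, each topic is assumed to live in the same region as its rule, and its topic policy is assumed to already allow `events.amazonaws.com` to publish. The request-building step is kept separate so it can be checked without AWS credentials.

```python
import json
from typing import Callable, Iterable, List, Tuple

# Same rule in every region: AuthorizeSecurityGroupIngress via CloudTrail.
SG_INGRESS_PATTERN = {
    "source": ["aws.ec2"],
    "detail-type": ["AWS API Call via CloudTrail"],
    "detail": {"eventName": ["AuthorizeSecurityGroupIngress"]},
}
RULE_NAME = "sg-ingress-alert"  # hypothetical rule name


def build_requests(
    regions: Iterable[str], topic_arn_for: Callable[[str], str]
) -> List[Tuple[str, dict, dict]]:
    """Return (region, put_rule kwargs, put_targets kwargs) for each region."""
    requests = []
    for region in regions:
        rule = {
            "Name": RULE_NAME,
            "EventPattern": json.dumps(SG_INGRESS_PATTERN),
            "State": "ENABLED",
        }
        targets = {
            "Rule": RULE_NAME,
            # Assumed: a per-region SNS topic sitting next to the rule.
            "Targets": [{"Id": "notify", "Arn": topic_arn_for(region)}],
        }
        requests.append((region, rule, targets))
    return requests


def deploy(regions: Iterable[str], topic_arn_for: Callable[[str], str]) -> None:
    import boto3  # imported here so build_requests works without boto3 installed

    for region, rule, targets in build_requests(regions, topic_arn_for):
        events = boto3.client("events", region_name=region)  # one client per region
        events.put_rule(**rule)
        events.put_targets(**targets)
```

Usage would look something like `deploy(["us-east-1", "eu-west-1"], lambda r: f"arn:aws:sns:{r}:111122223333:security-alerts")`, with a placeholder account ID. In practice, CloudFormation StackSets or a Terraform provider per region does the same job declaratively.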
Here is how I recommend approaching this if you want the near-real-time alerting capabilities or auto-remediation/actions supported by CloudWatch Rules:
- Turn on a multi-region trail. You only need to do this once, and an Organization trail is sufficient.
- This creates all the regional trails you need. It looks like one central trail, but is really a collection of regional trails sending their data to a central receiver.
- Option 1: Create a CloudWatch Rule in every region you want near-real-time alerting for. CloudFormation and Terraform are your friends here.
- Option 2: Centralize all your events. Within each region, create a Rule that sends all CloudTrail events to a Lambda function or SNS topic, which then forwards them to your destination. We use this technique ourselves and send to a custom API endpoint, but you can stream to Kinesis or nearly anything else.
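The first step, turning on a single multi-region (or Organization) trail, can be scripted with boto3's `cloudtrail` client. A minimal sketch, assuming the S3 bucket already exists with a bucket policy that lets CloudTrail write to it, and that an Organization trail is created from the Organizations management account (or a delegated administrator). The trail and bucket names you pass in are your own; nothing here is a required name.

```python
def trail_params(name: str, bucket: str, organization: bool = False) -> dict:
    """Parameters for CloudTrail CreateTrail: one multi-region (optionally org) trail."""
    return {
        "Name": name,
        "S3BucketName": bucket,               # shared bucket every regional trail writes to
        "IsMultiRegionTrail": True,           # AWS fans this out into per-region trails
        "IsOrganizationTrail": organization,  # also fans out across member accounts
        "IncludeGlobalServiceEvents": True,
    }


def create_central_trail(name: str, bucket: str, organization: bool = False,
                         home_region: str = "us-east-1", client=None) -> str:
    """Create the trail in its home region and start logging; returns the trail ARN."""
    if client is None:
        import boto3  # only needed when no client is injected
        client = boto3.client("cloudtrail", region_name=home_region)
    resp = client.create_trail(**trail_params(name, bucket, organization))
    client.start_logging(Name=resp["TrailARN"])  # CreateTrail does not start logging itself
    return resp["TrailARN"]
```

Note that this only gets the data flowing centrally; per the points above, the Rules still have to be created region by region.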
To kickstart your journey, here are two code samples.
The secret filter pattern for your CloudWatch Rule to collect all events from CloudTrail:
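A minimal sketch of such a pattern, assuming it filters on the `detail-type` value `AWS API Call via CloudTrail` alone and deliberately leaves out `source`, so API-call events from any service should match. The rule name in `put_rule_request` is hypothetical; the dict it returns is shaped as keyword arguments for boto3's `events.put_rule`.

```python
import json

# Filters only on detail-type and omits "source", so it should match API-call
# events from every service that CloudTrail delivers to this region's bus.
ALL_CLOUDTRAIL_API_CALLS = {"detail-type": ["AWS API Call via CloudTrail"]}


def put_rule_request(rule_name: str = "all-cloudtrail-api-calls") -> dict:
    """kwargs for boto3 events.put_rule(); the default rule name is hypothetical."""
    return {
        "Name": rule_name,
        "EventPattern": json.dumps(ALL_CLOUDTRAIL_API_CALLS),
        "State": "ENABLED",
    }


if __name__ == "__main__":
    # The JSON to paste into the console's custom event pattern box.
    print(json.dumps(ALL_CLOUDTRAIL_API_CALLS, indent=2))
```

Like every other Rule here, this one has to be created in each region you want to collect from.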
And here is sample Lambda code to forward events, in this case, to Kinesis: