One of the most difficult problems in cloud security is building comprehensive multi-account/multi-cloud security monitoring and alerting. I’d say maybe 1 out of 10 organizations I assess or work with have something effective in place when I first show up. That’s why I added a major monitoring lab based on AirBnB’s StreamAlert project to the Securosis Advanced Cloud Security and Applied DevSecOps training class (we still have some spots available for our Black Hat 2019 class).
Odds are if you are reading this post you already know the challenges you face. Collecting raw logs is only part of the problem; effective cloud security involves collecting data from multiple sources even in a single provider and using them to drive alerts and responses rapidly enough to manage automated cloud attacks. What, you think attackers don’t use their own automation tools? Complicating this is that even within a single provider you often need to support multiple data and event collection mechanisms, each with different data flow paths that can mean the difference between seeing an alert within 1 minute or 20 minutes after the event.
I first came across StreamAlert in discussions and through this blog post by Scott Piper, the author of CloudMapper. StreamAlert is an open source project from AirBnB for cloud-scale log and event ingestion. It’s fully serverless and capable of taking in terabytes an hour. It’s becoming one of the more popular options among cloud experts. It runs in AWS but can collect information from anywhere. It’s also one of the easier OSS tools I’ve used, but some parts did take me a bit to understand and get up and running consistently.
This post pulls from the lab and existing documentation to help you get up and running with StreamAlert even more quickly than their getting started guide.
StreamAlert is a fully serverless deployment that runs in AWS. Your first step is to pick the security account where you want it up and running. I recommend a dedicated account, or at least one only used for other security tooling. Then, pick a region as your main home. You can deploy anywhere, and for even larger scaling use multiple concurrent deployments, but for this starter guide I’m assuming one region in one account.
StreamAlert is deployed using HashiCorp’s Terraform, but not the latest version. Follow these steps to get it working as quickly as possible:
- I recommend installing in a dedicated instance or container that is shared by the security team. While you can deploy on a local laptop this is not recommended in an enterprise environment.
- You need Terraform 0.11.x (0.12 no worky). Download any 0.11.x version appropriate to your operating system from here.
- It’s a single executable file and you can move it to anywhere in your path. Here are commands if you run this in an Amazon Linux 2 instance:
- Now set your environment, download and install StreamAlert (I pulled all this from the official getting started guide, and it’s all for Linux). Just copying and pasting through these commands will work for most of you:
- I prefer to run from inside an AWS instance or container with an assigned IAM role. If, instead, you run someplace else you will need an IAM user or role with some pretty extensive privileges for the account where you will install. In training I have it set for full admin. This command line will let you configure those credentials:
- These next steps are where you need to turn your brain back on. To start, you will need the ID of the account where you are installing StreamAlert, and you need to pick a name to prefix all the components with. “StreamAlert” is easy enough (we use “cloudsec” in training). This next command line modifies a bunch of the Terraform files so I recommend using it rather than trying to edit them yourself:
- All our command lines for the rest of this assume you are running StreamAlert in the home directory of a Linux system. You’ll need to adjust the paths if you run them someplace else.
At this point I modify the global.json configuration file for the region I intend to use (this command line assumes you downloaded and unzipped everything in your home directory):
(Change the region)
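If you would rather not rely on a one-line text substitution, the region change can be sketched in Python. This is only a sketch: it assumes `global.json` keeps the region under an `account` object with a `region` key, so check your copy first. `set_region` and `update_global_json` are hypothetical helper names, not part of StreamAlert.

```python
import json
from pathlib import Path


def set_region(config: dict, region: str) -> dict:
    """Return a copy of the config with account.region replaced.

    Assumption: the region lives at config["account"]["region"].
    """
    if "account" not in config or "region" not in config["account"]:
        raise KeyError("expected an 'account' object with a 'region' key")
    updated = dict(config)
    updated["account"] = dict(config["account"], region=region)
    return updated


def update_global_json(path: Path, region: str) -> None:
    """Rewrite the JSON file at `path` so it uses the new region."""
    config = json.loads(path.read_text())
    path.write_text(json.dumps(set_region(config, region), indent=2) + "\n")
```

Calling `update_global_json(Path.home() / "streamalert" / "conf" / "global.json", "us-west-2")` would match the home-directory layout used in this post. Failing loudly when the key is missing is deliberate, since a silent no-op would leave you deploying to the wrong region.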
Now you build the initial infrastructure. This command will use Terraform and the configuration files stored locally to build everything out. This includes Lambda functions, DynamoDB tables, an S3 bucket, CloudWatch rules, some SNS/SQS, and a bit more. Because it’s all serverless, you won’t see any containers or instances. Type this next command and then, when prompted, type “yes” a few times. Don’t worry too much if you see some error messages related to software versions unless the install totally fails:
Wait a few minutes and everything is up and running. But now it’s time to better understand how it works and what to configure.
Configuring StreamAlert for Production
At this point all the initial serverless components are deployed, but now you need to configure it. This is the part where I will deviate a bit more from the getting started guide. Without going into the full architecture, here are the basics to understand. And keep in mind, you don’t need to configure all of these manually, it’s all part of StreamAlert and you will focus on configuring the sources that send in the data:
- StreamAlert supports multiple clusters. We’ll just set up one for this guide, but you might consider more than one if you want different handling for different event sources (e.g. dev vs. prod) or need to spread things out a bit more if you are running at massive scale. A cluster is merely a coordinated collection of the StreamAlert services.
- You send logs to StreamAlert using 3 different mechanisms: S3 buckets, SNS messages, or Kinesis Streams. You’ll either save data to an S3 bucket, send an SNS message (which goes to a Lambda function), or put data into a Kinesis stream. Here are the tradeoffs:
- S3 is the slowest path since you need to aggregate data and save it to S3 where it triggers a Lambda that extracts and analyzes the data. This can take 5-20 minutes.
- SNS is very fast and works best across regions. You send data to an SNS topic (located in the source account, not the StreamAlert account), which forwards it to a pre-configured StreamAlert Lambda function. This gets around some annoying region and permissions limits (since you can use your Organization ID, if you use AWS Organizations). We will show this option in more detail in this post. Costs aren’t bad for moderate scale, but for bigger and more active accounts you will want to use Kinesis.
- Kinesis Data Streams are the most cost effective for large data volumes, but are more expensive at lower volumes since you are billed even when they aren’t accepting data. Kinesis is also harder to configure across regions but we will cover that. Like SNS, it is also quite fast, with alerts typically taking 30 seconds to 2 minutes.
- When StreamAlert receives data through one of those 3 channels it runs a Classifier Lambda function that manages the log message and then forwards it internally as needed. In the case of SNS each SNS message triggers the Lambda directly. This is how we can most easily set up cross-region and cross-account access, since we merely need to set our permissions and point the SNS topic to the Classifier Lambda. These permissions are the key, and in our example we use AWS Organizations, which makes this work much more easily. Otherwise you need to make grants on a per-account ID basis, which is time consuming, error prone, and may hit IAM policy limits in very large deployments.
- Most configuration is done through modification of the JSON configuration files, then deploying again using Terraform to push updates. Some configuration instead uses the Python command line tool. The tool mostly ensures all the right changes are made in the configuration files.
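To make the SNS path a little more concrete, here is a minimal sketch of what a Lambda function sees when an SNS message triggers it directly. It is not StreamAlert’s actual classifier code. It only relies on the standard shape of an SNS-delivered Lambda event (`Records[].Sns.Message`), and `extract_sns_messages` is a hypothetical helper name.

```python
import json


def extract_sns_messages(lambda_event: dict) -> list:
    """Pull the JSON payloads out of an SNS-triggered Lambda event.

    Each SNS record carries the published message as a string in
    record["Sns"]["Message"]; here we assume that string is JSON,
    as it is when a CloudWatch rule publishes an event to the topic.
    """
    payloads = []
    for record in lambda_event.get("Records", []):
        sns = record.get("Sns")
        if sns is None:
            continue  # not an SNS record; S3 and Kinesis records are shaped differently
        payloads.append(json.loads(sns["Message"]))
    return payloads


# Illustrative event only; the account ID and topic name are placeholders.
SAMPLE_EVENT = {
    "Records": [
        {
            "EventSource": "aws:sns",
            "Sns": {
                "TopicArn": "arn:aws:sns:us-west-2:111111111111:cloudsec-streamalert-data",
                "Message": json.dumps(
                    {
                        "detail-type": "AWS API Call via CloudTrail",
                        "detail": {"eventName": "AuthorizeSecurityGroupIngress"},
                    }
                ),
            },
        }
    ]
}
```

Because there is no batching or polling step between the topic and the function, each published event arrives as its own invocation, which is why this path is fast.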
For our example the following steps (taken from the training class) will send CloudTrail events via CloudWatch across accounts to StreamAlert. It’s obviously a little simpler if all this is in a single account.
- If you ended your previous session, make sure you type
- In the account where you have StreamAlert running you will need an SNS topic to send security alerts. You can use an existing one if you have it, otherwise you can create a new one. In our example we call it SecurityAlerts and we assume you already created it using the CLI or the console. This command will set it up as an output for alerts:
- Then enter SecurityAlerts at both prompts
- Next you need to create and configure the input SNS topic and subscriptions. This is the topic that will send data to the Classifier Lambda. In our example we are taking advantage of AWS Organizations to shortcut some permissions (since we can use your Organizations ID). Specifically, in training we create this subscription in our Organizations Root account where we have an Organization-wide CloudTrail trail configured. Thus this root collects all CloudTrail from all accounts and regions in our org and then sends the events to StreamAlert using a CloudWatch Rule we will demonstrate. You can use this same design in non-organizations environments, you just need to repeat the setup in each account.
- Our example is designed to create alerts as quickly as possible. Since CloudTrail only sends write events to CloudWatch Events this doesn’t collect read events. You can also collect read events using a CloudWatch Logs subscription to S3, which is a slower path.
- Since only write events are pushed the message volume (and thus costs) shouldn’t be too bad. If that’s a concern you will need to shift to using Kinesis, which we will explain in more detail in a future post.
- Log into your Organization root account, or whichever account you want to send logs from.
- Make sure CloudTrail is turned on, configured for all management events, and sends data to CloudWatch.
- Create a new SNS topic in the CloudTrail account which will forward to StreamAlert. We call ours cloudsec-streamalert-data.
- Adjust the SNS access policy using this template… you will need to replace the root account ID with the account ID from the account where the SNS is running and the organizations ID with your org ID. Also note this is all in us-west-2 and you’ll need to change that if you run it someplace else. This will allow the classifier Lambda in your monitoring account access to subscribe to the event feed.
- While you are still logged into this account let’s set up the CloudWatch rule to trigger the SNS and forward your CloudTrail data.
- In the console or CLI create a new rule to send the data to the data SNS topic you just created. The rule should use the following text and here’s a screenshot of it in the console:
- Now configure the cluster to accept the SNS data stream. Edit ~/streamalert/conf/clusters/prod.json and add the ARN of your data SNS to the input: sns section. It should look like this:
- Now update ~/streamalert/conf/clusters/sources.json with the same ARN. The prior step added the SNS topic as an input; this change tells StreamAlert what kind of data to accept so it can parse it.
- Our last configuration step is to adjust one of the built-in rules to send alerts when it is triggered. Rules are written in Python, which makes it easy to build your own and have robust handling for different kinds of violations. You might have lower severity events log or create a ticket while higher severity events send text messages. In our case we will trigger our SecurityAlerts SNS topic (in class we subscribe to this using email).
- Open the rule at ~/streamalert/rules/community/cloudtrail/cloudtrail_security_group_ingress_anywhere.py and adjust the rule section to look like this:
- That’s it for configuration: we set up our alerting SNS, then the data SNS as input for the cluster, then configured CloudWatch as the source for parsing, then set the rule to output to our alerting SNS topic. Now it’s time to deploy everything:
You’ll need to type “yes” a few times but everything is up and running at this point. Since we used the Organizations ID in our data SNS topic, that handles the cross-account permissions. When I first tried this it worked way smoother than I expected! One subtle advantage to using SNS is this also works well across regions. There is a lot more complexity to set up Kinesis across regions.
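For reference, here is a hedged sketch of the two pieces of JSON the source-account steps revolve around: an access policy for the data SNS topic keyed on the Organizations ID, and a CloudWatch rule event pattern for CloudTrail API calls. Neither is the exact template from the training class. The statement ID, the choice of `sns:Subscribe` and `sns:Receive`, the function names, and the placeholder IDs are all assumptions. The `aws:PrincipalOrgID` condition key and the `AWS API Call via CloudTrail` detail type are standard AWS.

```python
import json


def data_topic_policy(topic_arn: str, org_id: str) -> dict:
    """Build an SNS access policy that lets principals in one AWS
    Organization subscribe to the topic.

    Assumption: subscribe/receive is enough for the monitoring account;
    your own template may also carry the default owner statement.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowOrgSubscribe",  # hypothetical statement ID
                "Effect": "Allow",
                "Principal": {"AWS": "*"},
                "Action": ["sns:Subscribe", "sns:Receive"],
                "Resource": topic_arn,
                # Restrict the wildcard principal to members of the org.
                "Condition": {"StringEquals": {"aws:PrincipalOrgID": org_id}},
            }
        ],
    }


def cloudtrail_event_pattern() -> dict:
    """Event pattern matching API calls that CloudTrail delivers to
    CloudWatch Events."""
    return {"detail-type": ["AWS API Call via CloudTrail"]}


# Placeholder values; substitute your own account ID, region, and org ID.
EXAMPLE_POLICY = data_topic_policy(
    "arn:aws:sns:us-west-2:111111111111:cloudsec-streamalert-data",
    "o-exampleorgid",
)
```

Printing `json.dumps(EXAMPLE_POLICY, indent=2)` gives text you can paste into the topic’s access policy editor. Generating it this way keeps the org ID and topic ARN in one place, instead of hand-editing account IDs into a template for each account.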
Our example here is just the start. In our next post in this series, we will discuss those Kinesis issues and provide sample Lambda code to consolidate across regions when using Kinesis. While DisruptOps doesn’t use StreamAlert internally we do use similar mechanisms to collect data. StreamAlert has actually been quite helpful to model the best techniques for moving event and log data between regions and accounts. As a multi-tenant SaaS/PaaS application our needs are a bit more complex, especially around security, but the fundamentals still hold true.
And don’t forget we still have some spots available for our Black Hat 2019 class!