The accidental (or deliberate) exposure of sensitive data on Amazon S3 is one of those deceptively complex issues. On the surface it seems entirely simple to avoid, yet despite wide awareness we see a constant stream of public exposures and embarrassments, combined with a healthy dollop of misunderstanding and victim blaming.
Personally, I’m as guilty as anyone of underestimating the problem. When it first started cropping up in the news I assumed it was most often the result of carelessness. Initially that also seemed to be Amazon’s read on the situation: the company applied a series of iterative enhancements that could be incredibly effective, yet still failed to stem the tide. But as exposures continue, so does our understanding of the subtle complexities that make them hard to eliminate.
This post describes how S3 became such a big security issue and why managing it is complex. In the following post we focus on how to reduce the risk of an exposure, and on what to expect in the coming months as Amazon tackles the problem more directly. In future posts we will go deeper into the technical aspects, recommend remediation options, and release some sample code to help jump start your defensive efforts.
What’s going on with Amazon S3 security?
Launched in 2006, Amazon Web Services Simple Storage Service (S3) is one of Amazon’s oldest cloud services and is deeply embedded throughout the Internet. Over the past year or so a series of public exposures and disclosures have revealed large amounts of sensitive data stored in S3. The vast majority of disclosures appear accidental, but despite wide recognition of the problem and multiple techniques to identify and secure the data the exposures continue.
Amazon S3 has not been hacked or compromised; every exposure can be traced back to a misconfiguration.
So S3 is secure?
Yes. S3 is extremely secure. Everything is private by default and only becomes public if you open it up. If anyone tells you S3 was “hacked,” they likely don’t know what they are talking about. Amazon has one of the best security teams in the business, hires a roving list of top-tier penetration testers, and treats AWS security as a top priority. In my nearly 10 years of experience with AWS I’ve seen nothing to indicate otherwise.
Then why are there so many S3 data exposures?
S3 is a file storage and sharing service. It consists of buckets, the top-level storage containers. Every bucket is potentially Internet accessible and has a URL associated with it, even when it is private; for example, disruptops.s3.amazonaws.com. The URL only allows access if the security rules have been opened up to make the bucket public. Buckets hold the actual objects (files) and also support directories.
S3 serves a lot of different purposes, and over the 12 years it has been around it has accumulated multiple mechanisms for hosting and sharing files. S3 can even host entire dynamic websites without needing a server. It’s the complex interplay of these options that results in public exposures. Until very recently you had to know to check five or so options to determine whether a bucket was actually public, and even an expert could miss certain edge cases.
S3 buckets and objects start out completely secure (private by default), and only become public as the rules are changed to allow external access. Opening access is common, since S3 is used for things like hosting web pages or data sets for applications. Ideally dev and ops open up only granular access, but based on our assessment experience it is also pretty common for them to open things too wide without realizing the consequences.
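One way to catch the “opened too wide” case is to inspect a bucket’s ACL grants for the well-known “Everyone” group. A minimal sketch, assuming the grant structure mirrors what S3’s GetBucketAcl API returns (as in boto3’s `get_bucket_acl()` response); the grants below are a made-up example:

```python
# Grantee URIs that make a grant "public" in S3's ACL model.
PUBLIC_GRANTEES = {
    "http://acs.amazonaws.com/groups/global/AllUsers",
    "http://acs.amazonaws.com/groups/global/AuthenticatedUsers",
}

def public_acl_grants(grants):
    """Return (grantee URI, permission) pairs that expose the bucket."""
    exposed = []
    for grant in grants:
        grantee = grant.get("Grantee", {})
        if grantee.get("Type") == "Group" and grantee.get("URI") in PUBLIC_GRANTEES:
            exposed.append((grantee["URI"], grant.get("Permission")))
    return exposed

# Example ACL: the owner has full control, but someone also granted
# READ to "Everyone" -- exactly the misconfiguration described above.
grants = [
    {"Grantee": {"Type": "CanonicalUser", "ID": "abc123"},
     "Permission": "FULL_CONTROL"},
    {"Grantee": {"Type": "Group",
                 "URI": "http://acs.amazonaws.com/groups/global/AllUsers"},
     "Permission": "READ"},
]

print(public_acl_grants(grants))
```

This only covers ACLs; as we discuss below, a bucket can still be public through other mechanisms even when its ACL looks clean.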
Why allow S3 use at all?
S3 is WAY more secure than nearly any other place you keep your files. It’s also a powerful enabler of applications and web services, and if you use AWS at all it is likely already deeply embedded into your technologies and processes.
Amazon’s S3 security itself has no known breaches!
It is a tool. You just need to know how to use it safely, and by default data in S3 is secure. The problem is that data is then opened up and potentially becomes public as it is used within and outside the organization, most often due to misconfiguration.
If you block S3, odds are you’ll create more problems than you solve by forcing users and developers onto less inherently secure options.
How do S3 buckets even possibly become public?
As mentioned, S3 is designed for storing and sharing data. Heck, it was the backbone for all of Dropbox until recently. The ability to share data is a core feature of the service. Every bucket and object has an Internet-routable URL, even when it is private (requests to private objects are simply denied).
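To make the “every object has a URL” point concrete, here is a trivial sketch that builds the virtual-hosted-style URL for an object; the bucket name comes from the example above, and the object key is hypothetical. The URL exists whether or not the object is public; a private object just returns an access-denied response.

```python
def object_url(bucket, key):
    """Build the virtual-hosted-style URL for an S3 object."""
    return f"https://{bucket}.s3.amazonaws.com/{key}"

# Hypothetical object in the bucket mentioned earlier:
print(object_url("disruptops", "reports/summary.pdf"))
# https://disruptops.s3.amazonaws.com/reports/summary.pdf
```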
S3 supports a bunch of different mechanisms to manage sharing and support all the different ways it can be used. These mechanisms can apply at either the bucket or the object level, and policies can govern use within AWS and with the Internet at large. We will review the mechanisms in our next post, but the two main ones are:
- Access Control Lists (Bucket ACLs). This is the first place people go to make content public, and it’s as simple as clicking “Everyone” for read and/or write access.
- Bucket Policies. ACLs aren’t very granular, so AWS added bucket policies which allow you to restrict access only to approved IP addresses, or add other kinds of conditionals.
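As a sketch of what a bucket policy looks like, here is one that allows reads only from an approved IP range. The bucket name is hypothetical and the IP range is a documentation example; real policies would use your own values:

```json
{
  "Version": "2012-10-17",
  "Statement": [{
    "Sid": "AllowReadFromApprovedRange",
    "Effect": "Allow",
    "Principal": "*",
    "Action": "s3:GetObject",
    "Resource": "arn:aws:s3:::example-bucket/*",
    "Condition": {"IpAddress": {"aws:SourceIp": "203.0.113.0/24"}}
  }]
}
```

Note that the `Principal` is `*` (everyone), with the condition doing the restricting; this is exactly the kind of rule that can look alarming, or be genuinely dangerous if the condition is dropped.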
Thus you could have an ACL that denies public access, but a bucket policy that allows it (via an IP-based rule). Someone looking at the console or the ACLs via command line or API could see one, but miss the other.
This is a significant simplification, and only one example. Depending on how you like to count, there can be up to 7 different places where you set S3 access rules, all of which affect each other. Amazon generally defaults to deny, but there are still plenty of ways to misconfigure things.