Automating Cloud Governance As You Go-Go
What does cloud governance have in common with George Michael? Sometimes it comes at you like – wham! (Thank you, I’ll see myself out.)
Governance. It’s a simple but potent word that triggers different mental models depending on the surrounding context. Ask Google to define governance and you get something like this:
Cloud Governance Complexity
Organizations look to the cloud to control cost and drive efficiencies. Cloud governance defines the rules of engagement to deliver a secure infrastructure as quickly and efficiently as possible. But someone needs to implement those policies. That requires people, and the fact is, people complicate things. Development teams want to build quickly, using the best cloud services (and providers) for the task at hand. Site Reliability Engineers (SREs) seek efficient ways to manage an ever-sprawling estate of cloud services while meeting performance and availability goals. Despite having less control over cloud environments, security teams must defend the organization and, depending on business needs, achieve and maintain specific compliance requirements. Alignment among these teams’ missions starts to drift as the operational reality of cloud sets in.
Strong cloud governance is an ideal state for everyone, but that doesn’t mean it’s easy to achieve. Focusing on governance for its own sake can overwhelm teams without delivering the value that brought you to the cloud in the first place. Strong cloud governance may be achieved more efficiently by focusing on resolving the critical issues at hand and adopting automation that reduces the work required to fix the next issue.
In other words, governance is a journey, not a destination. So it’s both inefficient and unrealistic to think you’ll have all of the answers at the beginning of the initiative. We believe you can “govern as you go,” and tailor priority actions to the needs of your teams.
You might be asking where to begin. Let’s start by breaking down cloud governance complexities and how to address them.
1 – Assessment
You can’t fix what you can’t find, so it’s no surprise that security teams focus on assessment. It isn’t sexy, but it’s a key first step to gain the visibility you need into your cloud estate. Assessments identify the current state of any given cloud service, but the findings must be enriched with context to know what – if any – action is required. Take a public S3 bucket, for example. An assessment may identify a high-risk finding if the bucket contains sensitive information intended only for a restricted audience. Or it could identify an acceptable state if the bucket houses a public-facing website. Assessments become powerful when they can be tailored to the needs of specific teams, people, and cloud environments. They become even more powerful when identified issues are incorporated into the workflow of each team and designed for the way different teams work.
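To make the S3 example concrete, here is a minimal sketch of grading the same raw finding differently depending on context. The `Finding` type, the `data_classification` and `purpose` tags, and the severity labels are all hypothetical; they stand in for whatever metadata your own assessment tooling exposes.

```python
from dataclasses import dataclass, field


@dataclass
class Finding:
    resource_id: str
    check: str  # hypothetical check name, e.g. "s3-bucket-public"
    tags: dict = field(default_factory=dict)


def grade_public_bucket(finding: Finding) -> str:
    """Grade a public-bucket finding using (hypothetical) resource tags."""
    classification = finding.tags.get("data_classification", "unknown")
    purpose = finding.tags.get("purpose", "unknown")

    # Sensitive data in a public bucket is a high-risk finding.
    if classification in {"sensitive", "restricted"}:
        return "high"
    # A bucket that intentionally serves a public website is an accepted state.
    if classification == "public" and purpose == "static-website":
        return "accepted"
    # Without enough context, a person has to decide.
    return "needs-review"


website = Finding("marketing-site", "s3-bucket-public",
                  {"data_classification": "public", "purpose": "static-website"})
payroll = Finding("payroll-exports", "s3-bucket-public",
                  {"data_classification": "sensitive"})
print(grade_public_bucket(website), grade_public_bucket(payroll))
```

The point of the "needs-review" fallback is that missing context is itself a finding: an untagged public bucket should land in front of someone who can supply the context.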
2 – Compliance
Let’s keep this one short. Compliance reporting should be a byproduct of your cloud security operational model. If you have the right operational practices in place to achieve your governance policies, substantiating those practices is straightforward. It does, however, require structuring your program intentionally and selecting tooling that can properly document your diverse environments.
3 – Operational Automation
Here is where assessment meets action. Performing the assessments is the easy part, and in fact, there are more assessment options available every day. AWS Security Hub is a great option from your favorite A to Z cloud provider. Want to go open source? Check out ElectricEye by our friend Jonathan Rau.
Operational excellence lies in the ability to use the assessment data to quickly address risks with as little friction as possible. This is where many commercial solutions and traditional cloud security posture management (CSPM) offerings fall short. Common risks, such as public S3 buckets, public RDS snapshots, or EC2 instances with 0.0.0.0/0 network access, require context to evaluate the true risk and determine whether remediation is necessary and, if so, which path is best.
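As a rough illustration of why 0.0.0.0/0 alone is not the whole story, the sketch below flags world-open rules only when the port is not one you expect to be public. The rule dictionaries and the `allowed_public_ports` default are assumptions made for this example, not the format of any particular tool.

```python
import ipaddress


def is_open_to_world(cidr: str) -> bool:
    """True for 0.0.0.0/0 (or ::/0), i.e. a network with prefix length 0."""
    return ipaddress.ip_network(cidr, strict=False).prefixlen == 0


def risky_rules(rules, allowed_public_ports=frozenset({80, 443})):
    """Return world-open rules on ports that are not expected to be public.

    `rules` is a list of dicts with hypothetical keys "cidr" and "port".
    """
    return [
        rule for rule in rules
        if is_open_to_world(rule["cidr"])
        and rule["port"] not in allowed_public_ports
    ]


rules = [
    {"cidr": "0.0.0.0/0", "port": 22},    # SSH open to the world
    {"cidr": "0.0.0.0/0", "port": 443},   # public HTTPS, likely intentional
    {"cidr": "10.0.0.0/8", "port": 22},   # internal-only SSH
]
print(risky_rules(rules))
```

Here the allow-list is the context: the same CIDR is a problem on port 22 and expected on port 443.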
Real-time events are the same way. Five failed logins followed by a successful login could be an attacker or a forgetful Dev back from a holiday weekend. Until you delve deeper into the issue, you aren’t sure which. It cannot be overstated: Context is king.
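The login pattern above is easy to detect and hard to interpret, which a small sketch can show. This one assumes events arrive in time order as hypothetical `(user, outcome)` pairs; it only surfaces the pattern and leaves the verdict to whoever has the context.

```python
def suspicious_logins(events, threshold=5):
    """Return users whose successful login followed `threshold` or more
    consecutive failures.

    `events` is an iterable of (user, outcome) pairs in time order, where
    outcome is "fail" or "success" (a simplified, assumed event shape).
    """
    failure_streak = {}
    flagged = []
    for user, outcome in events:
        if outcome == "fail":
            failure_streak[user] = failure_streak.get(user, 0) + 1
        else:
            if failure_streak.get(user, 0) >= threshold:
                flagged.append(user)
            failure_streak[user] = 0  # a success resets the streak
    return flagged


events = [("dev", "fail")] * 5 + [
    ("dev", "success"),
    ("ops", "fail"),
    ("ops", "success"),
]
print(suspicious_logins(events))
```

Note that the function cannot tell the attacker from the forgetful Dev. That distinction comes from enrichment, such as source location or whether the user just returned from leave.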
Automated Priority Alerting
Automation is clearly an exciting opportunity to streamline operations, improve efficiency, and reduce error. But there are multiple aspects of automation to consider. During the deployment phase, automation is a great way to provision cloud resources within DevOps workflows. But let’s consider operational automation in the form of alerting. When the right alerts (not everything) automatically reach the right people as issues happen, enriched with context that reduces the effort and time to fix them, your teams can take the right actions from the tools they’re already using. You can’t fix what you don’t know about, so the first step is delivering contextual alerts when and where they will have the most impact.
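One way to picture "the right alerts to the right people" is a small routing function like the sketch below. The team names, channel names, and severity scale are all made up for illustration; in practice they would come from your own ownership data and chat or ticketing tools.

```python
SEVERITY_RANK = {"low": 0, "medium": 1, "high": 2, "critical": 3}

# Hypothetical mapping from owning team to the channel that team already uses.
ROUTES = {
    "payments": "#payments-oncall",
    "platform": "#platform-sre",
}
DEFAULT_CHANNEL = "#cloud-security"


def route_alert(alert, min_severity="high"):
    """Return the channel for an alert, or None if it is below the threshold.

    `alert` is a dict with "severity" and, optionally, "owner_team".
    """
    if SEVERITY_RANK[alert["severity"]] < SEVERITY_RANK[min_severity]:
        return None  # not everything deserves an interruption
    return ROUTES.get(alert.get("owner_team"), DEFAULT_CHANNEL)


print(route_alert({"severity": "critical", "owner_team": "payments"}))
print(route_alert({"severity": "low", "owner_team": "payments"}))
print(route_alert({"severity": "high"}))
```

Alerts with no known owner fall back to the security channel, so nothing above the threshold is silently dropped.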
Automated Context & Response Options
‘Right’ will look different for different teams, so flexibility and customization in how to respond are crucial. Some teams may fix an issue in a CloudFormation template. Active threats might require a change to the runtime environment. Regardless of the remediation approach, empower the right teams to select the best action.
Automation can then help by enriching the alert itself with relevant context and remediation options. The first CEO I worked for once told me “Know your audience.” Good advice. Most cloud security issues ultimately end up with SREs, DevOps, or Developers, because only these teams have the context and access to fix them. These alerts must speak to them in their language. They need to know why a Publicly Exposed KMS Key is an issue, what the current impact is, and what options are available to fix it. Then they can figure out how and when to address the issue. Integrating the context and solutions around the issue into a developer’s workflow – options that don’t require an hour of Googling and reading AWS docs – gets things fixed faster.
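Enrichment of this kind can be sketched as a lookup that attaches the "why" and the fix options to an alert before it is delivered. The `PLAYBOOK` contents, the check name, and the alert fields below are illustrative assumptions, not guidance copied from any vendor.

```python
# Hypothetical playbook keyed by check name; the wording is illustrative only.
PLAYBOOK = {
    "kms-key-public": {
        "why": "A key policy that allows any principal lets identities "
               "outside your organization use the key.",
        "options": [
            "Restrict the key policy's Principal to specific accounts or roles.",
            "Apply the fix in the CloudFormation template that defines the key.",
        ],
    },
}


def enrich(alert):
    """Return a copy of the alert with 'why' and 'options' attached."""
    entry = PLAYBOOK.get(alert["check"], {})
    enriched = dict(alert)  # do not mutate the caller's alert
    enriched["why"] = entry.get("why", "No guidance recorded yet for this check.")
    enriched["options"] = list(entry.get("options", []))
    return enriched


def render(alert):
    """Format an enriched alert as plain text for a chat or ticket tool."""
    lines = [
        f"{alert['check']} on {alert['resource']}",
        f"Why it matters: {alert['why']}",
    ]
    lines += [f"Option {i}: {opt}" for i, opt in enumerate(alert["options"], 1)]
    return "\n".join(lines)


print(render(enrich({"check": "kms-key-public", "resource": "key-1"})))
```

Keeping the playbook as data rather than code means the teams who receive the alerts can edit the wording themselves, which is one way to make sure the alerts speak their language.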
Cloud Governance As You Go-Go
Continuous cloud governance brings all these things together, right when you need them, to empower each member of the team to take meaningful action. Automation is key. Just as Google defines SREs as focusing 50% on operational work and 50% on developing automation for future operations, effective cloud governance should take a similar approach. Best practices drive visibility and feed an operational model tailored to each team, which can leverage automation as a tool to reduce workload, simplify workflow, and constantly improve the security, efficiency, and reliability of cloud environments. This whole process feeds into compliance visibility when you need it, instead of top-down compliance-driven actions that add overhead to cloud deployment and operations.
Security, site reliability engineering, and development all have critical stakes in delivering effective cloud operations and governance. With smaller teams, one or two people may own the responsibility. As organizations grow, these responsibilities move and change quickly across dozens and even hundreds of mission-driven teams. If we target the end state of continuous governance for our cloud environments and coalesce around this mission, the daily actions taken by our team members become more integrated and intuitive, regardless of the size of the environment. As a result, critical issues get fixed faster, your cloud attack surface shrinks, and you should find less friction between all the teams that must work together to defend your cloud operations.