The 4 Phases to Automating Cloud Management

October 30th, 2018

A Security Pro’s Cloud Automation Journey

Catch me at a conference and the odds are you will overhear me saying “cloud security starts with architecture and ends with automation.” I quickly follow with how important it is to adopt a cloud native mindset, even when you’re bogged down with the realities of an ugly lift and shift before the data center contract ends and you turn the lights off. While that’s a nice quip, it doesn’t capture anything about how I went from a meat and potatoes (firewall and patch management) kind of security pro to an architecture and automation cloud native. Rather than preaching from the mount, I find it more useful to describe my personal journey and my technical realizations along the way. If you’re a security pro, or someone trying to up-skill a security pro for cloud, odds are you will end up on a very similar path.

Phase 1: Automating Configurations

For me it all started about nine years ago, when I was asked to build the first training program for the Cloud Security Alliance. Early on I realized we needed repeatable labs that could run anywhere in the world, with students and instructors whose skills ranged from “developer” to “paper-pushing auditor”. In those days Amazon Web Services hadn’t really rolled out IAM, VPCs were private networks only, and concepts like Infrastructure as Code were just becoming feasible.

So there I was, trying to figure out how to build a hands-on application stack lab in the cloud for thousands of students, consistently, *and* be able to update it as AWS advanced their technology. At the time making your own AMIs was still a chore, but then I learned the wonders of `cloud-init`: a simple script I could host in an S3 bucket, with two small lines students could paste into the User Data field of their instances, which would configure the instances exactly as needed on launch. And when software updates broke things, I only needed to update that script at the published URL, and every new instance would use the new configuration. Magic! While this wouldn’t help with patching anything already running, it let me maintain a good first-run experience far more easily than updating and publishing new AMIs. And, in an act of utter reputational recklessness, you can still see a later version of one here on S3.
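One way to get a two-line User Data payload like this is cloud-init's include format: a first line of `#include` followed by a URL that cloud-init fetches and processes at boot. The sketch below only builds that string. It is an assumption about how such a lab could be wired up, not the original lab script, and the bucket URL is hypothetical.

```python
def build_include_user_data(script_url: str) -> str:
    """Return cloud-init user data that pulls the real setup script from a URL."""
    if not script_url.startswith(("http://", "https://")):
        raise ValueError("cloud-init #include expects an http(s) URL")
    # Line 1 marks this as a cloud-init include file; line 2 is the URL to fetch.
    return f"#include\n{script_url}\n"


# Hypothetical URL, for illustration only.
user_data = build_include_user_data(
    "https://example-bucket.s3.amazonaws.com/lab-setup.sh"
)
print(user_data)
```

The content at the URL still has to be valid user data in its own right (for example, a shell script that starts with `#!`). Because the instance fetches it at launch, changing the hosted file changes what every new instance runs.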

My first step was `cloud-init`. It isn’t something I use any more, but it was eye-opening that I could script an entire server and have it all run, using copy and paste and a single hosted file.

Phase 2: Automating Workflows

But the next step was far more impactful. After a couple years running hands-on trainings and building my own workloads, I started to play with the idea of Software Defined Security. Sitting in front of me was a cornucopia of cloud APIs, all whispering “call me” in my ears. I started looking for examples and found… nothing. Even Security Monkey wasn’t publicly released yet.

I had a class coming up for the Black Hat security conference, and I decided to use it as an excuse to learn Ruby and the AWS APIs (via the Ruby SDK). I ended up writing three demonstrations:

  • An incident response application that would quarantine an instance, analyze all its metadata, lock it down using AWS IAM, image all the storage, and launch a forensics analysis server ready to analyze the attached snapshots. This did in 3 seconds what used to take me 30 minutes.
  • A small app that would connect to AWS and Chef and identify all instances not running Chef (‘unmanaged’ servers). A process that could take weeks in a traditional datacenter.
  • Another app that would open security groups to a Qualys scanner, trigger a scan, and close the security group when it was done.
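The second demonstration comes down to comparing two inventories: what the cloud provider says is running versus what the configuration management server knows about. A minimal Python sketch of that comparison follows (the originals were Ruby). The function name and the instance IDs are hypothetical; in practice the two lists would come from the EC2 API and the Chef server.

```python
def find_unmanaged(cloud_instance_ids, chef_node_instance_ids):
    """Return instance IDs that exist in the cloud but are unknown to Chef."""
    # Set difference: everything running, minus everything under management.
    return sorted(set(cloud_instance_ids) - set(chef_node_instance_ids))


# Hypothetical inventories, for illustration only.
running = ["i-0aaa", "i-0bbb", "i-0ccc"]
managed = ["i-0bbb"]
print(find_unmanaged(running, managed))
```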

I hadn’t coded Ruby before so all three took about two months of part-time work to get up and running. They were pretty simple, but I learned some valuable lessons.

  • Managing credentials was critical, and also made it harder to share the code and get others to configure their environments correctly. Pulling from configuration files was… annoying. Especially for things like which security group in which region to use as the quarantine group.
  • The code worked fine on my local system, but when I ran it from an instance in AWS I would blow through service limits and had to insert delay timers. API service limits are not your friends.
  • All of these were really static. As slick as they were for demos, it still came down to manually running code from a desktop or instance. That hasn’t aged well.
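On the rate limit lesson: a common alternative to fixed delay timers is retrying with exponential backoff and jitter. The sketch below is that swapped-in technique written in Python, not what the original Ruby code did. `ThrottledError` is a hypothetical stand-in for whatever throttling exception a real SDK raises.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for an SDK's throttling/rate-limit error."""


def call_with_backoff(fn, max_attempts=5, base_delay=0.5, max_delay=8.0,
                      sleep=time.sleep):
    """Call fn(), retrying on ThrottledError with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # out of retries, surface the error to the caller
            # Full jitter: wait a random time up to an exponentially growing cap.
            cap = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, cap))
```

Passing `sleep` as a parameter keeps the function easy to test without real waiting. Note that current AWS SDKs already retry throttled calls on their own, so a wrapper like this matters mostly for calls the SDK does not cover.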

I packaged these up as “SecuritySquirrel”, and you can find the 2014 versions on GitHub. Believe it or not, those aren’t even the originals I used for a couple years before posting.

Phase 3: Automating the Cloud Itself

When AWS released Rules for CloudWatch I slammed together enough Python code in about 2 hours the following Saturday morning to reverse any security group change within 10-15 seconds — including filters to scope defense based on tags, the VPC, or who requested the change. You can download the code and instructions, and unlike my Ruby code, this still works pretty well for 3-year-old cloud code.
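The shape of such a guardrail can be sketched in a few lines. This is not the downloadable code. It assumes the rule delivers CloudTrail's `AuthorizeSecurityGroupIngress` record under `event["detail"]`, handles only CIDR-based rules, and filters only on who made the change. The `trusted_arns` parameter is a hypothetical name. The EC2 client is passed in, so in Lambda it would be `boto3.client("ec2")`.

```python
def to_boto_permissions(cloudtrail_items):
    """Convert CloudTrail-style ipPermissions items to the shape boto3 expects."""
    permissions = []
    for item in cloudtrail_items:
        ranges = (item.get("ipRanges") or {}).get("items", [])
        if not ranges:
            continue  # sketch limitation: skip rules that reference other groups
        perm = {
            "IpProtocol": item["ipProtocol"],
            "IpRanges": [{"CidrIp": r["cidrIp"]} for r in ranges],
        }
        if "fromPort" in item:
            perm["FromPort"] = item["fromPort"]
        if "toPort" in item:
            perm["ToPort"] = item["toPort"]
        permissions.append(perm)
    return permissions


def handle_event(event, ec2_client, trusted_arns=frozenset()):
    """Revoke the ingress rules that were just added. Returns True if reverted."""
    detail = event.get("detail", {})
    if detail.get("eventName") != "AuthorizeSecurityGroupIngress":
        return False
    # Filter on the requester: leave changes made by trusted principals alone.
    if detail.get("userIdentity", {}).get("arn") in trusted_arns:
        return False
    params = detail.get("requestParameters") or {}
    group_id = params.get("groupId")
    items = (params.get("ipPermissions") or {}).get("items", [])
    permissions = to_boto_permissions(items)
    if not group_id or not permissions:
        return False
    ec2_client.revoke_security_group_ingress(
        GroupId=group_id, IpPermissions=permissions
    )
    return True
```

Scoping by tag or VPC, as described above, would need an extra lookup of the security group before deciding whether to revert.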

Since that first demonstration I’ve built a library of event-driven automations running in Lambda, some of which you can download. In that package my favorite is `identify_internet_facing_servers.py`, which, for demo purposes, I linked to trigger when I click an IoT version of an Amazon Dash button. That’s right, I carry around an actual physical Easy button in my pocket. It finds any instances with port 22 open to the Internet, and with a double-click of the button I can revoke the rules, getting a text message on my phone when everything is all safe and cozy.
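The core check behind a script like this is small. The sketch below is not the packaged script. It works on security groups rather than instances, and it assumes input shaped like the `SecurityGroups` list returned by the EC2 `describe_security_groups` call. The function name is hypothetical.

```python
def open_to_internet(security_groups, port=22):
    """Return IDs of groups that allow the given TCP port from anywhere."""
    exposed = []
    for group in security_groups:
        for perm in group.get("IpPermissions", []):
            protocol = perm.get("IpProtocol")
            if protocol == "-1":
                covers_port = True  # "-1" means all protocols and all ports
            elif protocol == "tcp":
                low = perm.get("FromPort", 0)
                high = perm.get("ToPort", 65535)
                covers_port = low <= port <= high
            else:
                covers_port = False
            world = any(
                r.get("CidrIp") == "0.0.0.0/0" for r in perm.get("IpRanges", [])
            ) or any(
                r.get("CidrIpv6") == "::/0" for r in perm.get("Ipv6Ranges", [])
            )
            if covers_port and world:
                exposed.append(group["GroupId"])
                break  # one exposed rule is enough to flag the group
    return exposed
```

Mapping flagged groups back to running instances, and revoking the rules on a second button press, would sit on top of this check.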

My key lesson here was unexpected. These event-driven automations didn’t replace my host-based workflows; they served a different purpose. I realized I had moved on from building workflows to help me do things more quickly, to building guardrails to keep things safe in the background. Both have incredible value.

Phase 4: Automating Everything

My most recent work has been on using Jenkins and Infrastructure as Code (mostly CloudFormation) to enhance security. This combination enables me to automate security into the infrastructure and applications themselves and to rely less on external tools.

For example, I released a simple credentials scanner to run in Jenkins and find any stored access keys before even starting the build. Why wait and try to suss them out later? I then wrote some other test harnesses to allow me to run basically any assessment tool I want in Jenkins, and fail builds when they fail any security test, like a network scan (pro tip: Jenkins will fail a build if you send it any exit code other than 0 from a script).
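A pre-build credentials check along these lines can be very small. The sketch below is not the released scanner. It looks only for strings shaped like AWS access key IDs (a prefix of `AKIA` or `ASIA` followed by 16 uppercase letters or digits), and the function names are hypothetical. It relies on the exit code behavior mentioned above: the build step fails when the script returns anything other than 0.

```python
import re
from pathlib import Path

# Pattern for strings that look like AWS access key IDs.
ACCESS_KEY_ID = re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b")


def scan_text(text):
    """Return every substring that looks like an access key ID."""
    return ACCESS_KEY_ID.findall(text)


def scan_tree(root):
    """Walk a directory and return (path, key) pairs for suspected keys."""
    findings = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        try:
            text = path.read_text(errors="ignore")
        except OSError:
            continue  # unreadable file, skip it
        for key in scan_text(text):
            findings.append((str(path), key))
    return findings


def main(argv):
    """Return 1 if anything suspicious is found, else 0."""
    root = argv[1] if len(argv) > 1 else "."
    findings = scan_tree(root)
    for path, key in findings:
        # Print only the prefix so the log does not leak the key again.
        print(f"{path}: possible access key ID {key[:4]}...")
    return 1 if findings else 0


# In a Jenkins shell step, finish the script with:
#     import sys; sys.exit(main(sys.argv))
```

A real scanner would also look for secret keys and other credential formats, and would skip directories like `.git`.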

Coming back around, we now run the training class using CloudFormation templates to build out all the elements of the application stack so students can focus on adding security. We went from building consistent training servers to consistent training environments, with custom AMIs we can update in minutes… globally… with very little effort, and all software pre-installed and ready for final configuration.

My cloud journey started around nine years ago and my automation journey at nearly the same time. I started by building things and then trying to automate pieces, but now I start with the assumption of automation. My earliest work was about operations, but these days it is nearly all focused on security: on melting away the operational overhead and allowing the security parts of my mind to focus on what they are best at. Along the way I’ve also learned that not all automation is created equal; that there are places for guardrails, workflows, cross-platform orchestration, infrastructure as code, and automating pipelines. All of these now deliver nearly unimaginable security benefits, but we are still very much in the early days, when you can lose a week just reversing a poorly documented API.

If you are in security, time to get your code game on. If you are a developer or ops, time to get your security game on. Because the biggest lesson of all is that the days of security as an umbrella are over, and the days of security in the fabric are here.

About the Author:

With twenty years of experience in information security, physical security, and risk management, Rich is one of the foremost experts on cloud security, having driven development of the Cloud Security Alliance’s V4 Guidance and the associated CCSK training curriculum. In addition to his role at D-OPS, Rich currently serves as Analyst & CEO of Securosis.