Do you work with infrastructure code and find yourself asking, “what would happen if we made this change?” Deriving meaning from looking at code alone is challenging – let’s visualize it!
In the early days of the cloud, Amazon famously coined the term “2 Pizza Teams”, meaning that a team should have no more people than two pizzas can feed. This model permits rapid development, but it can lead to implementation differences between services. In AWS, one confusing difference is each service’s default IAM behavior. These defaults are implicit and poorly documented, so the same policy change can be safe for one service and lock users out of another.
To demonstrate a common permissions scenario, let’s introduce A Team Called Quest (ATCQ) – where the cloud is their native tongue. With strict compliance guidelines constantly requiring added controls and reports, the InfoSec team regularly audits the AWS environments using a variety of scanning tools. When something needs to be changed, the InfoSec team participates in the review process – any time there’s confusion, the process breaks down, eating up valuable time and energy.
Let’s check in with the team at ATCQ to ask “what if?”
During a recent audit of our production AWS accounts, the InfoSec team at ATCQ identified an AWS ECR registry that the Support team had full admin rights to. Given that this registry is serving container images for production workloads, the InfoSec team determined that only select members of the Dev team should be able to interact with the ECR.
The Operations team has been tasked with removing all access for the Support team, but they aren’t sure which method is best:
- Via the ECR resource itself as a resource policy
- Via the KMS key that’s used to encrypt the ECR
You can find the initial commit on the PR here.
First, we add a deny statement to the KMS key used to encrypt the ECR images:
Then we deny access to the ECR itself:
Can you tell which change will fail to apply because it blocks the root user and all our admins?
The Old Way
Though the Terraform change appears to be valid, the two services have different default allow/deny behavior. The ECR change will work as expected, blocking only the three Support users we intended to block. The KMS change works differently: adding the exact same deny stanza for these three users overrides an implicit account-wide allow policy, and would block all principals across the entire account if applied.
Because this behavior is implicit, it’s difficult to predict what would happen. Neither the Terraform config nor the plan gives much indication that these two services work differently. To investigate further and confirm, a PR reviewer would need to read each service’s AWS IAM documentation page and work out its default IAM policy and behavior.
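To make the difference concrete, here is a toy model of the two behaviors described above. It is a deliberately simplified sketch, not AWS’s actual evaluation logic: it covers a single action within one account and ignores SCPs, permission boundaries, conditions, and KMS grants. The `Statement` class, the `is_allowed` function, and the principal names are all hypothetical, invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    effect: str            # "Allow" or "Deny"
    principals: frozenset  # principal names; "root" stands for the account itself


def is_allowed(principal, identity_allows, resource_policy, requires_resource_allow):
    """Toy same-account check for one principal and one action."""
    # An explicit deny in the resource policy always wins.
    if any(s.effect == "Deny" and principal in s.principals for s in resource_policy):
        return False
    direct_allow = any(
        s.effect == "Allow" and principal in s.principals for s in resource_policy
    )
    # An allow for the account root delegates the decision to IAM identity policies.
    delegates_to_iam = any(
        s.effect == "Allow" and "root" in s.principals for s in resource_policy
    )
    if requires_resource_allow:
        # KMS-like: the key policy must grant access itself or delegate to IAM.
        return direct_allow or (delegates_to_iam and identity_allows)
    # ECR-like: an identity policy allow is enough on its own.
    return direct_allow or identity_allows


# The same deny stanza applied to both services (hypothetical user names).
deny_support = [Statement("Deny", frozenset({"support-1", "support-2", "support-3"}))]

print(is_allowed("admin-1", True, deny_support, requires_resource_allow=False))  # True
print(is_allowed("admin-1", True, deny_support, requires_resource_allow=True))   # False
```

In this model the admin keeps ECR access but loses KMS access, because the deny-only key policy no longer contains anything that grants access or hands the decision to IAM.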
The New Way - Visualizing IAM
Using a visualization technique, we can better understand each service’s default allow/deny behavior, helping us make a more informed decision about the proposed change. What we’ll uncover is that ECR and KMS behave differently due to their defaults – not something we could have known from looking at the code diff.
Let’s start with the initial state of the environment before applying any changes – we can see that members of the Support team have admin access to KMS and ECR. Their permissions are shared with members of the Admin Group through the same set of applicable policies.
Iterating over the IAM changes from the pull request, the access vector for the Support team now differs from the Admin Group – KMS access is blocked for both, but ECR access is blocked only for the Support team. That’s interesting… let’s push on.
After calculating the changes, this deduplicated permissions view shows what would change if the pull request were applied.
That’s not right. While the outcome for the Support team is intended, blocking KMS access for the Admin Group is not. Why is that? As it turns out, the added deny statement for KMS overrides an implicit account-wide allow policy, thus blocking ALL Principals across the entire account if applied. Yikes.
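The idea behind a changeset view like this can be sketched as a diff between two access maps: one for the current state and one for the state after the PR. This is only an illustration of the concept, not how any particular visualization tool computes it. The `changed_access` function and the group and service labels are hypothetical, and the access values are hand-written from the scenario above.

```python
def changed_access(before, after):
    """Return {(principal, service): (old, new)} for every entry that differs."""
    keys = sorted(set(before) | set(after))
    return {
        key: (before.get(key, False), after.get(key, False))
        for key in keys
        if before.get(key, False) != after.get(key, False)
    }


# Effective access before the PR: both groups can use both services.
before = {
    ("support", "kms"): True,
    ("support", "ecr"): True,
    ("admin", "kms"): True,
    ("admin", "ecr"): True,
}
# Effective access after the PR as first written.
after = {
    ("support", "kms"): False,
    ("support", "ecr"): False,
    ("admin", "kms"): False,  # the surprise: admins lose KMS access too
    ("admin", "ecr"): True,
}

# The changes the PR was supposed to make.
intended = {("support", "kms"), ("support", "ecr")}

changes = changed_access(before, after)
unintended = sorted(set(changes) - intended)
print(unintended)  # [('admin', 'kms')]
```

Anything left in `unintended` is a change nobody asked for, which is exactly the kind of result a reviewer wants surfaced before the PR is merged.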
We can now add the missing default statement on the KMS resource policy and update our PR based on what we learned from this visualization technique.
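As a rough sketch of what the corrected key policy might contain, the snippet below builds a policy document with two statements: an allow for the account root, which mirrors the statement AWS places in its default key policy to enable IAM policies, and the deny for the Support users. It is not the policy from the PR. The `key_policy_with_deny` function, the `DenySupportTeam` Sid, the placeholder account ID `111122223333`, and the user ARNs are assumptions made for illustration.

```python
import json


def key_policy_with_deny(account_id, denied_arns):
    """Build a key policy dict: default root allow plus an explicit deny."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Mirrors the default key policy statement that lets IAM
                # identity policies in the account control access to the key.
                "Sid": "Enable IAM User Permissions",
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{account_id}:root"},
                "Action": "kms:*",
                "Resource": "*",
            },
            {
                # Hypothetical Sid; explicitly denies the listed users.
                "Sid": "DenySupportTeam",
                "Effect": "Deny",
                "Principal": {"AWS": list(denied_arns)},
                "Action": "kms:*",
                "Resource": "*",
            },
        ],
    }


# Placeholder account ID and hypothetical user names.
support_arns = [
    f"arn:aws:iam::111122223333:user/support-{n}" for n in (1, 2, 3)
]
policy = key_policy_with_deny("111122223333", support_arns)
print(json.dumps(policy, indent=2))
```

The resulting JSON could then be supplied as the key’s policy document. Because the explicit deny still takes precedence over the root allow, the Support users stay blocked while everyone else falls back to their IAM permissions.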
The Evaluated Changeset view is now much healthier, only changing the access vectors for the Support team.
Expanding this out to the final outcome once applied, we can see that we’re blocking the Support team’s access as intended without impacting the Admin Group.
Using this visualization technique, the Operations team saw that the change to the KMS policy was wrong and quickly fixed the problem before the code was deployed.
Handling inconsistencies within AWS IAM can be confusing for teams responsible for security and operations. Visualization techniques can better surface the nuances and make it obvious when a change will lock out users, even when Terraform itself doesn’t make the impact of a change clear.