Name: Create Terraform Modules Like A Pro
Uploaded: 2023-02-26T00:00:00Z
Description: Build reusable Terraform modules with embedded monitoring and alerting so developers fall into the pit of success

Transcript

Hi, my name is Stu and in this video I'm going to walk you through one approach you can use to upgrade your Terraform modules. To save you the hassle of recreating today's code, I've got a link down in my Patreon where you can download the code and many other things.

So what are Terraform modules? Well, simply put, Terraform modules are a collection of one or more Terraform files. This collection of files creates reusable components that we can use in many aspects of our infrastructure. Over the last four or five years I've noticed that companies are reusing Terraform modules a lot. They're either taking off-the-shelf Terraform modules that are open source on places like GitHub or they're building modules internally and then distributing them using private registries. But there's one thing that all of these companies generally miss when they're building their own Terraform modules, and that is monitoring and alerting.

There are two different approaches that teams usually take when they're adding monitoring and alerting with Terraform modules. One approach is to embed all of the monitoring and alerting within the module itself. The other is to create a separate Terraform module and put all of the monitoring and alerting in there. In this video we're going to cover both approaches and show you the differences within Terraform.

The approach that we should take with our Terraform modules largely depends on our requirements. It also depends on whether we have access to the resources or not. For example, we may create a resource ourselves, but sometimes we don't have direct access to the resource and need to attach alerts to it from the outside, such as when we use a Terraform module that we get from a third party. The reason why I like to build the alerting and monitoring into the Terraform modules is so that developers fall into the pit of success. If you've never heard the phrase pit of success before, it's basically when developers struggle to fail because all of the infrastructure around them is setting them up for success. That said, there is definitely no one-size-fits-all whenever we're dealing with any kind of technology or approach.

So let's take a look at our first approach: using a separate module. When we use a separate module we take most of the duplication out of our code. This means we can reuse the same set of alerts for different use cases. Let's take EC2 for example. We may have different scenarios where we need EC2 instances. Some of these instances may come from modules that we own, and others from third-party Terraform modules. If we separate out our alerts into a different module then we can reuse it for all of those use cases.

However, there is one big problem with this approach. The problem revolves around discoverability and remembering to implement the feature. Now it might be that your company has a great Terraform registry already and that's something you can leverage, but in my experience most companies are just starting out on their infrastructure-as-code journey and they don't have the necessary tooling in place.

So let's switch across to Terraform quickly and have a look at how we're going to set up this approach. First, create a folder in your repository for your alerts, split out by resource type. For example, I would set up an SQS alerts folder and a Kinesis alerts folder so they can easily be reused. Development is a lot like building Lego. You need to create small reusable blocks to make a bigger picture. Once you have your alerts folder set up, we add a new module block next to the resource for which we want to create the alerts and pass in the relevant details. In this example I'm using an SQS queue. So if we apply this Terraform and head over to the AWS console, we'll be able to see our two new queues and their corresponding alarms.
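As a rough sketch of what this separate-module layout could look like (not the exact code from the video): the folder path `modules/sqs-alerts`, the variable names, the queue names, the SNS topic, and the choice of the `ApproximateAgeOfOldestMessage` metric with a 300-second threshold are all assumptions made for illustration. It also assumes the AWS provider is already configured.

```hcl
# --- modules/sqs-alerts/main.tf (hypothetical path) ---

variable "queue_name" {
  description = "Name of the SQS queue to watch."
  type        = string
}

variable "alarm_topic_arn" {
  description = "SNS topic that receives alarm notifications."
  type        = string
}

variable "max_message_age_seconds" {
  description = "Alarm when the oldest message is older than this."
  type        = number
  default     = 300 # assumed threshold, tune per queue
}

# Fires when messages sit in the queue for too long.
resource "aws_cloudwatch_metric_alarm" "oldest_message_age" {
  alarm_name          = "${var.queue_name}-oldest-message-age"
  namespace           = "AWS/SQS"
  metric_name         = "ApproximateAgeOfOldestMessage"
  statistic           = "Maximum"
  period              = 60
  evaluation_periods  = 5
  comparison_operator = "GreaterThanThreshold"
  threshold           = var.max_message_age_seconds
  alarm_actions       = [var.alarm_topic_arn]

  dimensions = {
    QueueName = var.queue_name
  }
}

# --- main.tf (root) ---

resource "aws_sns_topic" "alerts" {
  name = "sqs-alerts" # hypothetical topic name
}

resource "aws_sqs_queue" "orders" {
  name = "orders"
}

# The alerts module sits next to the queue it watches.
module "orders_alerts" {
  source          = "./modules/sqs-alerts"
  queue_name      = aws_sqs_queue.orders.name
  alarm_topic_arn = aws_sns_topic.alerts.arn
}

resource "aws_sqs_queue" "payments" {
  name = "payments"
}

module "payments_alerts" {
  source          = "./modules/sqs-alerts"
  queue_name      = aws_sqs_queue.payments.name
  alarm_topic_arn = aws_sns_topic.alerts.arn
}
```

Notice that every queue needs its own matching module block. That is exactly the "remembering to implement it" problem mentioned earlier: nothing stops someone from adding a third queue without the alerts.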

So now let's take a look at our second approach: embedding the alerts inside of the module. The second approach is what I would call the pit of success because we're co-locating the resource and its corresponding alerts all within one module. This means that anytime we pick up the module we get the alerts out of the box. Naturally you can add variables to control which alerts you do and do not want, but turning off a default then becomes a conscious decision by the developer.

So let's go across to Terraform now and take a look at what we need to do with this approach. We're going to follow a similar format to the first approach but this time we will move the SQS queue creation inside of the module so it is co-located next to the alarm. Next we update the code in main.tf to remove the queue creation. The result is that we have deduplicated our code and we can quickly create as many queues as we want knowing that our alerts are going to automatically be in place. Naturally, if we apply this Terraform code and head over to the AWS console, we'll be able to see our two new queues and their corresponding alarms.
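Here is a minimal sketch of the embedded approach, again under assumptions: the module path `modules/sqs-queue`, the variable names, the `enable_alarms` toggle, and the metric and threshold are illustrative choices rather than the code shown in the video.

```hcl
# --- modules/sqs-queue/main.tf (hypothetical path) ---

variable "name" {
  description = "Name of the SQS queue to create."
  type        = string
}

variable "alarm_topic_arn" {
  description = "SNS topic that receives alarm notifications."
  type        = string
}

variable "enable_alarms" {
  description = "Alarms are on by default; turning them off is a conscious choice."
  type        = bool
  default     = true
}

# The queue now lives inside the module...
resource "aws_sqs_queue" "this" {
  name = var.name
}

# ...right next to the alarm that watches it.
resource "aws_cloudwatch_metric_alarm" "oldest_message_age" {
  count = var.enable_alarms ? 1 : 0

  alarm_name          = "${var.name}-oldest-message-age"
  namespace           = "AWS/SQS"
  metric_name         = "ApproximateAgeOfOldestMessage"
  statistic           = "Maximum"
  period              = 60
  evaluation_periods  = 5
  comparison_operator = "GreaterThanThreshold"
  threshold           = 300 # assumed threshold, in seconds
  alarm_actions       = [var.alarm_topic_arn]

  dimensions = {
    QueueName = aws_sqs_queue.this.name
  }
}

# --- main.tf (root) ---

resource "aws_sns_topic" "alerts" {
  name = "sqs-alerts" # hypothetical topic name
}

# No standalone aws_sqs_queue resources any more: each module call
# creates a queue and its alarm together.
module "orders_queue" {
  source          = "./modules/sqs-queue"
  name            = "orders"
  alarm_topic_arn = aws_sns_topic.alerts.arn
}

module "payments_queue" {
  source          = "./modules/sqs-queue"
  name            = "payments"
  alarm_topic_arn = aws_sns_topic.alerts.arn
}
```

With this shape, a developer who wants a queue without alarms has to write `enable_alarms = false` explicitly, which makes the opt-out visible in code review.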

I generally find that a combination of the two approaches that I've mentioned here seems to be the best route to take. The reason for this is because we want the developers to fall into the pit of success, but we might not have the flexibility to use our own module all of the time. Therefore, if we can create the alerts module that is consumed by our parent SQS module, we get the best of both worlds.
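One way the combined approach might be wired up is sketched below. It assumes a standalone alerts module already exists at `../sqs-alerts` and accepts `queue_name` and `alarm_topic_arn` inputs; those names, the paths, and the `enable_alarms` toggle are hypothetical. Using `count` on a module block needs Terraform 0.13 or later.

```hcl
# --- modules/sqs-queue/main.tf (hypothetical path) ---

variable "name" {
  description = "Name of the SQS queue to create."
  type        = string
}

variable "alarm_topic_arn" {
  description = "SNS topic that receives alarm notifications."
  type        = string
}

variable "enable_alarms" {
  description = "Alarms are on by default."
  type        = bool
  default     = true
}

resource "aws_sqs_queue" "this" {
  name = var.name
}

# The parent module consumes the shared alerts module, so anyone using
# this module gets alerts out of the box.
module "alerts" {
  source = "../sqs-alerts" # assumed sibling module
  count  = var.enable_alarms ? 1 : 0

  queue_name      = aws_sqs_queue.this.name
  alarm_topic_arn = var.alarm_topic_arn
}
```

The same alerts module can still be called directly for queues that come from a module we don't own, so the alarm definitions live in one place either way.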

Getting your alerts right can actually be a little tricky, so let's take a look at what makes a good alert. Firstly, it should be actionable. It should clearly indicate that an action needs to be taken by the responder. Next, it needs to be timely so that the responder can take action as close to the time of the alert as possible. It should also be accurate. False positives can lead to waking up at 2 AM and let's be honest, nobody really wants that. Alerts also need to be specific. We need to be clear about what's gone wrong, why it's gone wrong, and what we can do about it. The "what we can do about it" is usually a page in, say, Confluence that lists step-by-step instructions for how to continue the investigation or how to remediate the alert. And lastly, it needs to be relevant. Only the people that can actually action the alert need to have the alert in their inboxes. Moreover, it should only be the people that are on the on-call schedule that get the alert.
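To make those qualities a bit more concrete, here is one possible way they could map onto a CloudWatch alarm in Terraform. This is only a sketch: the variable names, the on-call SNS topic, the runbook link, the metric, and the numbers are all assumptions, and the right values depend on the workload.

```hcl
variable "queue_name" {
  description = "Name of the SQS queue to watch."
  type        = string
}

variable "on_call_topic_arn" {
  description = "SNS topic that routes to whoever is on call (hypothetical)."
  type        = string
}

variable "runbook_url" {
  description = "Link to the step-by-step runbook page (hypothetical)."
  type        = string
}

resource "aws_cloudwatch_metric_alarm" "oldest_message_age" {
  alarm_name = "${var.queue_name}-oldest-message-age"

  # Actionable and specific: say what went wrong, the likely cause,
  # and where the remediation steps live.
  alarm_description = "Oldest message in ${var.queue_name} has waited over 5 minutes, so consumers are likely stalled or failing. Runbook: ${var.runbook_url}"

  namespace           = "AWS/SQS"
  metric_name         = "ApproximateAgeOfOldestMessage"
  statistic           = "Maximum"
  comparison_operator = "GreaterThanThreshold"
  threshold           = 300 # seconds, assumed

  dimensions = {
    QueueName = var.queue_name
  }

  # Timely but accurate: one-minute datapoints, and 3 of the last 5 must
  # breach so a single spike does not page anyone at 2 AM.
  period              = 60
  evaluation_periods  = 5
  datapoints_to_alarm = 3
  treat_missing_data  = "notBreaching"

  # Relevant: notify only the on-call topic, and also when it recovers.
  alarm_actions = [var.on_call_topic_arn]
  ok_actions    = [var.on_call_topic_arn]
}
```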

My teams and I always try to look at it from the perspective of what we would want at 2 AM. Looking at it from this perspective forces us to hold our monitoring and alerting to a higher standard. Coupled with the techniques I've shown you in this video, you can really take your Terraform to the next level.

If you enjoyed this video, consider subscribing to the YouTube channel for more content like this.