Name: Unlocking the best of AWS Route 53
Uploaded: 2023-02-26T00:00:00Z
Description: Master AWS Route 53 wildcards, health checks, and multi-region DNS failover with Terraform for highly available infrastructure

Transcript

Hi, my name is Stu and in this video you're going to learn about different techniques in Route 53 such as wildcards, health checks, and a multi-region DNS setup.

The first technique we're going to look at is wildcards. To save you time creating today's code, I have this available along with a lot of other things in my Patreon account. Link in the description below.

A wildcard subdomain is a DNS record that will match any request for a subdomain. For example, if we had a DNS record called *.example.com, then we can type test.example.com or anything.example.com and it would resolve to the DNS record that represents our wildcard.

There are several benefits of using a wildcard subdomain with AWS Route 53. The first is simplicity. By using a wildcard record, we can route traffic for every subdomain to a single place using a single DNS entry. The next benefit is flexibility. Wildcard subdomains allow us to define a catch-all DNS record that can handle requests for subdomains that don't have a record of their own, for example domains that are generated dynamically or testing domains. Lastly, we can use wildcard subdomains for consistency. This means every subdomain without its own record resolves in the same way, which can reduce the total number of DNS entries that we have in Route 53.

So let's go and have a look at how we can do this in Terraform. To create a wildcard domain in Terraform, we use the aws_route53_record resource. In the name field, we use an asterisk to represent the subdomain. This is what creates our wildcard. If we then create another record within the subdomain such as test.example.com, then the named record will take precedence over the wildcard domain.
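As a rough sketch, the two records described here might look something like the following. The `zone_id` variable and the TTL values are assumptions for illustration; the IP addresses are the ones that show up in the lookups in this video.

```hcl
# Assumption: the ID of an existing hosted zone for example.com is passed in.
variable "zone_id" {
  type        = string
  description = "ID of the Route 53 hosted zone for example.com"
}

# Wildcard record: matches any subdomain that has no record of its own.
resource "aws_route53_record" "wildcard" {
  zone_id = var.zone_id
  name    = "*.example.com"
  type    = "A"
  ttl     = 300
  records = ["8.8.8.8"]
}

# Named record: takes precedence over the wildcard for test.example.com.
resource "aws_route53_record" "test" {
  zone_id = var.zone_id
  name    = "test.example.com"
  type    = "A"
  ttl     = 300
  records = ["1.1.1.1"]
}
```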

So let's go ahead and quickly provision these records. Now let's test the wildcard domain with something that doesn't exist. As you can see, we now get the record 8.8.8.8. And if we look up test, then we get the record 1.1.1.1.

The next thing we're going to look at is AWS Route 53 health checks. Health checks allow us to monitor different resources such as EC2 instances or databases. They also help us ensure that these resources are working as we would expect them to work. If a health check fails, Route 53 can automatically move your traffic away from the unhealthy resource to another healthy resource, provided the DNS records are configured to use that health check. This allows you to have resilient systems based purely off of a DNS health check, and it helps ensure that your users can continue to access your system even when things go wrong.

There are three broad types of health check in AWS Route 53: checks that monitor an endpoint, calculated checks, and checks that monitor a CloudWatch alarm. An endpoint health check monitors a single resource, whereas a calculated health check combines the results of other health checks, for example one for a database and one for a web server. You can use Route 53 health checks to monitor resources in different regions, allowing us to build highly available systems that can migrate traffic across regions should a specific region fail. We can also configure alerts to be sent when the status of our resource changes, allowing us to troubleshoot issues quickly.

Here are some of the key benefits of using AWS Route 53 health checks. First, we have improved availability. Route 53 can monitor your resources and make sure they're responding as we expect. If a health check fails, Route 53 can automatically move away from the unhealthy resource to a healthy resource in the same or different regions. And that gives us the next benefit of increased reliability. By continually monitoring your resources, Route 53 can identify issues quickly and helps prevent issues becoming widespread, thus preventing downtime. When issues do occur, Route 53 allows us to specify the different routing paths for healthy and unhealthy resources. For example, when a resource becomes unhealthy we may want to serve a static page. As we've mentioned before, we can also get alerts and notifications that help us quickly identify issues. And the last benefit we can get from AWS Route 53 health checks is cost savings. By migrating the traffic automatically, we can prevent additional resources being spun up in affected areas. This helps us reduce our total cost.
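To make the idea of moving traffic away from an unhealthy resource a bit more concrete, here is a minimal sketch using a failover routing policy. This is not the code from the video: the `zone_id` variable, the `app.example.com` name, and the placeholder IP addresses (taken from the documentation range 192.0.2.0/24) are all assumptions.

```hcl
# Assumption: the ID of an existing hosted zone for example.com is passed in.
variable "zone_id" {
  type = string
}

# Health check that watches the primary endpoint.
resource "aws_route53_health_check" "primary" {
  fqdn              = "primary.example.com"
  port              = 80
  type              = "HTTP"
  resource_path     = "/"
  request_interval  = 30
  failure_threshold = 3
}

# Primary record: served while the health check is healthy.
resource "aws_route53_record" "primary" {
  zone_id         = var.zone_id
  name            = "app.example.com"
  type            = "A"
  ttl             = 60
  records         = ["192.0.2.10"] # placeholder address
  set_identifier  = "primary"
  health_check_id = aws_route53_health_check.primary.id

  failover_routing_policy {
    type = "PRIMARY"
  }
}

# Secondary record: served when the primary is unhealthy,
# for example a host that serves a static page.
resource "aws_route53_record" "secondary" {
  zone_id        = var.zone_id
  name           = "app.example.com"
  type           = "A"
  ttl            = 60
  records        = ["192.0.2.20"] # placeholder address
  set_identifier = "secondary"

  failover_routing_policy {
    type = "SECONDARY"
  }
}
```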

So now let's go over to Terraform and take a look at how we can create a couple of Route 53 health checks, starting with a basic HTTP check. To set up these checks in Terraform, we need to use the aws_route53_health_check resource. To have a simple HTTP check, we need to specify the fully qualified domain name or FQDN, and some basic HTTP details such as the path and request interval.
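A basic HTTP check along these lines could be sketched as follows. The domain name, port, and threshold values are assumptions chosen for illustration.

```hcl
# Simple HTTP health check against a single endpoint.
resource "aws_route53_health_check" "http" {
  fqdn              = "example.com" # fully qualified domain name to check
  port              = 80
  type              = "HTTP"
  resource_path     = "/"           # path that Route 53 requests
  request_interval  = 30            # seconds between checks (10 or 30)
  failure_threshold = 3             # consecutive failures before unhealthy
}
```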

The next type of check that we will create is an aggregated check, which will check other Route 53 health checks and provide us with aggregated results. In this example we set the type to "CALCULATED" and then set two properties: child_health_threshold, which is the minimum number of health checks that must be healthy for Route 53 to consider this check healthy overall, and child_healthchecks, which is a list of the other Route 53 health checks that we want to aggregate.
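Here is a minimal sketch of an aggregated check. The two child checks and their domain names are hypothetical and exist only so the example stands on its own.

```hcl
# Two hypothetical child checks to aggregate.
resource "aws_route53_health_check" "web" {
  fqdn              = "web.example.com"
  port              = 80
  type              = "HTTP"
  resource_path     = "/"
  request_interval  = 30
  failure_threshold = 3
}

resource "aws_route53_health_check" "api" {
  fqdn              = "api.example.com"
  port              = 80
  type              = "HTTP"
  resource_path     = "/"
  request_interval  = 30
  failure_threshold = 3
}

# Calculated check: healthy while at least one child is healthy.
resource "aws_route53_health_check" "aggregate" {
  type                   = "CALCULATED"
  child_health_threshold = 1
  child_healthchecks = [
    aws_route53_health_check.web.id,
    aws_route53_health_check.api.id,
  ]
}
```

Raising `child_health_threshold` to 2 would require both children to be healthy before the aggregate reports healthy.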

The last health check we will create will be for a CloudWatch alarm. First let's make a basic CloudWatch alarm that monitors the average CPU utilization for an EC2 instance. Once created, we can create the aws_route53_health_check resource. This time we need to set the type to "CLOUDWATCH_METRIC". Then we can set the cloudwatch_alarm_name and cloudwatch_alarm_region properties.
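The alarm and the health check that follows it might be sketched like this. The alarm name, threshold, periods, region, and the `instance_id` variable are assumptions rather than the values used in the video.

```hcl
# Assumption: the EC2 instance to monitor is passed in by ID.
variable "instance_id" {
  type = string
}

# Alarm on average CPU utilization for one EC2 instance.
resource "aws_cloudwatch_metric_alarm" "cpu_high" {
  alarm_name          = "example-cpu-high"
  namespace           = "AWS/EC2"
  metric_name         = "CPUUtilization"
  statistic           = "Average"
  comparison_operator = "GreaterThanOrEqualToThreshold"
  threshold           = 80
  period              = 120
  evaluation_periods  = 2

  dimensions = {
    InstanceId = var.instance_id
  }
}

# Health check that follows the state of the alarm above.
resource "aws_route53_health_check" "cpu" {
  type                            = "CLOUDWATCH_METRIC"
  cloudwatch_alarm_name           = aws_cloudwatch_metric_alarm.cpu_high.alarm_name
  cloudwatch_alarm_region         = "eu-west-1" # region where the alarm lives
  insufficient_data_health_status = "Healthy"   # status to report when the alarm has no data
}
```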

There are other types of Route 53 health checks that you can create, such as TCP checks with a database, so I've left a link in the description for your reading afterwards. If we go ahead and apply these Terraform changes, we'll be able to see in the AWS console that our health checks are now present and ready for us to consume.

The last thing that I'm going to show you is how to create multi-region resiliency. Multi-region resiliency refers to the ability of our applications to fail over between different regions. The goal of multi-region resiliency is to minimize downtime in the event of a disaster by failing resources over from one region to another.

The technique that I'm about to show you is based on an old video by "This Is My Architecture" from AWS, where Netflix demonstrate how they have their multi-region DNS setup so they can use weighted traffic rules and latency policies to redirect traffic through different regions. If you're interested in this video as well, I'll put a link to this in the description below.

In order to have this multi-region resilient DNS setup, we need to have three layers. The first layer, or the lowest layer, is the regional records. The layer above that, or the second layer, is the continental records. And then finally we have our global records.

Regional records map one-to-one with an AWS region. For example, you'll have an EU-1 record that represents eu-west-1. When we're in multiple regions, we combine this into a continental record such as EU. Then the traffic can be routed to either eu-west-1, eu-west-2, or any other European region that we want. We will do this using a weighted routing policy. Then for every continental record, we will create a global record that corresponds to it, and then we will load balance using a latency routing policy.

Don't worry, we're going to go through this step by step. So let's go across to Terraform and see how this is all laid out. We are going to use the aws_route53_record. I'm going to create two regional records in the EU region that mimic us being in eu-west-1 and eu-west-2. I name these records whatever I'm going to call my DNS, and then hyphen EU to represent the continent, and then a number just to represent which record I'm creating. So for me these are going to be root-eu-1 and root-eu-2.

For completeness, in this example I'm just going to use different records so we can distinguish the different regions in an nslookup later.
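As a sketch, the two EU regional records could look roughly like this. The `zone_id` variable, the TTL, and the IP addresses are assumptions; the addresses are placeholders from the documentation range 192.0.2.0/24, one per region so the regions can be told apart in a lookup.

```hcl
# Assumption: the ID of an existing hosted zone for example.com is passed in.
variable "zone_id" {
  type = string
}

# Regional record standing in for eu-west-1.
resource "aws_route53_record" "eu_1" {
  zone_id = var.zone_id
  name    = "root-eu-1.example.com"
  type    = "A"
  ttl     = 60
  records = ["192.0.2.1"] # placeholder address
}

# Regional record standing in for eu-west-2.
resource "aws_route53_record" "eu_2" {
  zone_id = var.zone_id
  name    = "root-eu-2.example.com"
  type    = "A"
  ttl     = 60
  records = ["192.0.2.2"] # placeholder address
}
```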

Next we have the continental records. Here I'm using the weighted routing policy to specify a 50/50 split between EU-1 and EU-2. A couple of important things to note as we do this. The first is I set the type to CNAME, not an A record or an alias record as we would do in the regional records. What happens when we use a CNAME is the DNS resolver will recursively look down the chain of DNS records until we get to the IP address that we actually need.

Now because we're using the weighted routing policy, we need to specify a continental record for every regional record that we have. This means we also need to use the set identifier. Each one of these set identifiers must be unique. To make it work with the other continental records, we can reuse the same name. So for me I'm using the root-eu name for both my continental records. What happens then is that when we request root-eu.whatever-our-domain-is, roughly 50% of the time we will get the EU-1 region and roughly 50% of the time we will get the EU-2 region.
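The continental layer described above might be sketched as follows, assuming regional records named root-eu-1.example.com and root-eu-2.example.com already exist. The `zone_id` variable, TTL, and set identifier strings are assumptions.

```hcl
# Assumption: the ID of an existing hosted zone for example.com is passed in.
variable "zone_id" {
  type = string
}

# Continental record pointing at the first regional record.
resource "aws_route53_record" "eu_to_eu_1" {
  zone_id        = var.zone_id
  name           = "root-eu.example.com" # same name on both records
  type           = "CNAME"
  ttl            = 60
  records        = ["root-eu-1.example.com"]
  set_identifier = "eu-1" # must be unique per record

  weighted_routing_policy {
    weight = 50
  }
}

# Continental record pointing at the second regional record.
resource "aws_route53_record" "eu_to_eu_2" {
  zone_id        = var.zone_id
  name           = "root-eu.example.com"
  type           = "CNAME"
  ttl            = 60
  records        = ["root-eu-2.example.com"]
  set_identifier = "eu-2"

  weighted_routing_policy {
    weight = 50
  }
}
```

Setting one weight to 0 and the other to 100 would send all traffic to a single region, which is how the US side is configured later in the video.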

I've also replicated the same regional and continental setup in the US, replacing EU with US in the relevant parts.

Lastly, we need to create the last layer which is the global record. So for us this will be root.example.com. Again, we need to create multiple records depending on the number of regions that we are in. So because I am in two regions, EU and the US, I need to create two global records. This is because we are setting the latency routing policy. Again, I need to use the set identifier to distinguish between the two different types of records. But as with the continental example, we can reuse the same name property.
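A sketch of the global layer could look like the following, assuming continental records named root-eu.example.com and root-us.example.com already exist. The latency policy needs a real AWS region to measure against; eu-west-1 and us-east-1 are assumptions here, as are the `zone_id` variable, TTL, and set identifiers.

```hcl
# Assumption: the ID of an existing hosted zone for example.com is passed in.
variable "zone_id" {
  type = string
}

# Global record that sends users closest to Europe to the EU continental record.
resource "aws_route53_record" "global_eu" {
  zone_id        = var.zone_id
  name           = "root.example.com" # same name on both records
  type           = "CNAME"
  ttl            = 60
  records        = ["root-eu.example.com"]
  set_identifier = "eu"

  latency_routing_policy {
    region = "eu-west-1" # assumed region
  }
}

# Global record that sends users closest to the US to the US continental record.
resource "aws_route53_record" "global_us" {
  zone_id        = var.zone_id
  name           = "root.example.com"
  type           = "CNAME"
  ttl            = 60
  records        = ["root-us.example.com"]
  set_identifier = "us"

  latency_routing_policy {
    region = "us-east-1" # assumed region
  }
}
```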

So now let's apply this Terraform and take a look at what happens inside of the Route 53 console. As you can see, all of our records have been created and now we're ready to test. If we type into our console nslookup root and then the domain name that we've used, you'll be able to see that it recursively resolves down to the IP address that we want. We can also specify the continental records directly. So for example, if I request root-us.example.com, then I get the region that is going to be serving my US traffic. In my code I've set the US Region 2 to have a hundred percent of the traffic, so this is the record that I am always going to get.

In my example here I've used both the weighted routing policy and the latency routing policy. What might be more appropriate for your setup is to use the weighted routing policy for all the layers. Then you can transfer the traffic at will between EU and the US, and within specific continental regions. For example, you might want to move traffic from eu-west-1 completely over to eu-west-2.
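If you went with weighted routing at the global layer instead, a minimal sketch might look like this. The `eu_weight` and `us_weight` variables are hypothetical knobs for shifting traffic between continents, and the other names follow the same assumptions as the rest of the examples.

```hcl
# Assumption: the ID of an existing hosted zone for example.com is passed in.
variable "zone_id" {
  type = string
}

# Hypothetical knobs for shifting traffic between continents.
variable "eu_weight" {
  type    = number
  default = 50
}

variable "us_weight" {
  type    = number
  default = 50
}

resource "aws_route53_record" "global_eu" {
  zone_id        = var.zone_id
  name           = "root.example.com"
  type           = "CNAME"
  ttl            = 60
  records        = ["root-eu.example.com"]
  set_identifier = "eu"

  weighted_routing_policy {
    weight = var.eu_weight
  }
}

resource "aws_route53_record" "global_us" {
  zone_id        = var.zone_id
  name           = "root.example.com"
  type           = "CNAME"
  ttl            = 60
  records        = ["root-us.example.com"]
  set_identifier = "us"

  weighted_routing_policy {
    weight = var.us_weight
  }
}
```

Applying with `eu_weight = 0` and `us_weight = 100` would move all global traffic to the US; the same pattern on the continental records would move traffic from eu-west-1 over to eu-west-2.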

Let me know in the comments below which parts of this video you found to be most helpful.

If you enjoyed this video, consider subscribing to the YouTube channel for more content like this.