AWS Timestream & .NET - Underrated?
Transcript
Today we take a look at how to use AWS Timestream in .NET. Timestream is the managed time series database from AWS. It's fast and scalable with advanced features such as query scheduling.
But what is time series data? Time series data is simply a set of data points, each with an associated timestamp. The points may also contain one or more dimensions, each of which is essentially an attribute, so they can be easily grouped and filtered. A classic example of time series data is temperature: at any given moment in time, the temperature is a specific value. Time series databases allow us to store and query this data really efficiently, and we can also query by specific dimensions such as location.
But what are the benefits of using a time series database over something like DynamoDB or SQL Server? Starting with compute, a modern database implementation such as DynamoDB or Timestream is built on a serverless platform. This means you don't have to worry about managing your own servers as you would with a traditional RDBMS, and you only pay for how much you use.
One of Timestream's major benefits is that it has multiple storage tiers out of the box. Data is initially written to the memory store, and if you set a retention period it moves to a magnetic store. You can also set a retention period on the magnetic store, after which the data is permanently deleted. DynamoDB has the concept of row TTLs, but like SQL Server, it does not have multiple storage tiers.
When it comes to querying time-based data, most time series databases come with the ability to pre-aggregate data by some sort of internal scheduling mechanism. SQL Server has jobs and Timestream has scheduled queries. DynamoDB currently doesn't have any internal mechanism, so you'll need to use Lambda functions.
So let's take a look at how much AWS Timestream costs. This information is accurate at the time of recording, but your prices may differ depending on your company's arrangements with AWS. With AWS Timestream, there are four different points of cost that we need to consider: reads, writes, memory store, and magnetic store. Writes are charged at 50 cents per 1 million writes of one-kilobyte payloads. Queries are charged based on the amount of data scanned, at one cent per gigabyte scanned. Memory storage is where the data first gets written and is designed for high-throughput writes; it's charged per gigabyte stored per hour at a cost of 3.6 cents. The magnetic store is optimized for long-term data storage and fast analytical queries; it's charged per gigabyte stored per month at a cost of 3 cents.
To create our first table in .NET, let's install the AWSSDK.TimestreamWrite and AWSSDK.TimestreamQuery packages. I'm not sure why AWS have split the read and writes into separate packages for this service, but we're going to need both.
Once this is installed, we can create the write client by instantiating the AmazonTimestreamWriteClient. We can check to see whether the database exists by calling ListDatabasesAsync and seeing whether our database is present. This method has a continuation token which needs to be used in a loop if you have lots of Timestream databases. I know that I only have one here, so I'm going to keep it simple for now. If our database is not present, we can create it.
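Sketched in code, that check-and-create step might look something like this. The database name is a placeholder assumption for this demo, and the snippet assumes top-level statements with valid AWS credentials configured:

```csharp
using System.Linq;
using Amazon.TimestreamWrite;
using Amazon.TimestreamWrite.Model;

const string databaseName = "demo-db"; // hypothetical name for this demo

var writeClient = new AmazonTimestreamWriteClient();

// ListDatabasesAsync is paginated; with many databases you would
// loop on the NextToken continuation token. One page is enough here.
var databases = await writeClient.ListDatabasesAsync(new ListDatabasesRequest());
var exists = databases.Databases.Any(d => d.DatabaseName == databaseName);

if (!exists)
{
    await writeClient.CreateDatabaseAsync(new CreateDatabaseRequest
    {
        DatabaseName = databaseName
    });
}
```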
We can use the ListTablesAsync method to check whether or not our table exists and create it if necessary. On creation of the table, we can set the RetentionProperties. This allows us to set how long data in the table is kept in the in-memory store and in the magnetic store. We can also set MagneticStoreWriteProperties, which allows us to configure whether or not the magnetic store is used and where records should go in S3 if a write to the magnetic store fails. For this demo, I'm going to use a one hour retention period in memory and disable magnetic writes.
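As a rough sketch, the table creation with those retention settings might look like this (database and table names are demo assumptions):

```csharp
using System.Linq;
using Amazon.TimestreamWrite;
using Amazon.TimestreamWrite.Model;

var writeClient = new AmazonTimestreamWriteClient();

// Like ListDatabasesAsync, this is paginated via NextToken;
// kept to a single page for simplicity.
var tables = await writeClient.ListTablesAsync(new ListTablesRequest
{
    DatabaseName = "demo-db"
});

if (tables.Tables.All(t => t.TableName != "latency"))
{
    await writeClient.CreateTableAsync(new CreateTableRequest
    {
        DatabaseName = "demo-db",
        TableName = "latency",
        RetentionProperties = new RetentionProperties
        {
            MemoryStoreRetentionPeriodInHours = 1,  // one hour in memory, per the demo
            MagneticStoreRetentionPeriodInDays = 1  // effectively unused: magnetic writes are off
        },
        MagneticStoreWriteProperties = new MagneticStoreWriteProperties
        {
            EnableMagneticStoreWrites = false
        }
    });
}
```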
AWS Timestream allows us to write two different types of records to a table. These are a single measure value record and a multi-measure value record. Single measure records allow you to write a single value to a single row. Multi-measure records allow you to write multiple values to the same row. This could be a cost saving for you, and it also allows you to migrate away from traditional RDBMS sources into AWS Timestream.
In this example, I'm going to use a multi-measure record to simulate some latency data from our website. In a loop, we create a new instance of the Record type and set the following properties. The Dimensions property gives us the ability to query data based on specific attributes. Then we're going to want to give our measure a representative name and a time value.
By default, writes in Amazon Timestream follow first-writer-wins semantics: data is stored append-only and duplicate records are rejected. Upserts are supported when a record's version number is set to a higher value than the version in the database. We also need to give our new measure a value type. For a multi-measure record, this needs to be set to MULTI, and then we can set the values. Each of the values can have a distinct type.
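Putting the write together, a minimal sketch might look like the following. The dimension, measure, and table names (page, latency_ms, and so on) are illustrative assumptions, not anything mandated by the SDK:

```csharp
using System;
using System.Collections.Generic;
using Amazon.TimestreamWrite;
using Amazon.TimestreamWrite.Model;

var writeClient = new AmazonTimestreamWriteClient();
var records = new List<Record>();
var random = new Random();

for (var i = 0; i < 10; i++)
{
    records.Add(new Record
    {
        // Dimensions let us filter and group later, e.g. by page.
        Dimensions = new List<Dimension>
        {
            new Dimension { Name = "page", Value = "/home" }
        },
        MeasureName = "latency",                    // representative name for the row
        MeasureValueType = MeasureValueType.MULTI,  // multi-measure record
        MeasureValues = new List<MeasureValue>
        {
            new MeasureValue
            {
                Name = "latency_ms",
                Value = random.Next(50, 500).ToString(),
                Type = MeasureValueType.DOUBLE
            },
            new MeasureValue
            {
                Name = "response_code",
                Value = "200",
                Type = MeasureValueType.BIGINT
            }
        },
        // Timestamps are passed as strings in the given TimeUnit.
        Time = DateTimeOffset.UtcNow.AddSeconds(-i).ToUnixTimeMilliseconds().ToString(),
        TimeUnit = TimeUnit.MILLISECONDS
    });
}

await writeClient.WriteRecordsAsync(new WriteRecordsRequest
{
    DatabaseName = "demo-db",
    TableName = "latency",
    Records = records
});
```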
Moving on to the query side of things, we need to instantiate the AmazonTimestreamQueryClient. AWS, if you watch this, please migrate all of this into a single client and package. For those who already know basic SQL, the query syntax is going to look pretty familiar, and most things are likely to work out of the box. There are obviously some small differences in functions, all of which are clarified by the AWS documentation.
The only mandatory parameter that we need to pass into the query method is the query string parameter. As with the list endpoints earlier, there is a token that comes back which, if it's not null, you'll need to iteratively call the API with the returned token. Since I'm only talking about a maximum of 10 rows here, I'm going to keep it simple for the tutorial.
The query that I'm going to use is a simple average of the latency of the records that we just inserted, placed into one minute buckets. If you need to, you can get the column information returned by the query. After that, it's just a case of iterating through the rows that have been returned and processing them.
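A sketch of that query and the row iteration might look like this, assuming the demo table and column names from the write example:

```csharp
using System;
using System.Linq;
using Amazon.TimestreamQuery;
using Amazon.TimestreamQuery.Model;

var queryClient = new AmazonTimestreamQueryClient();

// Average latency in one-minute buckets. QueryString is the only
// mandatory parameter; NextToken pagination is skipped for brevity.
var response = await queryClient.QueryAsync(new QueryRequest
{
    QueryString = @"SELECT bin(time, 1m) AS binned_time, avg(latency_ms) AS avg_latency
                    FROM ""demo-db"".""latency""
                    WHERE time > ago(1h)
                    GROUP BY bin(time, 1m)
                    ORDER BY binned_time"
});

// Column metadata (name and type) comes back alongside the rows.
foreach (var column in response.ColumnInfo)
{
    Console.WriteLine($"{column.Name}: {column.Type.ScalarType}");
}

// Each row holds one Datum per column; this query returns only scalars.
foreach (var row in response.Rows)
{
    Console.WriteLine(string.Join(", ", row.Data.Select(d => d.ScalarValue)));
}
```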
One note to call out here is that you may not always have a scalar value like I have. You may have an array or a time series value. This all depends on your query. I've put a link in the description to the AWS sample which gives you a far better overview of the different types of queries you can run.
To improve the performance of your queries and potentially reduce some costs as well, you can run something called scheduled queries. These scheduled queries can be run as fast as every single minute. This means you can pre-aggregate your data or your reports and have a lot better performance at a lot lower cost.
The only real difference between what we're seeing currently and a scheduled query is you need to set up the additional S3 bucket, IAM role, and SNS topic that you're going to need in order to create the scheduled query. You'll also need to have a mapping between the query that you want to run and the resulting table. But everything else, like the query syntax, is exactly the same as what you've seen in this tutorial.
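Creating a scheduled query from code uses the same query client. A rough sketch of that request shape follows; every ARN, bucket name, and table/column mapping here is a placeholder assumption, and the pre-created IAM role, SNS topic, and S3 error bucket must already exist:

```csharp
using System.Collections.Generic;
using Amazon.TimestreamQuery;
using Amazon.TimestreamQuery.Model;

var queryClient = new AmazonTimestreamQueryClient();

await queryClient.CreateScheduledQueryAsync(new CreateScheduledQueryRequest
{
    Name = "avg-latency-per-minute",
    QueryString = @"SELECT bin(time, 1m) AS binned_time, page, avg(latency_ms) AS avg_latency
                    FROM ""demo-db"".""latency""
                    WHERE time > ago(2m)
                    GROUP BY bin(time, 1m), page",
    ScheduleConfiguration = new ScheduleConfiguration
    {
        ScheduleExpression = "rate(1 minute)" // as fast as every single minute
    },
    ScheduledQueryExecutionRoleArn = "arn:aws:iam::123456789012:role/timestream-sq-role",
    NotificationConfiguration = new NotificationConfiguration
    {
        SnsConfiguration = new SnsConfiguration
        {
            TopicArn = "arn:aws:sns:eu-west-1:123456789012:timestream-sq-topic"
        }
    },
    ErrorReportConfiguration = new ErrorReportConfiguration
    {
        S3Configuration = new S3Configuration { BucketName = "timestream-sq-errors" }
    },
    // The mapping between the query result columns and the target table.
    TargetConfiguration = new TargetConfiguration
    {
        TimestreamConfiguration = new TimestreamConfiguration
        {
            DatabaseName = "demo-db",
            TableName = "latency_aggregated",
            TimeColumn = "binned_time",
            DimensionMappings = new List<DimensionMapping>
            {
                new DimensionMapping { Name = "page", DimensionValueType = DimensionValueType.VARCHAR }
            },
            MultiMeasureMappings = new MultiMeasureMappings
            {
                TargetMultiMeasureName = "latency_summary",
                MultiMeasureAttributeMappings = new List<MultiMeasureAttributeMapping>
                {
                    new MultiMeasureAttributeMapping
                    {
                        SourceColumn = "avg_latency",
                        MeasureValueType = ScalarMeasureValueType.DOUBLE
                    }
                }
            }
        }
    }
});
```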
The last thing that we can do is clean up by deleting our tables before removing the database. When you put it all together and run the code, it looks a little bit like this. If you're lucky enough to have a LocalStack Pro license, you'll also be able to run AWS Timestream locally.
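The cleanup step, again with the demo names assumed above, might look like:

```csharp
using Amazon.TimestreamWrite;
using Amazon.TimestreamWrite.Model;

var writeClient = new AmazonTimestreamWriteClient();

// Tables must be deleted before the database that contains them.
await writeClient.DeleteTableAsync(new DeleteTableRequest
{
    DatabaseName = "demo-db",
    TableName = "latency"
});

await writeClient.DeleteDatabaseAsync(new DeleteDatabaseRequest
{
    DatabaseName = "demo-db"
});
```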
For me, getting this up and running in such a short amount of time and being able to use advanced features like scheduled queries, plus all of the scalability benefits of a serverless platform, is what really makes this database stand out. Don't get me wrong, if you have extremely large workloads there may be better options out there, but I think for 90% of cases this database is going to be absolutely perfect for you.
If you enjoyed this video, consider subscribing to the YouTube channel for more content like this.