~/codewithstu

The fastest .NET Serializer is NOT what you think

Transcript

Serialization is a key part of modern applications: we use it everywhere, from HTTP requests to storage and databases, so the performance of any serializer matters. Today we're going to look at some of the most popular serialization frameworks.

Before we get started, let's go over a few notes to ensure that we're all on the same page about what to expect. All of the benchmarks are available on GitHub via the link in the description. I will keep them up to date and accept pull requests for fixes, new libraries, and scenarios. If the library you want to see isn't listed, feel free to add it. My initial criterion was to get up and running with each serializer in under 5 minutes, 10 minutes at most, using NuGet packages only for the most part. After all, I am a single person and I'm pretty busy. Also keep in mind that some of these frameworks are brand new to me, Bebop for example.

There is no such thing as the best serializer, as each one has different trade-offs and I can't know all of your requirements. So in this video I'll present the data that I have for the scenario that I've got, and then let you be the judge of which serializer you think is best. Of course, depending on your scenario, your mileage will vary. We will compare JSON serializers and binary serializers separately. All of the benchmarks will be run with BenchmarkDotNet on .NET 7 Preview 7, using the latest versions of all the packages.
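For readers unfamiliar with BenchmarkDotNet, the entry point for a suite like this might look roughly as follows. This is a sketch, assuming benchmark classes named JsonBenchmark and BinaryBenchmark as mentioned in the project tour; the repo's actual wiring may differ:

```csharp
using BenchmarkDotNet.Running;

// Let BenchmarkDotNet pick the suite from the command-line args;
// it handles warmup, iteration counts, and statistics for us.
BenchmarkSwitcher
    .FromTypes(new[] { typeof(JsonBenchmark), typeof(BinaryBenchmark) })
    .Run(args);
```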

With that out of the way, let's take a look at the setup we will use. Every serializer framework will need to process the same payload with three different sizes. The small size is roughly 170 kilobytes of data, the medium size is about four megabytes of data, and the large size is about 17 megabytes of data. The payload is a list of users, with each user having a set of orders generated using Bogus.
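To give a rough idea of what generating such a payload with Bogus looks like, here is a sketch; the model properties and faker rules are illustrative, not necessarily the repo's exact models:

```csharp
using Bogus;

public class Order
{
    public Guid OrderId { get; set; }
    public string Item { get; set; } = "";
    public int Quantity { get; set; }
}

public class User
{
    public Guid Id { get; set; }
    public string FirstName { get; set; } = "";
    public string LastName { get; set; } = "";
    public List<Order> Orders { get; set; } = new();
}

public static class DataGenerator
{
    public static List<User> Generate(int userCount)
    {
        // Nested faker: each user gets a random set of orders.
        var orderFaker = new Faker<Order>()
            .RuleFor(o => o.OrderId, f => f.Random.Guid())
            .RuleFor(o => o.Item, f => f.Commerce.Product())
            .RuleFor(o => o.Quantity, f => f.Random.Int(1, 10));

        var userFaker = new Faker<User>()
            .RuleFor(u => u.Id, f => f.Random.Guid())
            .RuleFor(u => u.FirstName, f => f.Name.FirstName())
            .RuleFor(u => u.LastName, f => f.Name.LastName())
            .RuleFor(u => u.Orders, f => orderFaker.Generate(f.Random.Int(1, 5)));

        return userFaker.Generate(userCount);
    }
}
```

Scaling `userCount` up or down is what produces the three payload sizes.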

We will compare both the serialization and deserialization performance of all the frameworks before taking a look at the size of the payloads that it generates. So let's start off with the JSON serialization frameworks. We will look at Jil, Newtonsoft.Json, ServiceStack.Text, SpanJson, System.Text.Json, and Utf8Json. For the binary serializers, we're going to take a look at Avro Convert, Bebop, BSON in the MongoDB implementation, GroBuf, Hyperion, MessagePack, MsgPack, and protobuf-net.

So let's take a quick tour around our project. Inside of our Program.cs file, we have a couple of different helpers: GenerateDataSets, VerifySerializedSizes, and the benchmark runner itself. The VerifySerializedSizes function dumps out the serialized payload sizes to the screen in a pretty naive way, but it's also pretty handy for our comparison. We'll see the results of this later.
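A naive size check along those lines might look like this. It's a sketch covering just two of the JSON libraries; the real helper runs every framework in the comparison:

```csharp
using System.Text;

static void VerifySerializedSizes(List<User> users)
{
    // Serialize the same payload once per framework and print the byte counts.
    var stj = System.Text.Json.JsonSerializer.SerializeToUtf8Bytes(users);
    Console.WriteLine($"System.Text.Json: {stj.Length:N0} bytes");

    var newtonsoft = Encoding.UTF8.GetBytes(
        Newtonsoft.Json.JsonConvert.SerializeObject(users));
    Console.WriteLine($"Newtonsoft.Json:  {newtonsoft.Length:N0} bytes");
}
```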

GenerateDataSets is pretty much a carbon copy from the Bogus .NET readme file with a few tweaks to help us generate some larger data sets. As for our benchmarks, we have an abstract base class that contains a common set of attributes at the class level, from which JsonBenchmark and BinaryBenchmark will inherit.
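A minimal sketch of what such a base class might look like; the attribute choices here are assumptions based on what the results report, namely timings plus memory:

```csharp
using BenchmarkDotNet.Attributes;

[MemoryDiagnoser]          // report allocations alongside the timings
public abstract class BenchmarkBase
{
    // One run per payload size; the derived classes load the
    // matching data file in their GenerateDataSets helper.
    [Params("small", "medium", "large")]
    public string DataSet { get; set; } = "small";
}
```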

The JsonBenchmark class contains a GenerateDataSets helper that reads from the data files in the project folder. I've put some switches into the .csproj so we can control a bit more easily which data files we load for our tests. The exact same setup is used in the BinaryBenchmark class.

For our models, we have three files that are used for the majority of the tests: User, Order, and Gender. For the Bebop serializer, I needed to include a separate definition and generate the classes for it. I also needed to include a wrapper class so that I could easily serialize the list of users. Lastly, we have the data JSON files themselves. These contain all of the payloads used by the tests through the GenerateDataSets method in both the JsonBenchmark and BinaryBenchmark classes.

So now I'm going to pause the recording for a bit whilst I run through both sets of benchmarks, and we will walk through the results afterwards. The full result log will be available inside of the GitHub repository for those that are curious.

The first benchmark results that we're going to look at are the JSON benchmarks. On a quick look at the deserialization results, it appears that SpanJson is the quickest across all of the data sets, although Jil and System.Text.Json appear to be leading by quite a way when it comes to memory usage. On the serialization side of things, SpanJson is again the fastest library, but this time its memory usage is massively improved.

Now let's take a look at the binary serializer results, starting with deserialization. We can see two libraries sitting pretty at the top of the tree: GroBuf and Bebop. I hadn't heard of GroBuf before this experiment, so I was honestly expecting it to sit somewhere near the bottom, definitely not to come first. It also appears to handle memory really well.

When it comes to serialization, GroBuf again takes the top spot. This time MessagePack takes second place and does so using a lot less memory. One of the most interesting things to note about the binary benchmarks is that the MongoDB BSON driver is actually really slow. To put this into perspective: in the time it takes the BSON driver to process a payload, protobuf-net, MessagePack, Bebop, and GroBuf can each process a payload four times the size. I would have thought that MongoDB would be quite a bit quicker given how widely it's used.

Lastly, let's explore the sizes of all the formats, starting with the binary serializers. For the small payload, it's no surprise to me to see MsgPack, protobuf-net, and Avro Convert at the top of the list, and to be honest, there's not a lot between them here. Bebop and GroBuf are probably the most surprising: they clearly do very well on speed, but there is a substantial trade-off in the size of the serialized object. As we would expect, for the medium and large payloads the story is very much the same, but I have to say the overall size reductions for all of them are pretty good.

So let's take a look at the JSON serializer sizes. You might think these would be the same across all the frameworks, but they aren't. Topping the charts in the small category we have ServiceStack, SpanJson, and Newtonsoft.Json. Utf8Json and Jil clearly do things very similarly, given that their output is of identical length. As we would expect, the medium and large categories tell the same story.

After looking into why there was such a difference between the JSON serializers, I found that it mostly comes down to default settings. For example, some libraries serialize null values by default and others omit them. The way the libraries handle GUIDs is also a factor, for example whether or not the hyphen separators are included in the GUID string.
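To make that concrete, here is a sketch using System.Text.Json, which writes nulls by default but can be configured to omit them, alongside the two common GUID string formats:

```csharp
using System;
using System.Text.Json;
using System.Text.Json.Serialization;

var user = new { Name = (string?)null, Id = Guid.NewGuid() };

// Nulls are written by default...
Console.WriteLine(JsonSerializer.Serialize(user));

// ...but omitting them shrinks the payload.
var options = new JsonSerializerOptions
{
    DefaultIgnoreCondition = JsonIgnoreCondition.WhenWritingNull
};
Console.WriteLine(JsonSerializer.Serialize(user, options));

// GUID formatting changes the byte count too:
var g = Guid.NewGuid();
Console.WriteLine(g.ToString("D").Length); // 36 chars, with hyphens
Console.WriteLine(g.ToString("N").Length); // 32 chars, no hyphens
```

Multiply those few bytes per field by every user and order in a 17-megabyte payload and the totals drift apart quickly.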

So based on this data that you've seen, what serializer would you go for? Let me know in the comments.

If you enjoyed this video, consider subscribing to the YouTube channel for more content like this.
