~/codewithstu

Diagnostics with EventCounters in .NET

·38 min read
dotnet

Recently, I've been playing with the new diagnostic improvements in .NET Core 3. Traditionally, I've always used the great AppMetrics package to capture metrics from our applications and scrape them with a Prometheus & Grafana setup. Whilst reading about the improvements, I wondered whether or not it would be possible to push metrics to Prometheus.

Ultimately, I decided that pushing to Prometheus wasn't ideal for my use case. However, I have successfully used the approach described in the rest of this article to push metrics to another platform, using a new .NET API - EventCounters.

EventCounters are the cross-platform .NET Core replacement for Windows performance counters. They are built on top of the EventPipe that was originally introduced in .NET Core 2.2, and .NET Core 3.0+ adds a lot of additional functionality that we can use going forward to create cross-platform monitoring tools for our applications.

Please note that this article is correct at the time of writing, based on the sources available. I do describe some of the internal workings of the API, which may change over time.

Application Flow

In order to use the new EventCounters API, you first need to create a class that derives from EventSource, because every type of counter needs to be registered against an EventSource. Let's start off with the simplest possible EventSource, one that records metrics dynamically:

[EventSource(Name = "MyApplication")]
public class MyApplicationEventSource : EventSource
{
    public static MyApplicationEventSource Instance = new MyApplicationEventSource();
    private readonly ConcurrentDictionary<string, EventCounter> _dynamicCounters = new ConcurrentDictionary<string, EventCounter>();
 
    private MyApplicationEventSource() {}
 
    public void RecordMetric(string name, float value)
    {
        if (string.IsNullOrWhiteSpace(name)) return;
 
        var counter = _dynamicCounters.GetOrAdd(name, key => new EventCounter(key, this));
        counter.WriteMetric(value);
    }
}

In order to initialize a new EventCounter instance, we need to give it a name and the EventSource that it should be associated with. Whilst this is okay for simple EventCounters, we often need to do more in our applications, such as tracking the starting/stopping of certain events, or tracking activities using PerfView. To do this, we can leverage more of the EventSource's infrastructure.
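As a quick usage sketch (assuming the MyApplicationEventSource class above), recording a metric from anywhere in the application looks like this; the counter name and value are purely illustrative:

```csharp
// The first use of "request-duration" creates the counter via GetOrAdd;
// subsequent calls write to the same cached instance.
MyApplicationEventSource.Instance.RecordMetric("request-duration", 42.5f);
```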

Using EventCounters And EventSource Events

Let's break down the following example, which I've taken from my OpenMessage project:

[EventSource(Name = "OpenMessage")]
internal class OpenMessageEventSource : EventSource
{
    internal static readonly OpenMessageEventSource Instance = new OpenMessageEventSource();
 
    private long _inflightMessages = 0;
    private long _processedCount = 0;
    private IncrementingPollingCounter _inflightMessagesCounter;
    private EventCounter _messageDurationCounter;
    private IncrementingPollingCounter _processedCountCounter;
 
    private OpenMessageEventSource() { }
 
    [NonEvent]
    public ValueStopwatch? ProcessMessageStart()
    {
        if (!IsEnabled()) return null;
 
        MessageStart();
 
        return ValueStopwatch.StartNew();
    }
 
    [Event(1, Level = EventLevel.Informational, Message = "Consumed Message")]
    private void MessageStart()
    {
        Interlocked.Increment(ref _inflightMessages);
        Interlocked.Increment(ref _processedCount);
    }
 
    [NonEvent]
    public void ProcessMessageStop(ValueStopwatch stopwatch)
    {
        if (!IsEnabled()) return;
 
        MessageStop(stopwatch.IsActive ? stopwatch.GetElapsedTime().TotalMilliseconds : 0.0);
    }
 
    [Event(2, Level = EventLevel.Informational, Message = "Message Completed")]
    private void MessageStop(double duration)
    {
        Interlocked.Decrement(ref _inflightMessages);
        _messageDurationCounter.WriteMetric(duration);
    }
 
    protected override void OnEventCommand(EventCommandEventArgs command)
    {
        if (command.Command == EventCommand.Enable)
        {
            _inflightMessagesCounter ??= new IncrementingPollingCounter("inflight-messages", this, () => _inflightMessages)
            {
                DisplayName = "Inflight Messages",
                DisplayUnits = "Messages"
            };
            _messageDurationCounter ??= new EventCounter("message-duration", this)
            {
                DisplayName = "Average Message Duration",
                DisplayUnits = "ms"
            };
            _processedCountCounter ??= new IncrementingPollingCounter("processed-count", this, () => _processedCount)
            {
                DisplayName = "Messages Processed",
                DisplayRateTimeScale = TimeSpan.FromSeconds(1)
            };
        }
    }
 
    // ... code omitted for brevity
}

The example above is designed to track the number of messages processed by our system, and how long, on average, they took to process. The event source is also designed to be lazily initialized, so we only track information when the EventSource is enabled. Let's take a look at how we've accomplished this, starting with OnEventCommand:

protected override void OnEventCommand(EventCommandEventArgs command)
{
    if (command.Command == EventCommand.Enable)
    {
        _inflightMessagesCounter ??= new IncrementingPollingCounter("inflight-messages", this, () => _inflightMessages)
        {
            DisplayName = "Inflight Messages",
            DisplayUnits = "Messages"
        };
        _messageDurationCounter ??= new EventCounter("message-duration", this)
        {
            DisplayName = "Average Message Duration",
            DisplayUnits = "ms"
        };
        _processedCountCounter ??= new IncrementingPollingCounter("processed-count", this, () => _processedCount)
        {
            DisplayName = "Messages Processed",
            DisplayRateTimeScale = TimeSpan.FromSeconds(1)
        };
    }
}

This is where we register the event counters that we are interested in tracking. EventSources can receive commands from external sources so that they can enable the EventCounter API etc. We can receive this command multiple times, so it's important to make sure that we program defensively. In the sample above, I use the new null-coalescing assignment operator (??=) to ensure that we only perform the expression on the right-hand side - which in our case is creating the counters - when the field is null.

There are four types of counters available for us to use, which I will cover in more detail later on: EventCounter, IncrementingEventCounter, PollingCounter and IncrementingPollingCounter.

Next, we need to look at how we can actually record the metrics. In order to do this, I've combined the counters with EventSource events so that I can also get the information that I want inside other tools, like PerfView, should I want to:

[NonEvent]
public ValueStopwatch? ProcessMessageStart()
{
    if (!IsEnabled()) return null;
 
    MessageStart();
 
    return ValueStopwatch.StartNew();
}
 
[Event(1, Level = EventLevel.Informational, Message = "Consumed Message")]
private void MessageStart()
{
    Interlocked.Increment(ref _inflightMessages);
    Interlocked.Increment(ref _processedCount);
}
 
[NonEvent]
public void ProcessMessageStop(ValueStopwatch stopwatch)
{
    if (!IsEnabled()) return;
 
    MessageStop(stopwatch.IsActive ? stopwatch.GetElapsedTime().TotalMilliseconds : 0.0);
}
 
[Event(2, Level = EventLevel.Informational, Message = "Message Completed")]
private void MessageStop(double duration)
{
    Interlocked.Decrement(ref _inflightMessages);
    _messageDurationCounter.WriteMetric(duration);
}

There are two operations that we are really interested in: Start & Stop. In the example above, each operation is split out into a [NonEvent] and a corresponding [Event]. The [Event] is what the EventSource system uses to write the events to the underlying stream so that they can be picked up by tools such as PerfView. The entry point is always the [NonEvent], so that we can check whether anyone is listening to the EventSource before we do anything; this helps ensure that we do not emit the event unnecessarily. From what I can tell, this is the same pattern that is used throughout the .NET codebase.

For the [Event]s, you will notice that Start and Stop have EventIds 1 and 2 respectively, and that the method names also end with Start/Stop. This allows some magic to happen, such as automatically figuring out the duration inside PerfView. For more information on some of the magic that occurs, I strongly recommend reading Vance Morrison's excellent blog post instead of me duplicating the knowledge here.

Once you have your EventSource configured, and you know which metrics you wish to track, then all that's left is to start recording your metrics (eg: OpenMessageEventSource.ProcessMessageStart()) and the runtime will take care of the rest.
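For instance, a message-processing loop might wrap each message like this (a sketch; HandleMessageAsync and message are hypothetical stand-ins for your own processing logic):

```csharp
// ProcessMessageStart returns null when no listener has enabled the source,
// so both calls become cheap no-ops in that case.
var stopwatch = OpenMessageEventSource.Instance.ProcessMessageStart();
try
{
    await HandleMessageAsync(message);
}
finally
{
    if (stopwatch.HasValue)
        OpenMessageEventSource.Instance.ProcessMessageStop(stopwatch.Value);
}
```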

Other EventSource Examples

For some inspiration on how to configure your EventSources, here are a few examples from Microsoft's own code: the runtime's RuntimeEventSource (which backs the System.Runtime counters) and ASP.NET Core's HostingEventSource (which backs the Microsoft.AspNetCore.Hosting counters).

Types of DiagnosticCounters

The DiagnosticCounter class is the abstract base class that all of the event counter types inherit from. Currently, there are four implementations registered in the source: EventCounter, IncrementingEventCounter, PollingCounter and IncrementingPollingCounter. Although DiagnosticCounter is abstract, we can't really inherit from it ourselves, as the internal components that we need, described below, are protected from external use. The four implementations appear to cover pretty much every use case that I can think of anyway.

EventCounter

This type of event counter is typically used for tracking the latency of requests to external parties, due to the aggregated statistics that this type provides. An EventCounter instance tracks the following about the metrics it has recorded:

Name               Type    Notes
Name               string
DisplayName        string
Mean               double  The average of all values recorded
StandardDeviation  double
Count              int     How many metric entries were recorded in this iteration
Min                double
Max                double
IntervalSec        float
CounterType        string  Always "Mean"
Metadata           string  Any associated metadata for this specific counter
DisplayUnits       string
Series             string  Format is: $"IntervalSec="

In order to write data, you need to call <counter>.WriteMetric(value).

IncrementingEventCounter

An IncrementingEventCounter is typically used to track ever-increasing numbers, such as the total number of requests. Unlike its namesake, EventCounter, this class does not provide any statistics about the data. In other words, it is a pure counter, so only the following information is tracked:

Name                  Type    Notes
Name                  string
DisplayName           string
DisplayRateTimeScale  string  The unit of measure that the metric should be shown in, eg: per-second
Increment             double  The value of this is: currentValue - previousValue
IntervalSec           float
Metadata              string
Series                string  Format is: $"IntervalSec="
CounterType           string  Always "Sum"
DisplayUnits          string

In order to write data, you need to call <counter>.Increment(value). The Increment that you receive is always currentValue - previousValue.
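For illustration, a hypothetical request counter created inside an EventSource-derived class might look like this:

```csharp
// "this" is the owning EventSource instance.
var requestCounter = new IncrementingEventCounter("total-requests", this)
{
    DisplayName = "Total Requests",
    DisplayRateTimeScale = TimeSpan.FromSeconds(1) // display as a per-second rate
};

requestCounter.Increment();  // adds 1 to the running total
requestCounter.Increment(5); // or add an arbitrary amount
```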

PollingCounter

A PollingCounter is very much like a standard EventCounter, but instead of the metric being written to it, a function is invoked to retrieve the value from your source of choice. A PollingCounter instance tracks the following about the metrics it has recorded:

Name               Type    Notes
Name               string
DisplayName        string
Mean               double  The average of all values recorded
StandardDeviation  double
Count              int     How many metric entries were recorded in this iteration
Min                double
Max                double
IntervalSec        float
CounterType        string  Always "Mean"
Metadata           string  Any associated metadata for this specific counter
DisplayUnits       string
Series             string  Format is: $"IntervalSec="
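As an example, a polling counter that reports the process working set could be created like this inside an EventSource (a sketch; the counter name and units are my own choices):

```csharp
// The supplied delegate is invoked on the polling interval;
// nothing is ever written to the counter directly.
_workingSetCounter = new PollingCounter("working-set", this,
    () => Environment.WorkingSet / 1_000_000.0)
{
    DisplayName = "Working Set",
    DisplayUnits = "MB"
};
```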

IncrementingPollingCounter

An IncrementingPollingCounter is very much like a standard IncrementingEventCounter, but instead of the metric being written to it, a function is invoked to retrieve the value from your source of choice. An IncrementingPollingCounter instance tracks the following about the metrics it has recorded:

Name                  Type    Notes
Name                  string
DisplayName           string
DisplayRateTimeScale  string  The unit of measure that the metric should be shown in, eg: per-second
Increment             double  The value of this is: currentValue - previousValue
IntervalSec           float
Metadata              string
Series                string  Format is: $"IntervalSec="
CounterType           string  Always "Sum"
DisplayUnits          string

Under the hood

Now that we've taken a look at how we construct the EventSource so that we can create our application-level metrics, we should also take a look at what happens under the hood so we can begin to complete the circle. Once you create any of the listed DiagnosticCounters in your application, the counter calls a method which ensures that it gets added to a CounterGroup associated with the specified EventSource. When a DiagnosticCounter is disposed, it is removed from the CounterGroup and is no longer tracked.

The CounterGroup is responsible for maintaining a thread that polls the DiagnosticCounters on the specified interval and updates their values. The thread isn't created until an application calls EnableEvents(eventSource, EventLevel.LogAlways, EventKeywords.All, new Dictionary<string, string>{{"EventCounterIntervalSec", "1"}}); on an EventSource. Lastly, when the value of each DiagnosticCounter is updated, an event is raised against the EventSource that was passed to the counter, which means that we can listen to it in the same way that we listen to other events on EventSources - eg: via PerfView or an EventListener.

The whole EventSource system is very lightweight and designed for scalability in systems that generate millions of events, so we should not be too concerned about its performance. Naturally, the more you listen to, the more impact this will have. I think it's safe to say that the code we write in the listeners will likely be the slowest part of this system.

Listening for event counters

Lastly, in order to complete our circle, we need to be able to listen to the counters that we've created in our applications. There are two common approaches that we can use: the CLI tool dotnet-counters or from within our applications using an EventListener.

Consuming EventCounters using dotnet-counters

As part of the diagnostic improvements in .Net Core 3, the .Net team introduced a new diagnostics tool called dotnet-counters. This is a stand-alone tool that can be installed using the following command:

dotnet tool install dotnet-counters --global

Or updated to the latest version if you already have it installed:

dotnet tool update dotnet-counters --global

After the tool has been installed, you can see the processes that are eligible for attaching to, using:

dotnet-counters ps
    10416 dotnet     C:\Program Files\dotnet\dotnet.exe
    20660 dotnet     C:\Program Files\dotnet\dotnet.exe
    21172 dotnet     C:\Program Files\dotnet\dotnet.exe

Once you know the process that you want to attach to, you can start monitoring with the following command:

dotnet-counters monitor -p 21172

If you are interested in specific EventSources, then you can supply a space separated list of EventSources like:

dotnet-counters monitor -p 21172 System.Runtime MyEventSource

By default, when you ask to monitor an EventSource, it will capture and display all of its counters for you. If no EventSources are specified, then a default list is used, which includes System.Runtime. If you only wish to track a few counters from each EventSource, you can specify them in square brackets directly after the EventSource name:

dotnet-counters monitor -p 21172 System.Runtime[cpu-usage] MyEventSource[test]

All of the monitor commands will output something similar to the following:

Press p to pause, r to resume, q to quit.
    Status: Running
 
[System.Runtime]
    CPU Usage (%)                                      0
[MyEventSource]
    test                                             335

Lastly, should you wish to control the rate at which the counters are refreshed, supply the --refresh-interval parameter:

dotnet-counters monitor -p 21172 --refresh-interval 5 System.Runtime[cpu-usage] MyEventSource[test]

Consuming EventCounters within our applications

In order to enable tracing from within a .NET application, you need three core parts:

  1. A class inheriting from EventListener
  2. Detection of EventSources
  3. Processing of events

Creating our EventListener

For our new EventListener, I will create a simple background service as follows:

internal sealed class MetricsCollectionService : EventListener, IHostedService
{
    public Task StartAsync(CancellationToken cancellationToken)
    {
        return Task.CompletedTask;
    }
 
    public Task StopAsync(CancellationToken cancellationToken) => Task.CompletedTask;
}

This will live for the lifetime of the application and host the task that detects lazily initialized EventSources, such as the OpenMessage one I showed earlier in this article.

Detecting EventSources

In order to detect the lazily initialized EventSources, we need to periodically call EventSource.GetSources(), which lists all of the currently available sources. We can do this from a simple task that lives against the service:

internal sealed class MetricsCollectionService : EventListener, IHostedService
{
    private List<string> RegisteredEventSources = new List<string>();
    private Task _newDataSourceTask;
 
    public Task StartAsync(CancellationToken cancellationToken)
    {
        _newDataSourceTask = Task.Run(async () =>
        {
            while (true)
            {
                GetNewSources();
                await Task.Delay(1000);
            }
        });
 
        return Task.CompletedTask;
    }
 
    public Task StopAsync(CancellationToken cancellationToken) => Task.CompletedTask;
 
    protected override void OnEventSourceCreated(EventSource eventSource)
    {
        if (!RegisteredEventSources.Contains(eventSource.Name))
        {
            RegisteredEventSources.Add(eventSource.Name);
            EnableEvents(eventSource, EventLevel.LogAlways, EventKeywords.All, new Dictionary<string, string>
            {
                {"EventCounterIntervalSec", "1"}
            });
        }
    }
 
    private void GetNewSources()
    {
        foreach (var eventSource in EventSource.GetSources())
            OnEventSourceCreated(eventSource);
    }
}

We keep a list of the EventSources that we have already asked to be enabled so that we don't continually ask them to enable themselves. This isn't strictly necessary, but it helps guard against any slightly misconstructed EventSources.

Processing Events

The last bit for us to do is to override the OnEventWritten:

protected override void OnEventWritten(EventWrittenEventArgs eventData)
{
    if (eventData.EventName != "EventCounters"
            || eventData.Payload.Count <= 0
            || !(eventData.Payload[0] is IDictionary<string, object> data)
            || !data.TryGetValue("CounterType", out var counterType)
            || !data.TryGetValue("Name", out var name))
        return;
 
    var metricType = counterType.ToString();
    float metricValue = 0;
 
    if ("Sum".Equals(metricType) && data.TryGetValue("Increment", out var increment))
    {
        metricValue = Convert.ToSingle(increment);
    }
    else if ("Mean".Equals(metricType) && data.TryGetValue("Mean", out var mean))
    {
        metricValue = Convert.ToSingle(mean);
    }
 
    // do something with your metric here...
}

This method gets called for each EventSource that you have asked to receive data from; it is up to you to decide your own filtering policy. For each EventWrittenEventArgs that you receive, you need to double-check that you have received an EventCounter before proceeding. Next, you need to check that the payload you received is indeed an IDictionary<string, object>, so that you can process the contents in a quick and efficient manner. Although the implementations have a strongly typed class for the payload of each of the built-in counters, it is internal, so we are unable to consume it here. The last piece of the puzzle is to process the metric however you wish, ie: sending it to DataDog.

Putting all of the above code together, we get something like the following:

internal sealed class MetricsCollectionService : EventListener, IHostedService
{
    private List<string> RegisteredEventSources = new List<string>();
    private Task _newDataSourceTask;
 
    public Task StartAsync(CancellationToken cancellationToken)
    {
        _newDataSourceTask = Task.Run(async () =>
        {
            while (true)
            {
                GetNewSources();
                await Task.Delay(1000);
            }
        });
 
        return Task.CompletedTask;
    }
 
    public Task StopAsync(CancellationToken cancellationToken) => Task.CompletedTask;
 
    protected override void OnEventSourceCreated(EventSource eventSource)
    {
        if (!RegisteredEventSources.Contains(eventSource.Name))
        {
            RegisteredEventSources.Add(eventSource.Name);
            EnableEvents(eventSource, EventLevel.LogAlways, EventKeywords.All, new Dictionary<string, string>
            {
                {"EventCounterIntervalSec", "1"}
            });
        }
    }
 
    protected override void OnEventWritten(EventWrittenEventArgs eventData)
    {
        if (eventData.EventName != "EventCounters"
                || eventData.Payload.Count <= 0
                || !(eventData.Payload[0] is IDictionary<string, object> data)
                || !data.TryGetValue("CounterType", out var counterType)
                || !data.TryGetValue("Name", out var name))
            return;
 
        var metricType = counterType.ToString();
        float metricValue = 0;
 
        if ("Sum".Equals(metricType) && data.TryGetValue("Increment", out var increment))
        {
            metricValue = Convert.ToSingle(increment);
        }
        else if ("Mean".Equals(metricType) && data.TryGetValue("Mean", out var mean))
        {
            metricValue = Convert.ToSingle(mean);
        }
 
        // do something with your metric here...
    }
 
    private void GetNewSources()
    {
        foreach (var eventSource in EventSource.GetSources())
            OnEventSourceCreated(eventSource);
    }
}

Hopefully at this point, you have enough information on how to use the built-in counters and create your own metrics.


Using dotnet-counters with Docker

Now that we know how to create and listen for EventCounters, let's look at how to leverage the dotnet-counters tool with a running docker container.

Creating our diagnostics image

In order to connect to a running docker container, we will create a diagnostics image which hosts the same .NET SDK version as our application and has the dotnet-counters tool pre-installed:

FROM mcr.microsoft.com/dotnet/core/sdk:3.1
RUN mkdir /root/.dotnet/tools
ENV PATH="/root/.dotnet/tools:${PATH}"
RUN  dotnet tool install dotnet-counters --global
WORKDIR /diagnostics
ENTRYPOINT [ "/bin/bash" ]

A common mistake when creating docker images that contain globally installed .NET tools is forgetting to add the tool path - in this case /root/.dotnet/tools - to the PATH so that the tools can be executed from anywhere. Luckily, the .NET CLI will remind you in the build logs should you forget to do this.

Note: other diagnostic tools, such as dotnet-trace and dotnet-dump, are available and can be installed in the same way.

Now that we have our docker image ready, we can build with the following command:

docker build -f diagnostics.Dockerfile -t dotnetdiag:3.1 .

Setting up the host image

For the purposes of this article, we will set up our application within a dockerfile, using a brand new application created by dotnet new:

FROM mcr.microsoft.com/dotnet/core/sdk:3.1
WORKDIR /app
EXPOSE 5000
EXPOSE 5001
RUN dotnet new webapp -n BlogApp
WORKDIR /app/BlogApp
ENTRYPOINT dotnet run -c Release

And we will build our application with the following command line:

docker build -f app.Dockerfile -t dotnetapp:latest .

Once you have your application built, we are ready to start our docker container with debugging enabled.

Connecting from the diagnostics image to the host image

Normally we would start our applications with a command line similar to this:

docker run --rm --name app dotnetapp:latest

However, in order to be able to connect to the running application we need to mount a volume to the temporary directory on the application container. We can do this by appending -v dotnetdiag:/tmp, which instructs docker to mount a named volume dotnetdiag to the path /tmp. Docker will create the named volume during startup if it does not exist.

We mount the volume because as the .Net runtime starts up, it places a load of temporary files into the /tmp directory such as the following:

root@379211a5012a:/# ls /tmp
CoreFxPipe_root.b5he0_wwfcD_lH7g471Brpw4X   VBCSCompiler
jiksomfd.ri0                                NuGetScratch
hn2K8eq8bHUcTVSgvuckPlSK9tw9_ORiMDm_Vn4ylfI system-commandline-sentinel-files

Note the inclusion of the file beginning with CoreFxPipe_root, which is the EventPipe that we will connect to.

Once the application is running, we are able to start connecting to it. Normally we would start the diagnostics image with a command line as simple as docker run --rm -it dotnetdiag:3.1. Before we execute this command, we need to modify it by adding arguments for:

  1. Mounting the shared volume
  2. Joining the application's process namespace
  3. Joining the application's network namespace
  4. Running with elevated privileges and capabilities

Without completing the steps listed above we will be unable to connect to the running application. For mounting the volume we can use the exact same argument as before (-v dotnetdiag:/tmp).

In order to get the process id, we need to join the same process namespace through the use of the --pid argument. The --pid argument offers two modes: container or host. For this article, we will connect to a specific container by name, though you can also connect to the container by id.

Like the process argument, we also need to join the same networking namespace as the running container, so we will use --net, which can also be run in multiple modes. For this article, we will connect to the application via the container name.

Lastly, by default, Docker containers restrict a lot of what you can do with running processes, like running docker in docker. So we need to tell docker to run in privileged mode and which capabilities our diagnostics container requires. For this, we will use the --cap-add and --privileged arguments. See the docker documentation for more about runtime privilege and capabilities.

After putting it all together, here is the full command line that we will run:

docker run --rm -it --pid=container:app --net=container:app -v dotnetdiag:/tmp --cap-add ALL --privileged dotnetdiag:3.1

You should now be in an interactive shell, so if we execute dotnet-counters ps you should see something similar to the following:

root@9663cbb4e1fe:/diagnostics# dotnet-counters ps
       103 BlogApp    /app/BlogApp/bin/Release/netcoreapp3.1/BlogApp
         6 dotnet     /usr/share/dotnet/dotnet
        42 dotnet     /usr/share/dotnet/dotnet
        61 dotnet     /usr/share/dotnet/dotnet
       247 dotnet-counters /root/.dotnet/tools/dotnet-counters

Assuming that your application is running under process id 103, we would execute the following command to view the counters:

root@9663cbb4e1fe:/diagnostics# dotnet-counters monitor -p 103 System.Runtime Microsoft.AspNetCore.Hosting

Recap

In order to diagnose a running docker container from another docker container, you need to:

  1. Build a diagnostics image containing the dotnet diagnostics tooling
  2. Start the application container with a named volume mounted over /tmp
  3. Start the diagnostics container with the same volume mounted, joined to the application's process and network namespaces, and with elevated privileges
  4. Run dotnet-counters ps to find the process id, then dotnet-counters monitor to attach


An alternative approach: Embedding diagnostics tools in the image

The approach above requires elevated permissions and a separate diagnostics container. An alternative is to embed the tooling directly into the runtime image so that we can extract the counter/memory information as required without the elevated permissions.

Let's assume that we are starting with the following dockerfile:

# Publish the application using the SDK
FROM mcr.microsoft.com/dotnet/core/sdk:3.1-alpine AS build
WORKDIR /app
RUN dotnet new webapp -n BlogApp
RUN dotnet publish /app/BlogApp/BlogApp.csproj -c Release -o /out /p:GenerateDocumentationFile=false
 
# Build the smaller runtime image
FROM mcr.microsoft.com/dotnet/core/aspnet:3.1-alpine
WORKDIR /app
COPY --from=build /out ./
EXPOSE 5000
ENTRYPOINT ["dotnet", "BlogApp.dll"]

Here we use a docker multi-stage build to publish our application (which is also created inline for the purposes of this article). Once the code has been published, we can then build a runtime image which has far fewer dependencies, and thus a smaller image size, to host the published version of the application.

Note: If you don't use the same OS, like Alpine, in both stages, then you should specify the -r flag with the runtime identifier for the runtime image.

Installing the .Net tools

In order to embed the tooling inside of the runtime image, we first need to adapt our build image:

# Publish the application using the SDK
FROM mcr.microsoft.com/dotnet/core/sdk:3.1-alpine AS build
WORKDIR /app
RUN dotnet new webapp -n BlogApp
RUN dotnet publish /app/BlogApp/BlogApp.csproj -c Release -o /out /p:GenerateDocumentationFile=false
# NEW CODE
RUN dotnet tool install dotnet-dump --tool-path /tools
RUN dotnet tool install dotnet-counters --tool-path /tools
RUN dotnet tool install dotnet-trace --tool-path /tools
# END OF NEW CODE

Here we leverage the dotnet CLI's ability to install tools to a specific directory, in this case /tools. Once the tools have been installed, we can copy them into the runtime image:

# Build the smaller runtime image
FROM mcr.microsoft.com/dotnet/core/aspnet:3.1-alpine
WORKDIR /app
COPY --from=build /out ./
EXPOSE 5000
# NEW CODE
COPY --from=build /tools /tools
ENV PATH="/tools:${PATH}"
# END OF NEW CODE
ENTRYPOINT ["dotnet", "BlogApp.dll"]

Accessing the tools at runtime

In order to access these tools at runtime, we need to be able to access the container itself, for example by SSHing into the running EC2 instance on AWS. Assuming that we have access, we can run the following command to list the running containers:

docker ps

Which results in output similar to the following:

CONTAINER ID  IMAGE             COMMAND                  CREATED        STATUS                    PORTS                NAMES
fac2377f3e87  myContainerImage  "./usr/src/app/init.…"   30 hours ago   Up 55 seconds (healthy)   0.0.0.0:80->80/tcp   myContainerImage

From here, we can use the docker exec command to launch a shell in the new container, using the container ID from above:

docker exec -it -w /tools <ID> /bin/sh
#Example:
docker exec -it -w /tools fac2377f3e87 /bin/sh

-it tells docker that we want the shell to be interactive and kept open even when there is no immediate input, ie: we can type into it and get a response. -w sets the working directory to /tools. Next, replace <ID> with the container ID from the output above. Finally, we pass in the command that we want to execute - in this case /bin/sh, which opens a shell so that we can run further commands.

Now you should be able to run dotnet-counters, dotnet-dump & dotnet-trace as normal. If you need to copy any files from the container, then run the following from the host machine:

docker cp <ID>:<path-to-file-in-container> <copy-to-path-on-host>
#Example:
docker cp fac2377f3e87:/tools/output/trace.nettrace ./output/trace.nettrace

The docker cp command allows us to copy a file from/to the running container (specified by <ID>). The only other thing that you need is the path of the file that you wish to copy from the container, and the destination path on the host machine.

Now you'll have the diagnostic tools embedded within your runtime images, at the correct version. Naturally, the more tools you install, the larger the final image will be. It does take a little bit of prep work, but this can pay off massively when unexpected memory/CPU issues arise.


Listening to inbound HTTP requests

Beyond the built-in counters, we can also use DiagnosticListener to monitor inbound HTTP requests. We could use middleware, as most approaches do, but that approach is highly dependent on the middleware registration order, so instead we will hook directly into the ASP.NET Core diagnostics pipeline. This section re-uses the infrastructure from the outbound HTTP requests section below, so refer there if something is missing here. We will need to implement the following components:

  1. A new DiagnosticListener
  2. An observer that looks at incoming requests
  3. An observer that looks at the response
  4. A metric builder that builds our diagnostic counters

Implementing the DiagnosticListener

In order to hook into the infrastructure built in the outbound HTTP requests section below, we need a new implementation of DiagnosticListenerBase that listens on the Microsoft.AspNetCore DiagnosticSource:

internal sealed class InboundHttpRequestDiagnosticListener : DiagnosticListenerBase
{
    private readonly List<IInboundHttpObserver> _observers;
    private readonly string _name = "Microsoft.AspNetCore";
 
    public InboundHttpRequestDiagnosticListener(IEnumerable<IInboundHttpObserver> observers)
    {
        _observers = observers.ToList();
    }
 
    public override void TryObserve(DiagnosticListener diagnosticListener)
    {
        if (diagnosticListener is null || !diagnosticListener.Name.Equals(_name, StringComparison.OrdinalIgnoreCase))
            return;
 
        foreach (var observer in _observers)
            Subscribe(diagnosticListener, observer);
    }
}

The intention here is that we only subscribe specific observers when we encounter a DiagnosticListener that's named Microsoft.AspNetCore. This listener has two specific events that we need to listen for:

  1. Microsoft.AspNetCore.Hosting.BeginRequest - raised when a request enters the hosting pipeline
  2. Microsoft.AspNetCore.Hosting.EndRequest - raised once the request has been processed

We will bind all of our observers into our IoC container again so that we can take advantage of dependency injection easily should we want/need to. The code is otherwise near identical to the outbound version.
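As a rough sketch (assuming Microsoft.Extensions.DependencyInjection; the exact calls will depend on your container of choice), the inbound registrations might look like this:

```csharp
// Hypothetical registration sketch - adapt to your own container.
services.AddSingleton<IDiagnosticListener, InboundHttpRequestDiagnosticListener>();
services.AddSingleton<IInboundHttpObserver, InboundHttpRequestObserver>();
services.AddSingleton<IInboundHttpObserver, InboundHttpResponseObserver>();
services.AddSingleton<IInboundHttpMetricBuilder, DefaultInboundHttpMetricBuilder>();
```

Registering the observers against the marker interface lets the listener receive all of them via its IEnumerable constructor parameter.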

Implementing the observers

The approach we are going to take is largely the same as for the outbound HTTP requests. The DiagnosticListener that we subscribe to is different, as are the payloads, but we get a near identical set of information that we can use to generate our metrics. We use a marker interface for all of our inbound observers, which is declared as follows:

internal interface IInboundHttpObserver : IObserver<KeyValuePair<string, object>>
{
}

Implementing the request observer

The purpose of the InboundHttpRequestObserver is to extract the timestamp property that's contained in the Microsoft.AspNetCore.Hosting.BeginRequest event, which indicates the ticks at which the request started, and attach this as a property in the HttpContext so that we can access it later on.

internal sealed class InboundHttpRequestObserver : SimpleDiagnosticListenerObserver, IInboundHttpObserver
{
    public override void OnNext(KeyValuePair<string, object> value)
    {
        if (value.Key == "Microsoft.AspNetCore.Hosting.BeginRequest")
        {
            var data = GetValueAs<TypedData>(value);
            if (data?.httpContext?.Items is {})
            {
                data.httpContext.Items["RequestTimestamp"] = data.timestamp;
            }
        }
    }
 
    private class TypedData
    {
        public HttpContext? httpContext;
        public long timestamp;
    }
}

Like our outbound implementation, I've generated a typed class so that we can access the data within the event's payload, as the real payload types are internal. I've nested this class inside of the observer to help with this, containing only the properties that I need.

Implementing the response observer

The purpose of the InboundHttpResponseObserver is to extract the timestamp property that's contained in the Microsoft.AspNetCore.Hosting.EndRequest event, which indicates the ticks at which the request finished, and calculate the duration using the request timestamp that we previously stored in the HttpContext items.

internal sealed class InboundHttpResponseObserver : SimpleDiagnosticListenerObserver, IInboundHttpObserver
{
    private readonly IInboundHttpMetricBuilder _metricBuilder;
 
    public InboundHttpResponseObserver(IInboundHttpMetricBuilder metricBuilder)
    {
        _metricBuilder = metricBuilder;
    }
 
    public override void OnNext(KeyValuePair<string, object> value)
    {
        if (value.Key == "Microsoft.AspNetCore.Hosting.EndRequest")
        {
            var data = GetValueAs<TypedData>(value);
            object? requestTimestamp = null;
            if (data?.httpContext?.Items?.TryGetValue("RequestTimestamp", out requestTimestamp) == true)
            {
                if (requestTimestamp is {} && long.TryParse(requestTimestamp?.ToString(), out var startTimestamp))
                {
                    var response = data.httpContext.Response;
                    var request = data.httpContext.Request;
                    // For all HTTP requests we should:
                    //    - Track the success (<400 status code response) or failure of the API call
                    //    - Capture the latency of the request
                    var resultCounter = (int)response.StatusCode < 400 ? _metricBuilder.GetSuccessCounter(request, response) : _metricBuilder.GetErrorCounter(request, response);
                    resultCounter?.Increment();
                    _metricBuilder.GetLatencyCounter(request, response)?.WriteMetric(GetDuration(startTimestamp, data.timestamp).TotalMilliseconds);
                }
            }
        }
    }
 
    private class TypedData
    {
        public HttpContext? httpContext;
        public long timestamp;
    }
}

As mentioned in a previous section, I've generated a typed class so that we can access the data within the event's payload. I've nested a class inside of the observer to help with this, containing only the properties that I need. Now that we have all of the data we need to generate some metrics, we can use the injected IInboundHttpMetricBuilder to create the metrics that we want to track dynamically.

Creating metrics from the context of the request

In our services, there are a few bits of information that I want to capture about the context of the request:

  1. Whether the request was successful or not (based on the HTTP Status code)
  2. The duration of the request, in milliseconds

With this information, we want to add metadata to the DiagnosticCounters that we generate so that we can use it as dimensions in our monitoring applications like DataDog/Prometheus. We want to track the following properties:

  1. http-method - the HTTP verb of the request
  2. http-scheme - http or https
  3. http-version - the HTTP protocol version of the request
  4. http-request-type - whether the request was inbound or outbound
  5. http-status-code - the status code of the response
  6. host - the host that received the request, without the port
  7. request-path - the first segment of the request path

With this information, we should have more than enough to filter out specific flows easily, whilst being able to aggregate the results where needed. Each one of the properties is added to each one of the diagnostic counters that we generate.

To allow us to override the implementation later on, we can use the following interface:

public interface IInboundHttpMetricBuilder
{
    IncrementingEventCounter? GetSuccessCounter(HttpRequest request, HttpResponse response);
    IncrementingEventCounter? GetErrorCounter(HttpRequest request, HttpResponse response);
    EventCounter? GetLatencyCounter(HttpRequest request, HttpResponse response);
}

Note: For a summary of the different types of event counters, please see the "Types of DiagnosticCounters" section earlier in this article.

In order to generate the same tags as the outbound section, we can re-use most of the same code, renaming anything that says outbound to inbound:

internal sealed class DefaultInboundHttpMetricBuilder : IInboundHttpMetricBuilder
{
    private readonly ConcurrentDictionary<List<(string key, string value)>, IncrementingEventCounter> _successCounters = new ConcurrentDictionary<List<(string key, string value)>, IncrementingEventCounter>(new ListOfTupleEqualityComparer());
    private readonly ConcurrentDictionary<List<(string key, string value)>, IncrementingEventCounter> _errorCounters = new ConcurrentDictionary<List<(string key, string value)>, IncrementingEventCounter>(new ListOfTupleEqualityComparer());
    private readonly ConcurrentDictionary<List<(string key, string value)>, EventCounter> _latencyCounters = new ConcurrentDictionary<List<(string key, string value)>, EventCounter>(new ListOfTupleEqualityComparer());
 
    public IncrementingEventCounter GetSuccessCounter(HttpRequest request, HttpResponse response) => GetCoreHttpRequestCounter(_successCounters, request, response);
 
    public IncrementingEventCounter GetErrorCounter(HttpRequest request, HttpResponse response) => GetCoreHttpRequestCounter(_errorCounters, request, response);
 
    public EventCounter GetLatencyCounter(HttpRequest request, HttpResponse response)
    {
        return _latencyCounters.GetOrAdd(GetCoreTags(request, response), key =>
        {
            var counter = new EventCounter("http-request-latency", MyDiagnosticsEventSource.Instance)
            {
                DisplayName = "HTTP Request Latency",
                DisplayUnits = "ms"
            };
            foreach (var dimension in key)
                counter.AddMetadata(dimension.key, dimension.value);
            MyDiagnosticsEventSource.Instance.AddDiagnosticCounter(counter);
            return counter;
        });
    }
 
    private IncrementingEventCounter GetCoreHttpRequestCounter(ConcurrentDictionary<List<(string key, string value)>, IncrementingEventCounter> collection, HttpRequest request, HttpResponse response)
    {
        return collection.GetOrAdd(GetCoreTags(request, response), key =>
        {
            Debug.WriteLine("CREATED NEW COUNTER: " + string.Join(",", key.Select(x => $"{x.key}:{x.value}")));
 
            var counter = new IncrementingEventCounter("http-request", MyDiagnosticsEventSource.Instance)
            {
                DisplayName = "HTTP Request Count",
                DisplayUnits = "requests"
            };
            foreach (var dimension in key)
                counter.AddMetadata(dimension.key, dimension.value);
            MyDiagnosticsEventSource.Instance.AddDiagnosticCounter(counter);
            return counter;
        });
    }
 
    private List<(string key, string value)> GetCoreTags(HttpRequest request, HttpResponse response)
    {
        var path = request.Path.Value;
 
        if (string.IsNullOrWhiteSpace(path))
            path = "/";
 
        if (path.Length > 1)
        {
            var initialPartIndex = path.IndexOf('/', 1);
            if (initialPartIndex > 1)
                path = path.Substring(0, initialPartIndex);
            else
            {
                var queryIndex = path.IndexOf('?', 1);
                if (queryIndex >= 0)
                    path = path.Substring(0, queryIndex);
            }
        }
 
        var tags = new List<(string, string)>
        {
            ("http-method", request.Method),
            ("http-scheme", request.Scheme),
            ("http-request-type", "inbound"),
            ("http-status-code", response.StatusCode.ToString()),
            ("host", request.Host.Host), // host without the port value
            ("request-path", path)
        };
 
        if (request.Protocol.StartsWith("HTTP/"))
            tags.Add(("http-version", request.Protocol.Substring(5)));
 
        return tags;
    }
 
    private class ListOfTupleEqualityComparer : EqualityComparer<List<(string, string)>>
    {
        public override bool Equals(List<(string, string)>? left, List<(string, string)>? right)
        {
            if (left is null || right is null)
                return ReferenceEquals(left, right);
 
            if (left.Count != right.Count)
                return false;
 
            if (left.Count == 0)
                return true; // Both are 0
 
            using var iterator2 = right.GetEnumerator();
            foreach (var element in left)
            {
                // second is shorter than first
                if (!iterator2.MoveNext())
                {
                    return false;
                }
                if (!(element.Item1.Equals(iterator2.Current.Item1) && element.Item2.Equals(iterator2.Current.Item2)))
                {
                    return false;
                }
            }
            // If we can get to the next element, first was shorter than second.
            // Otherwise, the sequences are equal.
            return !iterator2.MoveNext();
        }
 
        public override int GetHashCode(List<(string, string)> obj)
        {
            var code = 17;
            foreach (var element in obj)
                code = HashCode.Combine(code, element.Item1.GetHashCode(), element.Item2.GetHashCode());
 
            return code;
        }
    }
}

Although I've copied the full code here for completeness, the only portion that's really changed is the GetCoreTags method. We needed to change this because the request/response classes that are used on the inbound request flow are different to what we used on the outbound flow. The logic, however, is largely unchanged.
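To see what the path handling in GetCoreTags actually does, here is the same logic extracted into a standalone helper (the TruncatePath name is mine, for illustration). It keeps only the first path segment, so parameterised routes like /api/orders/123 don't explode the number of counters we create:

```csharp
using System;

public static class PathTruncationDemo
{
    // Mirrors the path logic in GetCoreTags: keep only the first segment,
    // dropping any trailing segments or query string.
    public static string TruncatePath(string? path)
    {
        if (string.IsNullOrWhiteSpace(path))
            return "/";

        if (path.Length > 1)
        {
            var initialPartIndex = path.IndexOf('/', 1);
            if (initialPartIndex > 1)
                return path.Substring(0, initialPartIndex);

            var queryIndex = path.IndexOf('?', 1);
            if (queryIndex >= 0)
                return path.Substring(0, queryIndex);
        }

        return path;
    }

    public static void Main()
    {
        Console.WriteLine(TruncatePath("/api/orders/123")); // /api
        Console.WriteLine(TruncatePath("/health?verbose=true")); // /health
        Console.WriteLine(TruncatePath(null)); // /
    }
}
```

This cardinality capping matters because every distinct tag list creates a new counter, as we'll see in the metric builder's GetOrAdd calls.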

Hopefully, once everything has been bound to your IoC container, you now have all the bits that you would need to build this out in your own applications.


Listening to outbound HTTP requests

Now let's look at how we can capture all outbound HTTP requests automatically as they occur using DiagnosticListener.

Our implementation is going to use a number of technologies combined to get the information that we require about the web request. Here are the steps that we need to complete:

  1. Create a service that hooks onto DiagnosticListeners as they get created
  2. Create an observer to listen for the start of an outbound request
  3. Create an observer to listen for the end of an outbound request
  4. Create metrics from the context of the request/response

The classes that I've added here are designed to give you the most flexibility around how you extend your applications in future. Another aim is to give the classes a single purpose to aid with testability. If you do not need this level of extensibility or testability, it should be relatively easy to merge some of the classes together. As this is already a lengthy article, I've not included the tests here.

Creating the diagnostics hosted service

Our DiagnosticsHostedService will help us manage the lifetime of our observers. I've included it here in the article for completeness, although this is an optional step: so long as you register your new DiagnosticListener observer via DiagnosticListener.AllListeners, you should be fine.

A DiagnosticListener allows us to listen for events that are published in our application, either by a third party or ourselves, for the purposes of diagnostics. The events are sent from a DiagnosticSource that sends us a rich payload that's designed for consumption within the current process. They are multi-cast in nature, meaning that multiple listeners can listen to the same event without any issues. For our use case, we will listen to a single DiagnosticSource with multiple observers, for testability.

Read more: Consuming Data with DiagnosticListeners / Microsoft Docs
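To make the publish/consume flow concrete, here is a minimal, self-contained sketch (the source name Demo and the payload shape are illustrative, not taken from the article's code):

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

public static class DiagnosticsDemo
{
    // Runs a single publish/consume round-trip and returns the event name
    // that the observer received.
    public static string RunOnce()
    {
        // The producer side: DiagnosticListener derives from DiagnosticSource.
        using var source = new DiagnosticListener("Demo");

        // The consumer side: subscribe and capture the events we receive.
        var received = new List<string>();
        using var subscription = source.Subscribe(new CollectingObserver(received));

        // Guard with IsEnabled so we only pay for building the payload when
        // somebody is actually listening.
        if (source.IsEnabled("Demo.Event"))
            source.Write("Demo.Event", new { Value = 42 });

        return received[0];
    }

    public static void Main() => Console.WriteLine(RunOnce());

    private sealed class CollectingObserver : IObserver<KeyValuePair<string, object>>
    {
        private readonly List<string> _received;
        public CollectingObserver(List<string> received) => _received = received;
        public void OnCompleted() { }
        public void OnError(Exception error) { }
        public void OnNext(KeyValuePair<string, object> value) => _received.Add(value.Key);
    }
}
```

Note that Write delivers the event synchronously to every subscriber, which is what makes the multi-cast behaviour described above possible.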

The DiagnosticsHostedService

Now that we have a basic understanding of a DiagnosticListener we can use this in a simple hosted service that uses a special property called AllListeners. This property then exposes a Subscribe method on which we can add our first type of observer:

internal sealed class DiagnosticsHostedService : IHostedService
{
    private readonly Observer _observer;
    private IDisposable? _subscription;
 
    public DiagnosticsHostedService(Observer observer)
    {
        _observer = observer ?? throw new ArgumentNullException(nameof(observer));
    }
 
    public Task StartAsync(CancellationToken cancellationToken)
    {
        _subscription ??= DiagnosticListener.AllListeners.Subscribe(_observer);
        return Task.CompletedTask;
    }
 
    public Task StopAsync(CancellationToken cancellationToken)
    {
        _subscription?.Dispose();
        return Task.CompletedTask;
    }
}

The service above helps us with managing the life-cycle of the observer and keeps hold of the subscription to ensure that it doesn't accidentally get cleaned up. It doesn't matter at which point you subscribe to DiagnosticListener.AllListeners, because when you subscribe you will always receive all previously registered DiagnosticSources as well as any future sources that are created.

I'm a fan of being able to easily extend applications by adding a new entry to our DI containers. This can be extremely helpful when doing assembly scanning. To keep with this pattern, I've created a simple wrapper that facilitates this, whilst adding some safety guarantees for graceful shutdown scenarios.

internal class Observer : IObserver<DiagnosticListener>
{
    private readonly List<IDiagnosticListener> _listeners;
    private bool _complete = false;
 
    public Observer(IEnumerable<IDiagnosticListener> listeners)
    {
        _listeners = listeners?.ToList() ?? throw new ArgumentNullException(nameof(listeners));
    }
 
    public void OnCompleted()
    {
        lock (_listeners)
        {
            _complete = true;
        }
    }
 
    public void OnError(Exception error)
    {
    }
 
    public void OnNext(DiagnosticListener value)
    {
        lock (_listeners)
        {
            if (_complete)
                return;
 
            foreach(var listener in _listeners)
                listener.TryObserve(value);
        }
    }
}

Once we've hooked up the above services in DI, all that's left for us to do is implement the IDiagnosticListener interface and register some observers from within the implementation, binding the implementation of IDiagnosticListener and any observers into our DI container of choice along the way.
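To make that wiring concrete, here is a minimal sketch of what the registrations might look like, assuming Microsoft.Extensions.DependencyInjection (the exact calls will vary with your container of choice, and the listener/observer types are the ones built in the following sections):

```csharp
// Hypothetical registration sketch - adapt to your own container.
services.AddSingleton<Observer>();
services.AddHostedService<DiagnosticsHostedService>();
services.AddSingleton<IDiagnosticListener, OutboundHttpRequestDiagnosticListener>();
services.AddSingleton<IOutboundHttpObserver, OutboundHttpRequestObserver>();
services.AddSingleton<IOutboundHttpObserver, OutboundHttpResponseObserver>();
services.AddSingleton<IOutboundHttpMetricBuilder, DefaultOutboundHttpMetricBuilder>();
```

Because Observer takes IEnumerable<IDiagnosticListener> and the listeners take IEnumerable of their observers, the container assembles the whole chain for us.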

Creating the Observers

Implementing IDiagnosticListener

The same base classes are re-used for the inbound metrics covered earlier, so I've moved the common functionality into base classes that can be re-used for other purposes. First of all, we have the DiagnosticListenerBase:

public abstract class DiagnosticListenerBase : IDiagnosticListener
{
    private readonly List<IDisposable> _subscriptions = new List<IDisposable>();
    private bool _disposed = false;
 
    public abstract void TryObserve(DiagnosticListener diagnosticListener);
 
    protected void Subscribe(DiagnosticListener diagnosticListener, IObserver<KeyValuePair<string, object>> observer)
    {
        lock (_subscriptions)
        {
            if (_disposed)
                throw new InvalidOperationException("Cannot subscribe when the diagnostic listener has been disposed.");
 
            _subscriptions.Add(diagnosticListener.Subscribe(observer));
        }
    }
 
    public void Dispose()
    {
        lock (_subscriptions)
        {
            foreach(var subscription in _subscriptions)
                subscription.Dispose();
 
            _disposed = true;
        }
 
        OnDispose();
    }
 
    protected virtual void OnDispose()
    {
    }
}

This class is intended to make sure that we manage the subscriptions correctly, just like we did with the DiagnosticsHostedService. We need to make sure that we have some extensibility so I've added the following interface so that we can re-use it for both inbound and outbound observers:

public interface IDiagnosticListener : IDisposable
{
    void TryObserve(DiagnosticListener diagnosticListener);
}

DiagnosticListenerBase also declares the interface's TryObserve method as abstract, which our OutboundHttpRequestDiagnosticListener can override to subscribe the observers that we need:

internal sealed class OutboundHttpRequestDiagnosticListener : DiagnosticListenerBase
{
    private readonly List<IOutboundHttpObserver> _observers;
    private readonly string _name = "HttpHandlerDiagnosticListener";
 
    public OutboundHttpRequestDiagnosticListener(IEnumerable<IOutboundHttpObserver> observers)
    {
        _observers = observers.ToList();
    }
 
    public override void TryObserve(DiagnosticListener diagnosticListener)
    {
        if (diagnosticListener is null || !diagnosticListener.Name.Equals(_name, StringComparison.OrdinalIgnoreCase))
            return;
 
        foreach (var observer in _observers)
            Subscribe(diagnosticListener, observer);
    }
}

The intention here is that we only subscribe specific observers when we encounter a DiagnosticListener that's named HttpHandlerDiagnosticListener. This listener has two specific events that we need to listen for:

  1. System.Net.Http.Request - raised when an outbound request is about to be sent
  2. System.Net.Http.Response - raised when the response has been received

Each of the observers that we create will have a marker interface attached to them called IOutboundHttpObserver so that we can plug them into our IoC container. It's simply defined as:

internal interface IOutboundHttpObserver : IObserver<KeyValuePair<string, object>>
{
}

SimpleDiagnosticListenerObserver

When dealing with DiagnosticListeners, we are using the Observer pattern in C#, which means that we always need to implement the following methods: OnCompleted, OnError and OnNext. For our use case, we don't need the OnCompleted or OnError methods in any of our observers, so we can move this functionality into a base class with some additional helper methods: GetDuration and GetValueAs.

public abstract class SimpleDiagnosticListenerObserver : IObserver<KeyValuePair<string, object>>
{
    // Gets the conversion factor that's used to go from ticks to a real world time. Inspiration: https://github.com/aspnet/Extensions/blob/34204b6bc41de865f5310f5f237781a57a83976c/src/Shared/src/ValueStopwatch/ValueStopwatch.cs
    protected static readonly double TimestampToTicks = TimeSpan.TicksPerSecond / (double)Stopwatch.Frequency;
 
    public virtual void OnCompleted()
    {
    }
 
    public virtual void OnError(Exception error)
    {
    }
 
    public abstract void OnNext(KeyValuePair<string, object> value);
 
    protected static TimeSpan GetDuration(long startTimestampInTicks, long endTimestampInTicks)
    {
        var timestampDelta = endTimestampInTicks - startTimestampInTicks;
        var ticks = (long)(TimestampToTicks * timestampDelta);
        return new TimeSpan(ticks);
    }
 
    protected static T GetValueAs<T>(KeyValuePair<string, object> value)
        where T : class => Unsafe.As<T>(value.Value);
}

Each event that we receive is typed as a KeyValuePair<string, object>. The key will always represent the name of the event, while the value will be the rich payload that's sent by the DiagnosticSource.

The GetDuration method is inspired by the ValueStopwatch code that ASP.NET Core has internally. It allows us to calculate the wall-clock duration between two timestamps. Incidentally, this appears to be the same calculation that's used when logging inbound HTTP requests, from what I can tell so far.
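The conversion can be seen in isolation in the sketch below: TimeSpan always uses 10,000,000 ticks per second, whereas Stopwatch.Frequency is machine dependent, so we need a scaling factor between the two.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

public static class DurationDemo
{
    // Same conversion factor as the base class above: scales machine-dependent
    // Stopwatch ticks into the fixed 100ns ticks that TimeSpan uses.
    public static readonly double TimestampToTicks =
        TimeSpan.TicksPerSecond / (double)Stopwatch.Frequency;

    public static TimeSpan GetDuration(long startTimestamp, long endTimestamp) =>
        new TimeSpan((long)(TimestampToTicks * (endTimestamp - startTimestamp)));

    public static void Main()
    {
        var start = Stopwatch.GetTimestamp();
        Thread.Sleep(50);
        var end = Stopwatch.GetTimestamp();

        // Thread.Sleep is not precise, but the measured duration should be
        // in the right ballpark of 50ms.
        Console.WriteLine(GetDuration(start, end).TotalMilliseconds >= 40);
    }
}
```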

The GetValueAs<T> method uses some CLR magic to forcefully convert the type for us, i.e. it does not perform type checking. We need this because the objects that come along with the events we listen to are internal to the .Net code base, so we have to mimic the same type/properties and then cast to it so that we can access the information. A "safer" approach would be to use cached reflection calls but, since I understand that this might break in the future no matter what I do, I've opted for the more performant approach.
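For comparison, here is a hedged sketch of what the cached-reflection alternative could look like. The FakePayload type is a stand-in for the internal event payload, and the helper names are mine:

```csharp
using System;
using System.Collections.Concurrent;
using System.Reflection;

public static class CachedReflectionDemo
{
    // Stand-in for the internal payload type that a real event would carry.
    public sealed class FakePayload
    {
        public long Timestamp { get; set; }
    }

    // Cache the PropertyInfo per runtime type: the reflection lookup is the
    // expensive part, the repeated GetValue calls are comparatively cheap.
    private static readonly ConcurrentDictionary<Type, PropertyInfo?> Cache =
        new ConcurrentDictionary<Type, PropertyInfo?>();

    public static long? GetTimestamp(object payload)
    {
        var property = Cache.GetOrAdd(payload.GetType(), t => t.GetProperty("Timestamp"));
        return property?.GetValue(payload) as long?;
    }

    public static void Main() =>
        Console.WriteLine(GetTimestamp(new FakePayload { Timestamp = 12345 }));
}
```

Unlike Unsafe.As, this tolerates the internal type being renamed or reshaped (it just returns null), at the cost of a boxed GetValue call per event.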

Creating the OutboundHttpRequestObserver

The purpose of the OutboundHttpRequestObserver is to extract the timestamp property that's contained in the System.Net.Http.Request event, which indicates the ticks that the request started, and attach this as a request property so that we can access it later on.

internal sealed class OutboundHttpRequestObserver : SimpleDiagnosticListenerObserver, IOutboundHttpObserver
{
    public override void OnNext(KeyValuePair<string, object> value)
    {
        if (value.Key == "System.Net.Http.Request")
        {
            var data = GetValueAs<TypedData>(value);
            if (data?.Request?.Properties is {})
            {
                data.Request.Properties["RequestTimestamp"] = data.Timestamp;
            }
        }
    }
 
    private class TypedData
    {
        public HttpRequestMessage? Request;
        public long Timestamp;
    }
}

As mentioned in the previous section, we generated a typed class so that we can access the data within the event's payload. I've nested a class inside of the observer to help with this, containing only the properties that I need.

Creating the OutboundHttpResponseObserver

The purpose of the OutboundHttpResponseObserver is to extract the timestamp property that's contained in the System.Net.Http.Response event, which indicates the ticks that the request finished, and calculate the duration using the request timestamp that we previously stored in the request properties.

internal sealed class OutboundHttpResponseObserver : SimpleDiagnosticListenerObserver, IOutboundHttpObserver
{
    private readonly IOutboundHttpMetricBuilder _metricBuilder;
 
    public OutboundHttpResponseObserver(IOutboundHttpMetricBuilder metricBuilder)
    {
        _metricBuilder = metricBuilder;
    }
 
    public override void OnNext(KeyValuePair<string, object> value)
    {
        if (value.Key == "System.Net.Http.Response")
        {
            var data = GetValueAs<TypedData>(value);
            object? requestTimestamp = null;
            if (data?.Response?.RequestMessage?.Properties?.TryGetValue("RequestTimestamp", out requestTimestamp) == true)
            {
                if (long.TryParse(requestTimestamp?.ToString(), out var startTimestamp))
                {
                    // For all HTTP requests we should:
                    //    - Track the success (<400 status code response) or failure of the API call
                    //    - Capture the latency of the request
                    var resultCounter = (int)data.Response.StatusCode < 400 ? _metricBuilder.GetSuccessCounter(data.Response.RequestMessage, data.Response) : _metricBuilder.GetErrorCounter(data.Response.RequestMessage, data.Response);
                    resultCounter?.Increment();
                    _metricBuilder.GetLatencyCounter(data.Response.RequestMessage, data.Response)?.WriteMetric(GetDuration(startTimestamp, data.TimeStamp).TotalMilliseconds);
                }
            }
        }
    }
 
    private class TypedData
    {
        public HttpResponseMessage? Response;
        public long TimeStamp;
    }
}

As mentioned in a previous section, we generated a typed class so that we can access the data within the event's payload. I've nested a class inside of the observer to help with this, containing only the properties that I need. Now that we have all of the data we need to generate some metrics, we can use the injected IOutboundHttpMetricBuilder to create the metrics that we want to track dynamically.

Creating metrics from the context of the request

In our services, there are a few bits of information that I want to capture about the context of the request:

  1. Whether the request was successful or not (based on the HTTP Status code)
  2. The duration of the request, in milliseconds

With this information, we want to add metadata to the DiagnosticCounters that we generate so that we can use it as dimensions in our monitoring applications like DataDog/Prometheus. The dimensions that we are interested in include:

  1. http-method - the HTTP verb of the request
  2. http-version - the HTTP protocol version of the request
  3. http-scheme - http or https
  4. http-request-type - whether the request was inbound or outbound
  5. http-status-code - the status code of the response
  6. host - the authority of the request URI
  7. request-path - the first segment of the request path

With this information, we should have more than enough to filter out specific flows easily, whilst being able to aggregate the results where needed. Each one of the properties is added to each one of the diagnostic counters that we generate.

To allow us to override the implementation later on, we can use the following interface:

public interface IOutboundHttpMetricBuilder
{
    IncrementingEventCounter? GetSuccessCounter(HttpRequestMessage request, HttpResponseMessage response);
    IncrementingEventCounter? GetErrorCounter(HttpRequestMessage request, HttpResponseMessage response);
    EventCounter? GetLatencyCounter(HttpRequestMessage request, HttpResponseMessage response);
}

Note: For a summary of the different types of event counters, please see the "Types of DiagnosticCounters" section earlier in this article.

For the sake of brevity, I'm not going to explain all of the below, rather the general concept. Here, the intention is to have a core set of dimensions (listed above) that are also used to de-duplicate the number of counters that we create overall. Lastly, we have a custom comparer so that we can compare the values of the List that we generate for each type of metric, rather than relying on the default equality comparer. This helps us ensure that we have semantic rather than reference equality.

/// <remarks>
/// We don't want to add new event counters all the time to the system. So based on the tags, we maintain a list for success/errors/latency.
/// Because we are storing based on semantic equivalents, we need a custom comparer to ensure that we have uniqueness. This is guaranteed in two ways:
///     - Ensuring that hashcodes are generated using a semantic method, given that the inputs are always supplied in the same order
///     - When we check for equality, assuming the hashcodes match, we check that the sequences are equal using a performant version of Enumerable.SequenceEqual (as this is going to be called A LOT!)
/// </remarks>
internal sealed class DefaultOutboundHttpMetricBuilder : IOutboundHttpMetricBuilder
{
    private readonly ConcurrentDictionary<List<(string key, string value)>, IncrementingEventCounter> _successCounters = new ConcurrentDictionary<List<(string key, string value)>, IncrementingEventCounter>(new ListOfTupleEqualityComparer());
    private readonly ConcurrentDictionary<List<(string key, string value)>, IncrementingEventCounter> _errorCounters = new ConcurrentDictionary<List<(string key, string value)>, IncrementingEventCounter>(new ListOfTupleEqualityComparer());
    private readonly ConcurrentDictionary<List<(string key, string value)>, EventCounter> _latencyCounters = new ConcurrentDictionary<List<(string key, string value)>, EventCounter>(new ListOfTupleEqualityComparer());
 
    public IncrementingEventCounter GetSuccessCounter(HttpRequestMessage request, HttpResponseMessage response) => GetCoreHttpRequestCounter(_successCounters, request, response);
 
    public IncrementingEventCounter GetErrorCounter(HttpRequestMessage request, HttpResponseMessage response) => GetCoreHttpRequestCounter(_errorCounters, request, response);
 
    public EventCounter GetLatencyCounter(HttpRequestMessage request, HttpResponseMessage response)
    {
        return _latencyCounters.GetOrAdd(GetCoreTags(request, response), key =>
        {
            var counter = new EventCounter("http-request-latency", MyDiagnosticsEventSource.Instance)
            {
                DisplayName = "HTTP Request Latency",
                DisplayUnits = "ms"
            };
            foreach (var dimension in key)
                counter.AddMetadata(dimension.key, dimension.value);
            MyDiagnosticsEventSource.Instance.AddDiagnosticCounter(counter);
            return counter;
        });
    }
 
    private IncrementingEventCounter GetCoreHttpRequestCounter(ConcurrentDictionary<List<(string key, string value)>, IncrementingEventCounter> collection, HttpRequestMessage request, HttpResponseMessage response)
    {
        return collection.GetOrAdd(GetCoreTags(request, response), key =>
        {
            Debug.WriteLine("CREATED NEW COUNTER: " + string.Join(",", key.Select(x => $"{x.key}:{x.value}")));
 
            var counter = new IncrementingEventCounter("http-request", MyDiagnosticsEventSource.Instance)
            {
                DisplayName = "HTTP Request Count",
                DisplayUnits = "requests"
            };
            foreach (var dimension in key)
                counter.AddMetadata(dimension.key, dimension.value);
            MyDiagnosticsEventSource.Instance.AddDiagnosticCounter(counter);
            return counter;
        });
    }
 
    private List<(string key, string value)> GetCoreTags(HttpRequestMessage request, HttpResponseMessage response)
    {
        var path = request.RequestUri.PathAndQuery;

        if (string.IsNullOrWhiteSpace(path))
            path = "/";

        // Trim the path down to its first segment (and drop any query string) to keep
        // the number of distinct tag combinations, and therefore counters, bounded.
        if (path.Length > 1)
        {
            var initialPartIndex = path.IndexOf('/', 1);
            if (initialPartIndex > 1)
                path = path.Substring(0, initialPartIndex);
            else
            {
                var queryIndex = path.IndexOf('?', 1);
                if (queryIndex >= 0)
                    path = path.Substring(0, queryIndex);
            }
        }
 
        var tags = new List<(string, string)>
        {
            ("http-method", request.Method.ToString()),
            ("http-version", request.Version.ToString()),
            ("http-scheme", request.RequestUri.Scheme),
            ("http-request-type", "outbound"),
            ("http-status-code", ((int)response.StatusCode).ToString()),
            ("request-path", path)
        };
 
        if (request.RequestUri.IsAbsoluteUri)
            tags.Add(("host", request.RequestUri.Authority));
 
        return tags;
    }
 
    private class ListOfTupleEqualityComparer : EqualityComparer<List<(string, string)>>
    {
        public override bool Equals(List<(string, string)> left, List<(string, string)> right)
        {
            if (left.Count != right.Count)
                return false;
 
            // The counts are equal at this point, so a simple index-based
            // element-wise comparison is sufficient (and handles empty lists).
            for (var i = 0; i < left.Count; i++)
            {
                if (!left[i].Item1.Equals(right[i].Item1) || !left[i].Item2.Equals(right[i].Item2))
                    return false;
            }

            return true;
        }
 
        public override int GetHashCode(List<(string, string)> obj)
        {
            var code = 17;
            foreach (var element in obj)
                code = HashCode.Combine(code, element.Item1.GetHashCode(), element.Item2.GetHashCode());
 
            return code;
        }
    }
}

Naturally, if you determine success by some means other than the status code, such as inspecting the response body, you will need to do additional work to read and interpret the content before choosing a counter. Hopefully, you now have all the pieces you need to build this out in your own applications. The inbound HTTP requests section earlier in this article applies the same technique to incoming requests using the same base components.
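To show how these factory methods might be wired up in practice, here is a minimal `DelegatingHandler` sketch that records the counters for every outbound request. It assumes the class above is exposed as a singleton called `HttpDiagnostics.Instance` (a hypothetical name for illustration); adjust it to however you register the class in your own application.

```csharp
using System.Diagnostics;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// A sketch of a DelegatingHandler that records diagnostics for outbound
// requests. HttpDiagnostics.Instance is a hypothetical singleton exposing
// the GetSuccessCounter/GetErrorCounter/GetLatencyCounter methods above.
public class DiagnosticsHandler : DelegatingHandler
{
    protected override async Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        var stopwatch = Stopwatch.StartNew();
        var response = await base.SendAsync(request, cancellationToken);
        stopwatch.Stop();

        // Record latency for every request, then bump the success or
        // error counter based on the status code.
        HttpDiagnostics.Instance.GetLatencyCounter(request, response)
            .WriteMetric(stopwatch.Elapsed.TotalMilliseconds);

        var counter = response.IsSuccessStatusCode
            ? HttpDiagnostics.Instance.GetSuccessCounter(request, response)
            : HttpDiagnostics.Instance.GetErrorCounter(request, response);
        counter.Increment();

        return response;
    }
}
```

You would then register the handler when building your client, for example via `AddHttpMessageHandler<DiagnosticsHandler>()` when using `IHttpClientFactory`, so every request flowing through that client is measured automatically.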