In this post we will cover how to the the built in support for OpenTelemetry in modern .NET to instrument your distributed application for tracing and logging, how the OpenTelemetry Collector can be used to simplify instrumention, and how the OpenTelemetry Protocol is building a (brilliant) connected future.
We will now go further than logging and look at tracing. Tracing looks at the different units of work (spans) done during an operation (trace), how they are connected, and the timings of the different components. This is an important tool for investigating performance issues in distributed systems.
An example distributed trace timeline, across multiple components, viewed in Jaeger, one of many supported tools:
As well as looking at individual traces timings can be aggregated across the system to find the slowest areas, and identify anomalies.
For any modern dotnet system, distributed tracing is already built in to the default web client, server, and other operations.
You can see this with a basic example, by configuring your logging to display the correlation identifier. Many logging systems, such as Elasticsearch, display correlation by default. The identifiers can also be passed across messaging, such as Azure Message Bus.
Logging has always been a useful way to understand what is happening within a system and for troubleshooting. However, modern systems are very complex, with multiple layers and tiers, including third party systems.
Trying to understand what has happened when a back end service log reports an error, and then correlating that to various messages and front end actions that triggered it requires some kind of correlation identifier.
This problem has existed ever since we have had commercial websites, for over 20 years of my career, with various custom solutions.
Finally, in 2020, a standard was created for the format of the correlation identifier, and how to pass the values: W3C Trace Context. In a very short amount of time all major technology providers have implemented solutions.
The examples below show how this is already implemented in modern dotnet.
Application Insights is the application performance monitoring feature of Azure Monitor, and can be used to monitor deployed applications both in the cloud and on premises. App Insights supports W3C Trace Context standard headers to correlate tracing information across different components.
The features of App Insights, and Azure Monitor, are quite broad, whereas developers may want in some cases to filter down and focus on application-specific logging. Trace correlation is an important part of this, to get and end-to-end overview of operations.
To view logs, connect your App Insights instance to a Log Analytics workspace. Within the workspace, General > Logs will provide access to the query editor — you can either user one of the default Queries pop-up or write your own.
For example, to see all recent traces, and the correlation between them you can use a query like:
union AppTraces, AppDependencies, AppRequests
| where TimeGenerated > ago(30m)
and Properties.CategoryName !startswith "Microsoft"
| sort by TimeGenerated desc
| project TimeGenerated, Type, OperationId, Id, Properties.SpanId,
ParentId, ClientType, Message, Name, SeverityLevel, Properties,
Properties.CategoryName, OperationName, SessionId,
This example shows all the traces from one operation are linked to the same OperationId 029c3..., and the parent-child relationship between two tiers client (Browser) and server (PC) can also be determined:
Client (Browser) AppTraces have a ParentId 7d65e...
The client has a link from this parent to a child AppDependency with Id 73676...
On the server (PC) the dependency is recorded as the parent if the AppRequest Id 15c7e...
Additional traces on the server show the request as the ParentId (and there may be further parent-child links depending on the number of tiers).
There are many other types of records that can be queried, for example developers may often be interested in exceptions and traces that feature a particular keyword:
union AppExceptions, AppTraces
| where TimeGenerated > ago(30m)
| sort by TimeGenerated desc
| search "Password"
Earlier this year the W3C Trace Context Recommendation was finally published. A standard way of passing distributed trace correlation has been needed for a long time, and it is good to see there is finally a standard, and many vendors have already moved to adopt it.
The Recommendation defines what a distributed trace is:
A distributed trace is a set of events, triggered as a result of a single logical operation, consolidated across various components of an application. A distributed trace contains events that cross process, network and security boundaries. A distributed trace may be initiated when someone presses a button to start an action on a website - in this example, the trace will represent calls made between the downstream services that handled the chain of requests initiated by this button being pressed.
What constitutes a single logical operation depends on the system. In the example above it is a single button press on a website, whereas in a batch processing system it might be for each item processed, or in a complex UI it might consist of both a button press and a subsequent confirmation dialog.
The W3C Trace Context Recommendation describes how the correlation information — an identifier for the operation, and the parent-child relationships between components — is passed in service calls, but doesn't cover what to do with that information, apart from how to pass it to the next component.
This is a guide mostly how to use Trace Context for logging, although it also applies to metrics and other telemetry.
The first part of logging I have polished up Microsoft.Extensions.Logging is structured data support with a Syslog Structured Data package that contains a component which will render as syslog RFC 5424 structured data.
To use the Syslog StructuredData component, install the nuget package:
dotnet add package Syslog.StructuredData
You can then use the structured data via BeginScope() on an ILogger:
using (_logger.BeginScope(new StructuredData
Id = "origin", ["ip"] = ipAddress
For default logger providers, that don't understand structured data, the ToString() method on the StructuredData object will render out the data in RFC 5424 format. This format can still be easily parsed by log analyzers, although the surrounding context won't be a syslog message.
The new Microsoft.Extensions.Logging system has some improvements over the previous System.Diagnostics, with built in support for dependency injection and semantic logging (although I tend to think a singleton-type pattern, like TraceSource, is better than cluttering up every constructor with a logger).
Strangely, however, Microsoft did not include a basic file logger; they have App Insights, and even a file logger that works on Azure-only, but no basic logger. I guess they thought between Serilog/NLog/log4net/etc that there were enough third parties.
The only problem with these is that each of then is an entire logging systems, so you end up going through one framework (Microsoft.Extensions.Logging) to get to another framework (e.g. NLog) before you end up at an actual logger (e.g. a file logger). Why two frameworks?
With the old .NET Framework I never understood this either, which is why I wrote a range of TraceListeners, that each independently plugged into System.Diagnostics directly.
Structured logging (also sometimes called semantic logging) is a useful addition to the software development toolkit.
The latest release (2.2.0, and some 2.1.728) of Essential.Diagnostics adds structured tracing capabilities to the .NET Framework System.Diagnostics. It integrates seamlessly with existing tracing, including from the .NET Framework, and includes both producer-side extensions (to include in your application) and trace listener changes (to integrate with structured tracing systems).
This provides a trace listener implementation that forwards messages to a Seq logging server. For performance it forwards messages in batches (with the first message being sent immediately, so you know the system is up and running), with automatic back off and retry when there are interruptions to the network communication.
This component can be used with the new Microsoft.Extensions.Logging for .NET 4.5.1 and above, or with Sytem.Diagnostics.TraceSource for .NET 2.0 through 4.5.