In this post we will cover how to the the built in support for OpenTelemetry in modern .NET to instrument your distributed application for tracing and logging, how the OpenTelemetry Collector can be used to simplify instrumention, and how the OpenTelemetry Protocol is building a (brilliant) connected future.
We will now go further than logging and look at tracing. Tracing looks at the different units of work (spans) done during an operation (trace), how they are connected, and the timings of the different components. This is an important tool for investigating performance issues in distributed systems.
An example distributed trace timeline, across multiple components, viewed in Jaeger, one of many supported tools:
As well as looking at individual traces timings can be aggregated across the system to find the slowest areas, and identify anomalies.
Earlier this year the W3C Trace Context Recommendation was finally published. A standard way of passing distributed trace correlation has been needed for a long time, and it is good to see there is finally a standard, and many vendors have already moved to adopt it.
The Recommendation defines what a distributed trace is:
A distributed trace is a set of events, triggered as a result of a single logical operation, consolidated across various components of an application. A distributed trace contains events that cross process, network and security boundaries. A distributed trace may be initiated when someone presses a button to start an action on a website - in this example, the trace will represent calls made between the downstream services that handled the chain of requests initiated by this button being pressed.
What constitutes a single logical operation depends on the system. In the example above it is a single button press on a website, whereas in a batch processing system it might be for each item processed, or in a complex UI it might consist of both a button press and a subsequent confirmation dialog.
The W3C Trace Context Recommendation describes how the correlation information — an identifier for the operation, and the parent-child relationships between components — is passed in service calls, but doesn't cover what to do with that information, apart from how to pass it to the next component.
This is a guide mostly how to use Trace Context for logging, although it also applies to metrics and other telemetry.
I have always found it useful for applications to display their build version, and for libraries to have the build version in their properties. Relying on properties like the date (or file size) is always a bit risky.
.NET Core has embraced Semantic Versioning and at first glance appears to have a new way to specify version numbers.
It doesn't quite work to my full satisfaction, but luckily the older methods still work, so a basic GitVersion task in your build pipeline is pretty much all you need to get things working.