What is distributed tracing?


Distributed tracing involves monitoring data requests as they move through a distributed system. In a modern microservices architecture, multiple small, independent components interact and exchange data via APIs to perform complex tasks. Distributed tracing allows developers to trace the path of a request across various microservices, providing the visibility needed to troubleshoot errors, fix bugs, and address performance issues.

What are the benefits of distributed tracing?

Development teams use distributed tracing to enhance observability and resolve performance issues that traditional monitoring tools cannot address.

Reduce mean time to resolution

Modern applications depend on numerous microservices to exchange data and handle service requests across distributed systems. Troubleshooting performance issues in such an architecture is much more challenging than in monolithic applications. In a microservices setup, the root cause of a problem might not be immediately apparent due to the complex interactions between various modules. Distributed tracing allows teams to monitor data as it traverses the intricate paths connecting multiple microservices and data storage units. By using distributed tracing tools, teams can track requests and visualize data propagation paths, enabling them to address performance issues and reduce service disruptions quickly.

Improve collaboration

Building a cloud application often involves several developers, each responsible for one or more microservices. The development process slows down when developers cannot trace the data exchanged between microservices. Distributed tracing systems provide telemetry data, such as logs and traces, for every service request a microservice makes. This transparency allows developers to collaborate more effectively, accurately address bugs, and resolve other issues discovered during testing and production.

Ship faster

Organizations that deploy distributed tracing platforms can streamline and accelerate the release of software applications to end users. By reviewing distributed traces, software teams gain insights that help them speed up development, reduce costs, understand user behavior, and improve market readiness.

How distributed tracing works

Imagine an online retail store with millions of customers. The store must track each customer’s actions, such as browsing products, adding items to their cart, making purchases, and other interactions. Diagnosing problems across that many interactions would be unmanageable with traditional monitoring methods, but distributed tracing makes it possible.

Distributed tracing monitors a request (transaction) as it moves between multiple services within a microservices architecture. This allows you to identify the origin of the service request (the user-facing frontend application) and follow its journey through other services.
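One common way to let each service recognize that it is handling part of the same request is to pass a trace identifier along in a request header. The sketch below assumes the W3C Trace Context `traceparent` header layout (version, trace ID, parent span ID, flags); the source does not name a propagation format, and the helper names `inject` and `extract` are hypothetical. It is a simplified illustration, not any specific SDK's implementation.

```python
import re
import secrets

# Assumption: W3C Trace Context layout, version-traceid-parentid-flags,
# e.g. 00-<32 hex chars>-<16 hex chars>-01
TRACEPARENT = "traceparent"
_PATTERN = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def new_trace_id() -> str:
    return secrets.token_hex(16)  # 16 random bytes -> 32 hex chars


def new_span_id() -> str:
    return secrets.token_hex(8)  # 8 random bytes -> 16 hex chars


def inject(headers: dict, trace_id: str, span_id: str, sampled: bool = True) -> dict:
    """Return a copy of `headers` that carries the caller's trace context."""
    flags = "01" if sampled else "00"
    out = dict(headers)
    out[TRACEPARENT] = f"00-{trace_id}-{span_id}-{flags}"
    return out


def extract(headers: dict):
    """Return (trace_id, parent_span_id, sampled), or None if absent or malformed."""
    match = _PATTERN.match(headers.get(TRACEPARENT, ""))
    if match is None:
        return None
    trace_id, parent_id, flags = match.groups()
    return trace_id, parent_id, (int(flags, 16) & 1) == 1


# The frontend starts the trace and calls a (hypothetical) backend service.
trace_id = new_trace_id()
frontend_span = new_span_id()
outgoing = inject({"accept": "application/json"}, trace_id, frontend_span)

# The backend service reads the header and continues the same trace.
context = extract(outgoing)
```

Because every downstream service reuses the same trace ID and records the caller's span ID as its parent, the individual records can later be stitched back into one end-to-end view of the request.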

Consider an example of distributed tracing in a typical modern online retail application consisting of multiple microservices:

  1. A group of microservices manages the user interface, which displays the product catalog and handles user interactions.
  2. User data, such as account information and order history, is recorded in a database service.
  3. Several backend services handle tasks like inventory management, payment processing, and shipping.

In this environment, a distributed trace of the user’s request would start by recording information about the request’s status on the first frontend service, including the data the user inputs and the time it takes for the service to forward that data to other services.

The next step in the trace involves the backend services, which process tasks such as verifying inventory, processing payments, and initiating shipping. Finally, the backend services transfer the processed data to the database service, which stores it.
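The retail walkthrough above can be sketched as a small data model: each unit of work is a span with a start time, an end time, and a pointer to its parent, and the spans that share a trace ID form a tree. Everything in this example is assumed for illustration, including the `Span` fields, the service and operation names, and the timings.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    trace_id: str
    span_id: str
    parent_id: Optional[str]  # None marks the root span of the trace
    service: str
    operation: str
    start_ms: int
    end_ms: int

    @property
    def duration_ms(self) -> int:
        return self.end_ms - self.start_ms


# One checkout request; all names and timings are invented.
spans = [
    Span("t1", "a", None, "frontend", "POST /checkout", 0, 480),
    Span("t1", "b", "a", "inventory", "verify_stock", 20, 90),
    Span("t1", "c", "a", "payments", "charge_card", 95, 410),
    Span("t1", "d", "c", "database", "INSERT order", 380, 405),
    Span("t1", "e", "a", "shipping", "create_shipment", 415, 470),
]


def render(spans, parent_id=None, depth=0):
    """Return indented lines showing the parent/child structure of a trace."""
    lines = []
    children = sorted(
        (s for s in spans if s.parent_id == parent_id),
        key=lambda s: s.start_ms,
    )
    for span in children:
        indent = "  " * depth
        lines.append(f"{indent}{span.service}: {span.operation} ({span.duration_ms} ms)")
        lines.extend(render(spans, span.span_id, depth + 1))
    return lines


def slowest_child(spans, parent_id):
    """Return the direct child of `parent_id` with the longest duration."""
    return max(
        (s for s in spans if s.parent_id == parent_id),
        key=lambda s: s.duration_ms,
    )


tree = render(spans)
bottleneck = slowest_child(spans, "a")
print("\n".join(tree))
```

Reading the tree top to bottom shows the path the request took, and comparing child durations under the root span points to the service that contributed most to the total response time.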

The challenges of distributed tracing

Distributed tracing has made it much easier for developers to diagnose, debug, and fix software issues. However, there are still some challenges that software teams need to be aware of when choosing tracing tools.

Manual instrumentation

Some tracing tools require developers to manually add code to generate the necessary traces. This process can introduce coding errors that affect production releases. Additionally, the lack of automation can make tracing more complicated, leading to delays and potentially inaccurate data collection.
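To give a sense of what manual instrumentation can look like, here is a minimal sketch in which a developer wraps each function of interest by hand. The `traced` decorator, the `recorded_spans` list, and the `verify_stock` function are all hypothetical stand-ins; a real tracing tool would ship spans to a backend rather than append them to a list.

```python
import functools
import time

recorded_spans = []  # stand-in for a real span exporter


def traced(operation):
    """Decorator that records one span per call, including failed calls."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            status = "ok"
            try:
                return func(*args, **kwargs)
            except Exception:
                status = "error"
                raise  # re-raise so instrumentation never hides failures
            finally:
                recorded_spans.append({
                    "operation": operation,
                    "status": status,
                    "duration_s": time.perf_counter() - start,
                })
        return wrapper
    return decorator


@traced("inventory.verify_stock")
def verify_stock(sku, quantity):
    if quantity <= 0:
        raise ValueError("quantity must be positive")
    return True


verify_stock("sku-123", 2)
try:
    verify_stock("sku-123", 0)
except ValueError:
    pass
```

Even in this small form, each function needs its own decorator and operation name, so any function a developer forgets to wrap leaves a gap in the trace. That maintenance burden is what the paragraph above describes.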

Limited frontend coverage

If tracing tools are limited to backend analysis, developers might not get a complete picture of performance issues. In some cases, the distributed tracing system only starts collecting data when the first backend service receives the request. This means problems originating from frontend services during the user session can go undetected.

Random sampling

Some tools don’t allow teams to prioritize which traces to collect, limiting observability to randomly sampled traces. With a limited sample size, organizations may need additional troubleshooting methods to capture major issues that the tracing tool might miss.

Distributed tracing with Sentry

With Sentry, it’s easy to track the complete, end-to-end path of a request from the originating user interaction across systems and services to pinpoint the specific function or API call causing the problem.

Automatic instrumentation

Set up tracing in minutes. Just follow the instructions in our docs to start sending spans. Or watch the video to see how to set it up in less than 4 minutes.

Best-in-class frontend support

Sentry partners directly with framework authors to build industry-leading support for JavaScript and related frameworks (e.g., React, React Native). You can easily trace frontend performance issues to poor-performing API calls and slow database queries across all your services.

Sampling controls

When you enable sampling in your tracing setup, you choose a percentage of collected transactions to send to Sentry. For example, if you had an endpoint that received 1000 requests per minute, a sampling rate of 0.25 would result in approximately 250 transactions (25%) being sent to Sentry each minute.
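The arithmetic in the example above can be checked with a short simulation. This sketch assumes uniform random sampling, where each transaction is kept independently with probability equal to the rate. The `should_sample` function is hypothetical and does not call any Sentry SDK.

```python
import random


def should_sample(rate: float, rng: random.Random) -> bool:
    """Keep a transaction with probability `rate` (0.0 to 1.0)."""
    return rng.random() < rate


rate = 0.25
requests_per_minute = 1000
expected_sent = rate * requests_per_minute  # 0.25 * 1000 = 250.0

rng = random.Random(42)  # fixed seed so the run is repeatable
sent = sum(should_sample(rate, rng) for _ in range(requests_per_minute))
print(f"expected about {expected_sent:.0f}, sent {sent}")
```

The simulated count lands near 250 rather than exactly on it, which is why the text says "approximately": each request is sampled independently, so the number sent varies slightly from minute to minute.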

“With a tracing solution that captures the entire request lifecycle across services, understanding a traffic spike now takes minutes, saving my team an average of 3 hours per issue.”

Ben Ogle, Founder, Anvil Foundry

© 2024 • Sentry is a registered Trademark of Functional Software, Inc.