Vetting Your Pager

At Sentry, we receive a million requests a minute to process and store crashes from all around the world. Our Operations Team is responsible for making sure everything goes right with these requests, but also for not burning themselves out while dealing with everything that goes wrong.

We collect fifty thousand custom metrics in DataDog, but alert on fewer than fifty of them. James Cunningham leads our internal observability initiative, creating and maintaining those alerts.

In this talk, he discusses the full lifecycle of an alert at Sentry, including:

How we collect such a wide variety of metrics efficiently

How we justify a metric’s degree of accuracy

Why each metric needs a defined logical purpose

How alerts evolve from metrics, and how each alert's existence is justified

What happens when an engineer actually gets paged

The talk also includes the most interesting questions from the closing Q&A.

Featuring